Assigning Columns as Factors in Data.Table

I love working with data.table, but there are a few operations that always trip me up. Well, I guess there are a lot of things that do. The best way to avoid these pitfalls is to write functions that handle the tasks for you. You can set up an R script that you load whenever you need the functions. Selecting specific columns in a data.table and converting them to factor or numeric is a good example. I need that operation often enough, but I still stumble over it every time, and then my googling expertise comes in. Eventually, you get tired of spending time looking up the same things again and again.

Solution

  • Create a character vector of the columns (cols) you need to change, e.g., cols = c("a", "b", "e")
  • Use dt[, (cols) := lapply(.SD, as.factor), .SDcols = cols] to make the change
  • Write an R script that has functions to perform the task
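The first two steps above can be sketched directly, without a helper function. This is a minimal example under stated assumptions: the table dt_demo and its values are made up for illustration and are not the data used later in this post.

```r
library(data.table)

# Illustrative table (hypothetical data, not from the post).
dt_demo <- data.table(a = 1:3, b = 4:6, c = 7:9)

# Step 1: name the columns to convert.
cols <- c("a", "b")

# Step 2: convert them in place. .SDcols restricts .SD to the named columns,
# and (cols) on the left of := assigns the results back to those columns.
dt_demo[, (cols) := lapply(.SD, as.factor), .SDcols = cols]

# Inspect the resulting column classes.
sapply(dt_demo, class)
```

Here a and b become factor while c stays integer. The parentheses around cols matter: without them, data.table would create a new column literally named "cols".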

Let’s Try It Out!

First, let’s create the function and data.table to work with.

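library(data.table)
library(pander)  # used below to print the column classes

did_recode_columns <- function(dt, cols, type = c("as.numeric", "as.factor", "as.character", "as.integer", "as.double")) {
  # Convert the data.table columns named in cols with the chosen
  # conversion function. dt is modified by reference.
  type <- match.arg(type)  # validate the choice; defaults to "as.numeric"
  dt[, (cols) := lapply(.SD, match.fun(type)), .SDcols = cols]
  invisible(dt)
}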

dt <- data.table(a = sample(5), b = sample(5), c = sample(5), d = sample(5), e = sample(5))

pander(sapply(dt, class))
a b c d e
integer integer integer integer integer

Now let’s set the columns and use the function to change them.

cols = c("a", "b", "e")

did_recode_columns(dt, cols, type = "as.factor")

pander(sapply(dt, class))
a b c d e
factor factor integer integer factor
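Notice that the call above changed dt itself, with no assignment needed, because := updates a data.table by reference. If you would rather keep the original table untouched, one option is to work on a copy made with data.table's copy(). A minimal sketch, where the names original and recoded are illustrative only:

```r
library(data.table)

# Illustrative table (hypothetical data, not from the post).
original <- data.table(a = 1:3, b = 4:6)

# copy() makes a deep copy, so := on the copy leaves `original` alone.
recoded <- copy(original)
recoded[, a := as.factor(a)]

class(original$a)  # still integer
class(recoded$a)   # now factor
```

A plain recoded <- original would not be enough here, since both names would still point at the same table.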

Until next time…