I use fread to import very large .csv files. Some columns contain trailing whitespace that I need to remove, and this step takes far too long (hours).
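One thing worth ruling out first: according to the fread documentation, its strip.white argument (default TRUE) strips leading and trailing whitespace from unquoted fields at read time, while whitespace inside quoted fields is kept. The minimal sketch below assumes a small hypothetical CSV held in a string and passed via fread's text argument; the variable csv is illustrative only.

```r
library(data.table)

# Hypothetical CSV: the first Tx value is unquoted, the second is quoted;
# both carry two trailing spaces.
csv <- 'Tx,Nr1\nText1  ,1\n"Text2  ",2\n'

# strip.white = TRUE is fread's default; it is spelled out here for clarity.
dt <- fread(text = csv, strip.white = TRUE)

# The unquoted value comes back trimmed. Per ?fread, the padding inside the
# quotes of the second value should be preserved.
print(dt$Tx)
```

If the real files quote the padded fields, this will not help, and the column has to be trimmed after import.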
The following code works, but the step wrapped in system.time() is very slow (about 12 seconds on my computer, and the real files are much bigger).
```r
library(data.table)
library(stringr)

# Create example data: three rows with trailing whitespace in "Tx"
df.1 <- rbind(c("Text1 ", 1, 2), c("Text2 ", 3, 4), c("Text99 ", 5, 6))
colnames(df.1) <- c("Tx", "Nr1", "Nr2")
dt.1 <- data.table(df.1)
# Double the table 15 times: 3 * 2^15 = 98,304 rows
for (i in 1:15) {
  dt.1 <- rbind(dt.1, dt.1)
}

# Trim the "Tx" column, one row per group (this is the slow step)
dt.1[, rowid := seq_len(nrow(dt.1))]
setkey(dt.1, rowid)
system.time(dt.1[, Tx2 := str_trim(Tx), by = rowid])

# Replace the original column with the trimmed one
dt.1[, rowid := NULL]
dt.1[, Tx := NULL]
setnames(dt.1, "Tx2", "Tx")
```
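A likely cause of the slowness is the by = rowid grouping: it makes data.table evaluate str_trim() once per row (about 98,000 separate calls) instead of once for the whole column. Because str_trim() is vectorised, a single call on the full column should give the same result. The sketch below rebuilds the same example data with rep() instead of repeated rbind() and trims in place; the base-R trimws() variant (R >= 3.2.0) is shown as an alternative that avoids the stringr dependency.

```r
library(data.table)
library(stringr)

# Same 98,304-row example data as in the question, built directly
dt.1 <- data.table(Tx  = rep(c("Text1 ", "Text2 ", "Text99 "), 2^15),
                   Nr1 = rep(c("1", "3", "5"), 2^15),
                   Nr2 = rep(c("2", "4", "6"), 2^15))

# One vectorised call on the whole column, updated by reference;
# no helper rowid column, key, or by= grouping is needed.
system.time(dt.1[, Tx := str_trim(Tx)])

# Base-R alternative: trim only the trailing (right-hand) whitespace
dt.2 <- data.table(Tx = c("Text1 ", "Text2 ", "Text99 "))
dt.2[, Tx := trimws(Tx, which = "right")]
```

Updating with := also avoids creating Tx2 and then deleting and renaming columns.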
Is there a faster way to trim whitespace in data.tables?
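Since several columns may need trimming, one option is to apply a vectorised trim to all of them in a single := assignment using .SDcols. This is a sketch on a small hypothetical table; selecting columns with is.character is an assumption about which columns should be trimmed.

```r
library(data.table)

# Hypothetical table with two padded character columns and one integer column
dt <- data.table(a = c("x ", "y  "), b = c(" p", "q "), n = 1:2)

# Pick every character column
char_cols <- names(dt)[vapply(dt, is.character, logical(1))]

# Trim all selected columns by reference; trimws() defaults to which = "both"
dt[, (char_cols) := lapply(.SD, trimws), .SDcols = char_cols]
```

Each column is still trimmed with one vectorised call, so the cost should scale with the number of columns rather than the number of rows times columns.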