I am very new to data.table and would like to try it out to see if it makes my analysis faster. I mainly use knitr to compile .rnw files (which I tend to compile many times per hour, so I want compilation to be as fast as possible). I have posted a sample below; this is by no means a question of comparing data.table against data.frame. I would just like to know whether my code below is what it should be.
I am basically joining two data.tables and then need to fill the resulting NA values by linear interpolation using na.approx. I used the "Introduction to data.table" vignette from CRAN and "JOINing data in R using data.table" from RPubs.
The code I am using below results in my best attempt at a data.table method taking a long time (long in absolute terms too; I only included the data.frame code for reference).
Also, if anyone knows a way to pipe na.approx() into a chain and still have the output be a data.frame, that would be appreciated. Note the df_merged = as.data.frame(df_merged) line, which I would like to get rid of if possible! Any input is greatly appreciated, thank you!
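To make that second question concrete, the chained form I am after would look roughly like the sketch below (assuming zoo's na.approx returns a matrix when given a data.frame, so the conversion still happens, just inside the chain rather than as a separate line):

```r
library(dplyr)
library(zoo)

set.seed(123)
df_random = data.frame(vals = runif(1E5, 0, 500))
df_na = data.frame(vals = c(0, 250, 500),
                   ref1 = c(0.33, 0.45, 0.78),
                   ref2 = c(0.12, 0.79, 1))

# Piping as.data.frame() directly after na.approx keeps the whole
# pipeline in one chain; na.approx itself still returns a matrix.
df_merged = full_join(df_random, df_na, by = "vals") %>%
  na.approx %>%
  as.data.frame
```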
library(data.table)
library(zoo)
library(dplyr)
dt_function_test = function() {
  set.seed(123)
  # data.table approach
  dt_random = data.table(vals = runif(1E5, 0, 500))
  dt_na = data.table(vals = c(0, 250, 500),
                     ref1 = c(0.33, 0.45, 0.78),
                     ref2 = c(0.12, 0.79, 1))
  dt_merged = merge(dt_random, dt_na, all = TRUE)
  dt_merged = dt_merged[, lapply(.SD, na.approx), by = vals]
}
df_function_test = function() {
  set.seed(123)
  # data.frame approach
  df_random = data.frame(vals = runif(1E5, 0, 500))
  df_na = data.frame(vals = c(0, 250, 500),
                     ref1 = c(0.33, 0.45, 0.78),
                     ref2 = c(0.12, 0.79, 1))
  df_merged = full_join(df_random, df_na) %>%
    na.approx
  df_merged = as.data.frame(df_merged)
}
print(system.time(dt_function_test()))
# user system elapsed
# 11.42 0.00 11.46
print(system.time(df_function_test()))
# Joining, by = "vals"
# user system elapsed
# 0.05 0.05 0.10
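Edit: one thing I am unsure about is whether the by = vals grouping is needed at all. Since vals is nearly unique after the merge, it makes na.approx run once per (mostly single-row) group. A sketch of the ungrouped form is below; note this is an assumption on my part, not verified against my real analysis, and that I pass x = vals so na.approx interpolates against vals rather than against row position:

```r
library(data.table)
library(zoo)

set.seed(123)
dt_random = data.table(vals = runif(1E5, 0, 500))
dt_na = data.table(vals = c(0, 250, 500),
                   ref1 = c(0.33, 0.45, 0.78),
                   ref2 = c(0.12, 0.79, 1))
dt_merged = merge(dt_random, dt_na, all = TRUE)

# Update ref1/ref2 by reference, running na.approx once per column
# over the whole table instead of once per group of vals.
cols = c("ref1", "ref2")
dt_merged[, (cols) := lapply(.SD, na.approx, x = vals), .SDcols = cols]
```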