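I would like to summarise a very large data table by applying a list of statistical functions to each column. I would like to use data.table, because my previous version with plyr was slow and I have read that data.table should be much faster. I tried the following, but I get: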
Error in { :
task 1 failed - "task 1 failed - "second argument must be a list""
Here is the code I tried:
library(data.table)
library(e1071)
library(nortest)
statistical_tests = list(mean, sd, kurtosis, skewness,
                         lillie.test, shapiro.test)

summary = function(column) {
    result = mapply(do.call, statistical_tests, column)
    print(result)
    return(result)
}

analyse_fits = function(fit_df) {
    # get mean and standard deviation for the three parameters
    print(fit_df)
    setkey(fit_df, type)
    return(fit_df[, lapply(.SD, summary),
                  by=type])
}
analyse_fits(fit_df)
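The same message appears to be reproducible without data.table at all, which makes me suspect the mapply/do.call line rather than the grouping. A minimal sketch, assuming only base R; the vector x and the shortened function list fns are made up for illustration and are not part of my real data:

```r
# Hypothetical minimal reproduction; x and fns are illustrative only
x <- c(1.5, 2.5)
fns <- list(mean, sd)

# mapply pairs fns[[i]] with x[i], so do.call receives a single number
# as its second argument rather than a list of arguments
msg <- tryCatch(
    mapply(do.call, fns, x),
    error = function(e) conditionMessage(e)
)
print(msg)
```

If that reading is right, each function is being called on one element of the column instead of on the whole column, but I do not know how to restructure this so that the output fits into the data.table result.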
Sample data fit_df:
constant phase visibility type
1: 49927.22 -2.609797e-03 0.8690605 fft
2: 49965.89 -6.783609e-05 0.8702492 fft
3: 50026.44 -1.109387e-03 0.8680235 fft
4: 50063.78 2.640915e-04 0.8697564 fft
5: 50074.89 9.999202e-04 0.8684974 fft
6: 49964.89 -2.075373e-03 0.8708830 fft
7: 50063.56 -9.737554e-04 0.8721360 fft
8: 50044.11 -1.920089e-03 0.8722035 fft
9: 50100.67 -7.487811e-04 0.8706438 fft
10: 49962.11 4.163415e-03 0.8713016 fft
11: 49926.63 -1.473941e-03 0.8687753 ls
12: 49964.98 1.794244e-03 0.8710003 ls
13: 50025.89 -1.315459e-03 0.8698475 ls
14: 50063.40 2.891339e-04 0.8699723 ls
15: 50074.70 1.859353e-03 0.8684841 ls
16: 49964.43 -6.426037e-04 0.8706581 ls
17: 50063.47 -1.646874e-03 0.8715316 ls
18: 50043.48 -1.435637e-03 0.8713584 ls
19: 50100.36 -2.261318e-03 0.8699203 ls
20: 49961.76 3.659428e-03 0.8704063 ls
I am sure there is a neat way to format the output so that this works. Can you help me?