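I'm trying to use data.table for my relatively large dataset, but I can't figure out how to apply a function across several columns of the same row. Specifically, I want to create a new column that holds, for each row, a summary of the values in a subset of columns in a particular format (a histogram). It's a bit like table(), except that it also includes values with zero counts and keeps them in a fixed, sorted order. So if you know a better or faster way to do this, I'd appreciate that too!
Simplified test case: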
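A minimal sketch of the zero-filled, sorted table() described here, assuming the full set of possible values (a to d) is known in advance. Giving factor() explicit levels makes table() report every level, in level order, including levels that never occur. The names `levs`, `x` and `counts` are illustrative only.

```r
# Assumption: the complete set of possible values is known in advance.
levs <- c("a", "b", "c", "d")
x <- c("a", "b", "c", "a", "a")   # the values from row 1 of the example

# Declaring the levels explicitly makes table() list every level in this
# order, reporting 0 for levels that do not occur in x.
counts <- table(factor(x, levels = levs))
print(counts)   # a=3, b=1, c=1, d=0
```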
DF<-data.frame("A"=c("a","d","a"),"B"=c("b","a","a"),"C"=c("c","a","a"),"D"=c("a","b","c"),"E"=c("a","a","c"))
DT<-as.data.table(DF)
> DT
A B C D E
1: a b c a a
2: d a a b a
3: a a a c c
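My clumsy histogram function:
histo<-function(vec){
  # start every possible value at a count of 0, in a fixed order
  foo<-c("a"=0,"b"=0,"c"=0,"d"=0)
  # add 1 to the count of each value found in the row
  for(i in vec){foo[i]=foo[i]+1}
  return(foo)}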
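One possible loop-free replacement for histo(), sketched under the same assumption of a known level set. `histo2` is a hypothetical name. match() maps each value to its position in `levs`, and tabulate() counts how often each position 1..length(levs) occurs, so missing levels come out as 0 and the result is already in level order. Unlike histo(), it returns integer counts.

```r
# Hypothetical vectorised alternative to histo().
# Values not present in 'levs' become NA in match() and are ignored by tabulate().
histo2 <- function(vec, levs = c("a", "b", "c", "d")) {
  counts <- tabulate(match(vec, levs), nbins = length(levs))
  names(counts) <- levs
  counts
}

histo2(c("d", "a", "a", "b", "a"))   # row 2 of the example: a=3, b=1, c=0, d=1
```

Because match() converts factors to character, this should also accept `unname(unlist(DF[1,]))` when the columns are factors.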
>histo(unname(unlist(DF[1,])))
a b c d
3 1 1 0
>histo(unname(unlist(DF[2,])))
a b c d
3 1 0 1
>histo(unname(unlist(DF[3,])))
a b c d
3 0 2 0
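Instead of calling the function once per row by hand, all rows can be summarised in one call. The sketch below (base R only, assuming the same four levels) runs apply() over the rows of the character matrix and uses the match()/tabulate() counting idea, producing one histogram per row of DF. It still loops over rows at the R level, so it mainly saves typing rather than time.

```r
DF <- data.frame(A = c("a", "d", "a"), B = c("b", "a", "a"), C = c("c", "a", "a"),
                 D = c("a", "b", "c"), E = c("a", "a", "c"),
                 stringsAsFactors = FALSE)
levs <- c("a", "b", "c", "d")   # assumed full level set

# apply() returns a 4 x nrow(DF) matrix (one column per row of DF);
# t() turns it into one histogram per row.
counts <- t(apply(as.matrix(DF), 1,
                  function(r) tabulate(match(r, levs), nbins = length(levs))))
colnames(counts) <- levs
counts
```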
Pseudocode for the desired function and output:
>DT[,his:=some_func_with_histo(A:E)]
>DT
A B C D E his
1: a b c a a (3,1,1,0)
2: d a a b a (3,1,0,1)
3: a a a c c (3,0,2,0)
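A sketch of one way to get the `his` column in data.table without a per-row function call, assuming the levels a to d are known. It loops over levels rather than rows: for each level it compares every column with that level and adds the logical results, which gives a per-row count, and set() adds each count column by reference. The `n_a` ... `n_d` column names and the final list column are illustrative choices, not an established API.

```r
library(data.table)

DT <- data.table(A = c("a", "d", "a"), B = c("b", "a", "a"), C = c("c", "a", "a"),
                 D = c("a", "b", "c"), E = c("a", "a", "c"))
cols <- c("A", "B", "C", "D", "E")   # columns to histogram
levs <- c("a", "b", "c", "d")        # assumed full level set

# One vectorised pass per level: TRUE/FALSE matches are summed across columns.
for (l in levs) {
  hits <- lapply(cols, function(cl) DT[[cl]] == l)
  set(DT, j = paste0("n_", l), value = as.integer(Reduce(`+`, hits)))
}

# Optional: bundle the counts into a list column shaped like the desired 'his'.
DT[, his := list(Map(c, n_a, n_b, n_c, n_d))]
DT
```

The work here grows with (number of levels) x (number of columns), each step vectorised over all rows, which should scale better on large tables than calling a function row by row. Keeping the separate count columns may also be more convenient than the list column for later filtering or aggregation.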