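I'm trying to turn a column of data into many binary (indicator) columns, to be used eventually for association rule mining. I've had some success with a for loop and a simple triplet matrix (`simple_triplet_matrix` from the slam package), but I'm not sure how to then aggregate by the levels of the first column, similar to a GROUP BY statement in SQL. A small example is below. If this works, my real dataset will be 4,200 rows × 3,902 columns, so any solution needs to scale. Any suggestions or alternative approaches would be greatly appreciated!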
> data <- data.frame(a=c('sally','george','andy','sue','sue','sally','george'), b=c('green','yellow','green','yellow','purple','brown','purple'))
> data
       a      b
1  sally  green
2 george yellow
3   andy  green
4    sue yellow
5    sue purple
6  sally  brown
7 george purple
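library(slam)  # provides simple_triplet_matrix()
x <- data[,1]
for(i in 2:ncol(data)) {
  f <- factor(data[,i])  # works whether column i is character (R >= 4.0 default) or factor
  # One indicator column per level: row r gets a 1 in the column of its own level
  x <- cbind(x, simple_triplet_matrix(i=1:nrow(data), j=as.integer(f),
              v=rep(1,nrow(data)), dimnames=list(NULL, levels(f))))
}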
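For comparison, here is a minimal base-R sketch of the GROUP BY step itself. It assumes the per-row indicator columns fit in an ordinary dense matrix, and it swaps the slam loop for `outer()` and `rowsum()`, neither of which appears in the code above. `rowsum()` sums the rows of a matrix within each group, which is the base-R counterpart of `GROUP BY ... SUM`; capping the sums at 1 turns them back into binary flags.

```r
# Example data from above; stringsAsFactors = FALSE keeps both columns as character
data <- data.frame(a = c('sally','george','andy','sue','sue','sally','george'),
                   b = c('green','yellow','green','yellow','purple','brown','purple'),
                   stringsAsFactors = FALSE)

colours <- sort(unique(data$b))

# Per-row indicator matrix: entry [r, k] is 1 when row r has colour k
onehot <- outer(data$b, colours, "==") * 1L
colnames(onehot) <- colours

# rowsum() acts as GROUP BY + SUM: one output row per distinct name,
# ordered by sort(unique(data$a))
grouped <- rowsum(onehot, data$a)

# Collapse counts to 0/1 in case a name repeats the same colour
grouped <- (grouped > 0) * 1L
grouped
```

With the example data, `grouped["george", ]` should come out as brown 0, green 0, purple 1, yellow 1, since george appears with both yellow and purple.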
## The result looks like this:
> as.matrix(x)
     name     brown green purple yellow
[1,] "sally"  "0"   "1"   "0"    "0"
[2,] "george" "0"   "0"   "0"    "1"
[3,] "andy"   "0"   "1"   "0"    "0"
[4,] "sue"    "0"   "0"   "0"    "1"
[5,] "sue"    "0"   "0"   "1"    "0"
[6,] "sally"  "1"   "0"   "0"    "0"
[7,] "george" "0"   "0"   "1"    "0"
## Need to aggregate by name
## I would like it to look like this:
     name     brown green purple yellow
[1,] "sally"  "1"   "1"   "0"    "0"
[2,] "george" "0"   "0"   "1"    "1"
[3,] "andy"   "0"   "1"   "0"    "0"
[4,] "sue"    "0"   "0"   "1"    "1"
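At the full 4,200 × 3,902 size most entries are 0, so a sparse result is worth considering. The sketch below assumes the Matrix package (a recommended package that ships with standard R installations) is acceptable in place of slam. It maps each name and each colour to an integer code with `factor()` and builds the already-aggregated matrix in a single `sparseMatrix()` call. Because the row index is the name's code rather than the original row number, the GROUP BY happens during construction and no separate aggregation pass is needed.

```r
library(Matrix)  # sparseMatrix(); a recommended package bundled with standard R

data <- data.frame(a = c('sally','george','andy','sue','sue','sally','george'),
                   b = c('green','yellow','green','yellow','purple','brown','purple'),
                   stringsAsFactors = FALSE)

name   <- factor(data$a)  # row key: one row per distinct name (the GROUP BY column)
colour <- factor(data$b)  # column key: one column per distinct colour

# Drop repeated (name, colour) pairs first; sparseMatrix() would otherwise
# add duplicates together and store values greater than 1
pairs <- unique(data.frame(i = as.integer(name), j = as.integer(colour)))

m <- sparseMatrix(i = pairs$i, j = pairs$j, x = rep(1, nrow(pairs)),
                  dims = c(nlevels(name), nlevels(colour)),
                  dimnames = list(levels(name), levels(colour)))

as.matrix(m)  # dense view, only sensible for small examples
```

All rows for `sue` share the same row code, so her yellow and purple entries land in one row, matching the desired output above (rows are ordered alphabetically by name).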