
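I have a dataset that I'm trying to reshape, but I can't seem to find the right way to do it. I've looked into using dcast and spread, but I'm not sure how to apply them correctly.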

I have something like this:

ID var1 var2 var3 category
--------------------------
1  x    x    x     a
1  x    x    x     b
1  x    x    x     b
2  y    y    y     a
2  y    y    y     b
2  y    y    y     c
3  z    z    z     b 
3  z    z    z     b
3  z    z    z     c

I want it to look like this:

ID var1 var2 var3  a  b  c 
--------------------------------
1  x    x    x     1  1  0 
2  y    y    y     1  1  1
3  z    z    z     0  1  1  

Simple example data:

ID <- c(1,1,1,2,2,2,3,3,3)
var1 <- c('x','x','x','y','y','y','z','z','z')
var2 <- c('x','x','x','y','y','y','z','z','z')
var3 <- c('x','x','x','y','y','y','z','z','z')
category <- c('a','b','b','a','b','c','b','b','c')

dat <- data.frame(ID,var1,var2,var3,category)

2 Answers

ID <- c(1,1,1,2,2,2,3,3,3)
var1 <- c("x","x","x","y","y","y","z","z","z")
var2 <- c("x","x","x","y","y","y","z","z","z")
var3 <- c("x","x","x","y","y","y","z","z","z")
category <- c("a","b","b","a","b","c","b","b","c")

dat <- data.frame(ID,var1,var2,var3,category)

library(tidyr)
library(dplyr)

dat %>%
  distinct() %>%                   # get distinct rows
  mutate(value = 1) %>%            # create a counter
  spread(category, value, fill=0)  # reshape dataset

#   ID var1 var2 var3 a b c
# 1  1    x    x    x 1 1 0
# 2  2    y    y    y 1 1 1
# 3  3    z    z    z 0 1 1
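
In newer tidyr releases, spread() is superseded by pivot_wider(). As a rough sketch of the same idea, assuming tidyr 1.1.0 or later is installed (so that values_fill accepts a single value), the pipeline could be written as follows. The name wide is only used here for illustration, and the result is a tibble rather than a plain data frame.

```r
library(dplyr)
library(tidyr)

# Same example data as above
dat <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  var1     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  var2     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  var3     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  category = c("a", "b", "b", "a", "b", "c", "b", "b", "c")
)

wide <- dat %>%
  distinct() %>%                      # drop duplicated ID/category rows
  mutate(value = 1) %>%               # mark each remaining pair as present
  pivot_wider(names_from  = category, # one column per category
              values_from = value,
              values_fill = 0)        # absent pairs become 0 (assumes tidyr >= 1.1.0)

wide
```

The 0/1 columns should match the spread() result above.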
answered 2017-11-16T18:22:54.870

Since the question is tagged with dcast, I feel obliged to provide a solution that uses dcast().

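The OP has not explained how the columns in wide format are to be computed. From the expected result, it seems the OP is not interested in counting the number of occurrences but rather in indicating the presence or absence of each unique combination (with 1/0 in place of TRUE/FALSE).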

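Therefore, only unique rows are included in the reshape operation. length() is still used as the aggregation function because it fills empty cells with 0, as requested.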

library(reshape2)
dcast(unique(dat), ... ~ category, length)
#   ID var1 var2 var3 a b c
# 1  1    x    x    x 1 1 0
# 2  2    y    y    y 1 1 1
# 3  3    z    z    z 0 1 1
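
To illustrate why unique() matters here, the following sketch compares the two calls on the same data. It spells out the formula and value.var explicitly instead of relying on `...` and on dcast() guessing the value column. The names counts and flags are made up for this example.

```r
library(reshape2)

# Same example data as in the question
dat <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  var1     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  var2     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  var3     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  category = c("a", "b", "b", "a", "b", "c", "b", "b", "c")
)

# Without unique(): length() counts every row, so a repeated
# ID/category pair yields a value greater than 1
counts <- dcast(dat, ID + var1 + var2 + var3 ~ category,
                value.var = "category", fun.aggregate = length)

# With unique(): each ID/category pair occurs at most once,
# so every cell is either 0 (absent) or 1 (present)
flags <- dcast(unique(dat), ID + var1 + var2 + var3 ~ category,
               value.var = "category", fun.aggregate = length)

counts$b  # 2 1 2
flags$b   # 1 1 1
```

If the counts themselves are wanted instead of presence flags, dropping unique() is enough.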
answered 2017-12-04T11:49:42.297