r - R: Stacking multi-punch question data

Question

Suppose a survey has 2 questions. The first asks how likely the respondent is to recommend a company (for simplicity, assume there are 2 companies).

So for this question, I have a data.frame with 2 columns:

df.recommend <- data.frame(rep(1:5,20),rep(1:5,20))
colnames(df.recommend) <- c("Company1","Company2")

Suppose the second question asks respondents to tick a box next to each attribute they think "fits" a company.

So for this question, I have another data.frame with 4 columns:

df.attribute <- data.frame(rep(0:1,50),rep(1:0,50),rep(0:1,50),rep(1:0,50))

colnames(df.attribute) <- c(
"Attribute1.Company1", 
"Attribute2.Company1", 
"Attribute1.Company2", 
"Attribute2.Company2")

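What I want to do now is see how attributes 1 and 2 relate to the range of responses to the likelihood-to-recommend question across all companies (independent of company). For example, I want to see what affinities exist between people who are extremely likely to recommend and attribute 1.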

So I first bind the two questions together:

df <- cbind(df.recommend, df.attribute)

My problem is figuring out how to stack this data so that the columns look like this:

df.stacked <- data.frame(c(df$Company1,df$Company2),
c(df$Attribute1.Company1,df$Attribute1.Company2), 
c(df$Attribute2.Company1,df$Attribute2.Company2))
colnames(df.stacked) <- c("Likelihood","Attribute1","Attribute2")
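Once the data are in this stacked layout, the association the question is after (likelihood score versus each attribute, pooled over companies) can be read off with base R's `table()` and `tapply()`. The sketch below rebuilds the question's toy data with named columns and uses only base R; the names `tab1` and `share1` are illustrative.

```r
# Rebuild the question's example data (2 companies, 2 attributes)
df.recommend <- data.frame(Company1 = rep(1:5, 20), Company2 = rep(1:5, 20))
df.attribute <- data.frame(
  Attribute1.Company1 = rep(0:1, 50), Attribute2.Company1 = rep(1:0, 50),
  Attribute1.Company2 = rep(0:1, 50), Attribute2.Company2 = rep(1:0, 50)
)
df <- cbind(df.recommend, df.attribute)

# Hand-written stacking, as in the question
df.stacked <- data.frame(
  Likelihood = c(df$Company1, df$Company2),
  Attribute1 = c(df$Attribute1.Company1, df$Attribute1.Company2),
  Attribute2 = c(df$Attribute2.Company1, df$Attribute2.Company2)
)

# Counts of each likelihood score by whether Attribute1 was ticked, pooled over companies
tab1 <- table(Likelihood = df.stacked$Likelihood, Attribute1 = df.stacked$Attribute1)
print(tab1)

# Share of respondent-company pairs ticking Attribute1 at each likelihood score
share1 <- tapply(df.stacked$Attribute1, df.stacked$Likelihood, mean)
print(share1)
```

The toy data are perfectly balanced, so every cell comes out equal here; with real responses, the row for score 5 is where "extremely likely to recommend" shows up.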

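This example is greatly simplified. In my real problem I have 34 companies and 24 attributes.

Can you think of a way to stack them efficiently, without having to type out all the c() statements?

Note: the likelihood columns follow the pattern Co1, Co2, Co3, Co4, ..., and the attribute columns follow the pattern At1.Co1, At2.Co1, At3.Co1, ..., At1.Co34, At2.Co34, ...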
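A minimal base-R sketch, assuming the columns follow exactly the `CoJ` / `AtK.CoJ` naming pattern described in the note and adding a hypothetical `RespID` column. It swaps in `stats::reshape()` with a list-valued `varying` argument: each list element names the wide columns that become one stacked column, built from the column names instead of typed by hand. This is not the method used in either answer.

```r
# Toy data in the naming pattern from the note (2 companies, 2 attributes)
df <- data.frame(
  RespID  = 1:100,                     # hypothetical respondent id
  Co1     = rep(1:5, 20), Co2     = rep(1:5, 20),
  At1.Co1 = rep(0:1, 50), At2.Co1 = rep(1:0, 50),
  At1.Co2 = rep(0:1, 50), At2.Co2 = rep(1:0, 50)
)

# Derive company and attribute prefixes from the column names
companies <- grep("^Co[0-9]+$", names(df), value = TRUE)
att_cols  <- grep("^At[0-9]+\\.Co[0-9]+$", names(df), value = TRUE)
atts      <- unique(sub("\\..*$", "", att_cols))

# One character vector per stacked variable, listing its wide columns in company order
varying <- c(list(companies),
             lapply(atts, function(a) paste(a, companies, sep = ".")))

# Wide -> long: all Co1 rows first, then all Co2 rows, like the c() calls in the question
long <- reshape(df, direction = "long", varying = varying,
                v.names = c("Likelihood", atts),
                timevar = "Company", times = companies, idvar = "RespID")
head(long)
```

The same code works unchanged for 34 companies and 24 attributes, provided every `AtK.CoJ` column is present.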

score 4 · Accepted Answer

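For this kind of problem, Hadley's reshape package is the perfect tool (the code below uses its faster successor, reshape2). I combine it with a few stringr and plyr statements (also packages written by Hadley).

Here is the complete solution, which I think takes only a dozen or so lines of code.

First, create some data: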

library(reshape2) # EDIT 1: reshape2 is faster
library(stringr)
library(plyr)

# Create data frame
# Important: note the addition of a respondent id column

df_comp <- data.frame(
        RespID = 1:10,
        Company1 = rep(1:5, 2),
        Company2 = rep(1:5, 2)
)

df_attr <- data.frame(
        RespID = 1:10,
        Attribute1.Company1 = rep(0:1,5),
        Attribute2.Company1 = rep(1:0,5),
        Attribute1.Company2 = rep(0:1,5),
        Attribute2.Company2 = rep(1:0,5)
)

Now for the data manipulation:

# Use melt to convert data from wide to tall

melt_comp <- melt(df_comp, id.vars="RespID")
melt_comp <- rename(melt_comp, c(variable="comp", value="likelihood"))
melt_attr <- melt(df_attr, id.vars="RespID")

# Use colsplit to split the attribute variable names into attribute and company
# "." is a regex metacharacter, so it needs to be escaped

# EDIT 2: reshape2::colsplit is simpler than stringr::str_split
split <- colsplit(as.character(melt_attr$variable), "\\.", names=c("attr", "comp"))
melt_attr <- data.frame(melt_attr, split)
melt_attr$variable <- NULL

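# Use dcast (reshape2's replacement for reshape::cast) to convert from tall to somewhat tall
# Each RespID/comp/attr cell holds a single value, so mean() just returns it

cast_attr <- dcast(melt_attr, RespID + comp ~ attr, fun.aggregate = mean, value.var = "value")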


# Combine data frames using join() in package plyr

df <- join(melt_comp, cast_attr)
head(df)

And the output:

  RespID     comp likelihood Attribute1 Attribute2
1      1 Company1          1          0          1
2      2 Company1          2          1          0
3      3 Company1          3          0          1
4      4 Company1          4          1          0
5      5 Company1          5          0          1
6      6 Company1          1          1          0
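For readers without reshape2 or plyr installed, the same melt, split, cast, join sequence can be sketched in base R. This swaps in `stats::reshape(direction = "wide")` for `dcast()` and `merge()` for `plyr::join()`; `melt_base()` is a hypothetical helper standing in for `melt()` and assumes the id column is called `RespID`. It reuses the answer's toy data and is a sketch, not a drop-in replacement.

```r
df_comp <- data.frame(RespID = 1:10, Company1 = rep(1:5, 2), Company2 = rep(1:5, 2))
df_attr <- data.frame(
  RespID = 1:10,
  Attribute1.Company1 = rep(0:1, 5), Attribute2.Company1 = rep(1:0, 5),
  Attribute1.Company2 = rep(0:1, 5), Attribute2.Company2 = rep(1:0, 5)
)

# Hypothetical stand-in for melt(): one row per respondent x wide column
melt_base <- function(d) {
  cols <- setdiff(names(d), "RespID")
  data.frame(RespID   = rep(d$RespID, times = length(cols)),
             variable = rep(cols, each = nrow(d)),
             value    = unlist(d[cols], use.names = FALSE),
             stringsAsFactors = FALSE)
}

melt_comp <- melt_base(df_comp)
names(melt_comp) <- c("RespID", "comp", "likelihood")

# Split "AttributeX.CompanyY" at the period (fixed = TRUE, so no escaping needed)
melt_attr <- melt_base(df_attr)
parts <- do.call(rbind, strsplit(melt_attr$variable, ".", fixed = TRUE))
melt_attr$attr <- parts[, 1]
melt_attr$comp <- parts[, 2]
melt_attr$variable <- NULL

# Long -> somewhat tall: one "value.AttributeK" column per attribute, then drop the prefix
cast_attr <- reshape(melt_attr, direction = "wide", idvar = c("RespID", "comp"),
                     timevar = "attr", v.names = "value")
names(cast_attr) <- sub("^value\\.", "", names(cast_attr))

# Join on respondent and company, then restore the company-major row order
df <- merge(melt_comp, cast_attr, by = c("RespID", "comp"))
df <- df[order(df$comp, df$RespID), ]
rownames(df) <- NULL
head(df)
```

`merge()` sorts by the key columns, which is why the rows are reordered afterwards so the first rows are `Company1` respondents, as in the output above.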

score 1 · Accepted Answer

I put this together quickly. It isn't the prettiest and it uses a for loop, but that shouldn't be a problem with only 24 attributes to loop over.

df.recommend <- data.frame(rep(1:5,20),rep(1:5,20))
colnames(df.recommend) <- c("Co1","Co2")

df.attribute <- data.frame(rep(0:1,50),rep(1:0,50),rep(0:1,50),rep(1:0,50))

colnames(df.attribute) <- c(
"At1.Co1", 
"At2.Co1", 
"At1.Co2", 
"At2.Co2") 


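# Stack the likelihood columns: all of Co1, then all of Co2, ...
df.stacked <- data.frame(
    likelihood = unlist(df.recommend, use.names = FALSE)
    )
# Split "At1.Co1" into c("At1", "Co1") and collect the unique attribute prefixes
parts <- strsplit(names(df.attribute), split = "\\.")
atts <- unique(sapply(parts, function(x) x[1]))

# For each attribute, stack its company columns in their existing column order
for (i in seq_along(atts))
{
    df.stacked[, i + 1] <- unlist(df.attribute[sapply(parts, function(x) x[1] == atts[i])], use.names = FALSE)
}

names(df.stacked) <- c("likelihood", paste("attribute", seq_along(atts), sep = ""))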

Edit: this assumes the companies appear in the same order for every attribute.
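A hedged variant that drops the ordering assumption: instead of selecting attribute columns by prefix, it builds the expected names `AtK.CoJ` from the likelihood column names and looks each one up by name, so the attribute columns may be stored in any order. It assumes every attribute exists for every company; the `stop()` check is an illustrative addition, and the shuffled toy data are made up for the demonstration.

```r
df.recommend <- data.frame(Co1 = rep(1:5, 20), Co2 = rep(1:5, 20))
# Attribute columns deliberately stored out of company order
df.attribute <- data.frame(
  At1.Co2 = rep(1:0, 50), At2.Co1 = rep(c(1L, 1L, 0L, 0L), 25),
  At1.Co1 = rep(0:1, 50), At2.Co2 = rep(c(0L, 0L, 1L, 1L), 25)
)

companies <- names(df.recommend)
atts <- unique(sub("\\..*$", "", names(df.attribute)))

df.stacked <- data.frame(likelihood = unlist(df.recommend, use.names = FALSE))
for (a in atts) {
  # Look the columns up by name, in the same company order as df.recommend
  cols <- paste(a, companies, sep = ".")
  missing_cols <- setdiff(cols, names(df.attribute))
  if (length(missing_cols) > 0) {
    stop("missing attribute columns: ", paste(missing_cols, collapse = ", "))
  }
  df.stacked[[a]] <- unlist(df.attribute[cols], use.names = FALSE)
}
head(df.stacked)
```

Because the lookup is driven by `names(df.recommend)`, the likelihood column and each attribute column are stacked in the same company order by construction.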
