I am working with the following dataset in R (ID = Custid):

ID Geo Channel Brand Neworstream RevQ112 RevQ212 RevQ312
1  NA  On-line  1      New         5         0       1
1  NA  On-line  1      Stream      5         0       1
3  EU  Tele     2       Stream     5         1       0

I would like to convert the dataset so that it has columns in this format:

ID Geo Brand Neworstream OnlineRevQ112 TeleRevQ112 OnlineRevQ212 TeleRevQ212

What is the best way to do this? I can't figure out the right command in R.

Thanks in advance.


2 Answers


You can use the reshape2 package and its functions melt and dcast to restructure the data.

data <- structure(list(ID = c(1L, 1L, 3L), Geo = structure(c(NA, NA, 
1L), .Label = "EU", class = "factor"), Channel = structure(c(1L, 
1L, 2L), .Label = c("On-line", "Tele"), class = "factor"), Brand = c(1L, 
1L, 2L), Neworstream = structure(c(1L, 2L, 2L), .Label = c("New", 
"Stream"), class = "factor"), RevQ112 = c(5L, 5L, 5L), RevQ212 = c(0L, 
0L, 1L), RevQ312 = c(1L, 1L, 0L)), .Names = c("ID", "Geo", "Channel", 
"Brand", "Neworstream", "RevQ112", "RevQ212", "RevQ312"), class = "data.frame", row.names = c(NA, 
-3L)) 

library(reshape2)
## melt data
df_long<-melt(data,id.vars=c("ID","Geo","Channel","Brand","Neworstream"))

## recast in combinations of channel and time frame
dcast(df_long,... ~Channel+variable,sum)
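
To make the intermediate "long" format easier to picture, here is a rough base-R sketch of the shape that melt() returns: one row per combination of ID columns and measure column, with the measure name in `variable` and its number in `value`. The `toy` data frame and the hand-rolled stacking are assumptions for illustration only; this is not how reshape2 implements melt().

```r
# Hypothetical two-row subset of the question's data
toy <- data.frame(ID = c(1L, 3L),
                  Channel = c("On-line", "Tele"),
                  RevQ112 = c(5L, 5L),
                  RevQ212 = c(0L, 1L))

id_cols <- c("ID", "Channel")
measure_cols <- c("RevQ112", "RevQ212")

# Stack each measure column under the ID columns, recording which
# column each value came from
long <- do.call(rbind, lapply(measure_cols, function(v) {
  data.frame(toy[id_cols], variable = v, value = toy[[v]])
}))

print(long)
```

Each original row shows up once per measure column, which is what lets dcast() spread the values back out over Channel and variable.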
answered 2013-08-09T17:51:14.017

Update/facepalm

The "NA" in your dataset is probably not an NA value at all, but "NA" as an abbreviation for North America or something similar.

If you set na.strings when reading in your data, you should have no problem using reshape as I originally suggested:

mydf <- read.table(header = TRUE, na.strings = "", 
text = 'ID Geo Channel Brand Neworstream RevQ112 RevQ212 RevQ312
1  NA  On-line  1      New         5         0       1
1  NA  On-line  1      Stream      5         0       1
3  EU  Tele     2       Stream     5         1       0')

reshape(mydf, direction = "wide",
        idvar = c("ID", "Geo", "Brand", "Neworstream"),
        timevar = "Channel")

(However, I would suggest changing the abbreviation to improve readability and reduce confusion!)
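
A quick way to see what na.strings changes is to read the same text twice. This is a minimal sketch; the two-row `txt` input and the object names are made up for the comparison.

```r
# Hypothetical two-row input where "NA" is meant as a region code
txt <- "ID Geo
1 NA
3 EU"

# Default behaviour: the string "NA" is treated as a missing value
default_read <- read.table(text = txt, header = TRUE)
is.na(default_read$Geo)
# [1]  TRUE FALSE

# With na.strings = "", the literal text "NA" is kept as data
literal_read <- read.table(text = txt, header = TRUE, na.strings = "")
as.character(literal_read$Geo)
# [1] "NA" "EU"
```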


Original answer (kept because there is still some interesting stuff about reshape in it)

This should do it:

reshape(mydf, direction = "wide", 
        idvar = c("ID", "Geo", "Brand", "Neworstream"), 
        timevar = "Channel")
#   ID  Geo Brand Neworstream RevQ112.On-line RevQ212.On-line RevQ312.On-line
# 1  1 <NA>     1         New               5               0               1
# 3  3   EU     2      Stream              NA              NA              NA
#   RevQ112.Tele RevQ212.Tele RevQ312.Tele
# 1           NA           NA           NA
# 3            5            1            0

Update (trying to salvage the answer)

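As @Arun points out, the above is not entirely correct. The culprit is that when more than one ID variable is specified, reshape() uses interaction() to create a new temporary ID variable.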

Here is the relevant line from reshape(), followed by what it produces when applied to our "mydf" object:

data[, tempidname] <- interaction(data[, idvar], drop = TRUE)
interaction(mydf[c(1, 2, 4, 5)], drop = TRUE)
# [1] <NA>          <NA>          3.EU.2.Stream
# Levels: 3.EU.2.Stream

Hmm. That seems to collapse everything down to just two IDs, NA and 3.EU.2.Stream.
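
The same collapse can be reproduced without reshape() at all. In this small sketch the `key` data frame is hypothetical; it only shows that a single NA in any key column makes the combined ID NA, so rows that differ in their other columns can no longer be told apart.

```r
# Hypothetical two-column key; the first two rows have a missing geo
key <- data.frame(id  = c(1L, 1L, 3L),
                  geo = factor(c(NA, NA, "EU")))

combined <- interaction(key, drop = TRUE)

# Any row with an NA in its key gets an NA combined ID
is.na(combined)
# [1]  TRUE  TRUE FALSE

# Only one real level survives
levels(combined)
# [1] "3.EU"
```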

What happens if we replace NA with ""?

mydf$Geo <- as.character(mydf$Geo)
mydf$Geo[is.na(mydf$Geo)] <- ""
interaction(mydf[c(1, 2, 4, 5)], drop = TRUE)
# [1] 1..1.New      1..1.Stream   3.EU.2.Stream
# Levels: 1..1.New 1..1.Stream 3.EU.2.Stream

Aha. That's a bit better. We now have three unique IDs... and reshape() seems to work.

reshape(mydf, direction = "wide", 
        idvar=names(mydf)[c(1, 2, 4, 5)], 
        timevar="Channel")
#   ID Geo Brand Neworstream RevQ112.On-line RevQ212.On-line
# 1  1         1         New               5               0
# 2  1         1      Stream               5               0
# 3  3  EU     2      Stream              NA              NA
#   RevQ312.On-line RevQ112.Tele RevQ212.Tele RevQ312.Tele
# 1               1           NA           NA           NA
# 2               1           NA           NA           NA
# 3              NA            5            1            0
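
The question asked for names like OnlineRevQ112 rather than RevQ112.On-line. One possible way to get there is to rewrite the column names after reshaping. This is only a sketch: `wide_names` is typed in by hand from the output above (in practice it would be names() of the reshaped result), and the regular expression assumes every measure column starts with "RevQ".

```r
# Column names as printed in the reshape() output above
wide_names <- c("ID", "Geo", "Brand", "Neworstream",
                "RevQ112.On-line", "RevQ212.On-line", "RevQ312.On-line",
                "RevQ112.Tele", "RevQ212.Tele", "RevQ312.Tele")

# Turn "<measure>.<channel>" into "<channel><measure>";
# names that do not match the pattern are left untouched
swapped <- sub("^(RevQ[0-9]+)\\.(.+)$", "\\2\\1", wide_names)

# Drop the hyphen so "On-line" becomes "Online"
clean_names <- gsub("-", "", swapped, fixed = TRUE)

print(clean_names)
```

Assigning `clean_names` back with names(result) <- clean_names would then give the layout requested in the question.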
answered 2013-08-09T17:39:30.863