
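I have an R data frame of customer reviews in which the reviewers entered multiple reason codes by copying the entire review and inserting each reason code on a new row. Here is what I have: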

Item    Category        Reason                 Review  
Vacuum  Performance     Bad Suction            I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Design          Cord is too short      I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Color           Wrong Color            I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size            too big                The boat was way too big, and was slow.
Boat    Performance     slow                   The boat was way too big, and was slow.
Tube    Inflation       low inflation          The tube was not inflated enough

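I want to group this by the shared columns (Item and Review) and create numbered Category and Reason columns for the multiple reasons and categories. Assume up front that I do not know the number of unique reasons and categories per item, since what I am showing you is dummy data.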

So, what I want is something like this:

Item    Category.1    Category.2   Category.3  Reason.1       Reason.2           Reason.3      Review  
Vacuum  Performance   Design       Color       Bad Suction    Cord is too short  Wrong Color   I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size          Performance    NA        too big        slow               NA            The boat was way too big, and was slow.
Tube    Inflation     NA             NA        low inflation  NA                 NA            The tube was not inflated enough

I tried the following code, to no avail:

reshape(data, direction = "wide", 
        idvar = c("Item", "Review" ), 
        timevar = c("Category", "Reason"))

Here is the data:

dput(Data)
structure(list(Item = c("Vacuum", "Vacuum", "Vacuum", "Boat", 
"Boat", "Tube"), Category = c("Performance", "Design", 
"Color", "Size", "Performance", "Inflation"
), Reason = c("Bad Suction", "Cord is too short", "Wrong Color", 
"too big", "slow", "low inflation"), Review = c("I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"The boat was way too big, and was slow.", "The boat was way too big, and was slow.", 
"The tube was not inflated enough")), .Names = c("Item", "Category", 
"Reason", "Review"), class = "data.frame", row.names = c(NA, 
-6L))

1 Answer


You just need to create a "time" variable from the "Item" column:

Data$UniqueReview <- ave(Data$Item, Data$Item, FUN = seq_along)
out <- reshape(Data, direction = "wide", idvar="Item", timevar="UniqueReview")
names(out)
#  [1] "Item"       "Category.1" "Reason.1"   "Review.1"   "Category.2" "Reason.2"  
#  [7] "Review.2"   "Category.3" "Reason.3"   "Review.3" 
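
To make the "time" variable less opaque, here is a minimal sketch of what the `ave()` call produces on its own. It rebuilds only the `Item` column from the question; the `as.integer()` conversion is an addition for readability and is not part of the answer above (`ave()` returns character here because `Item` is character).

```r
# Only the Item column from the question's data is needed for this step.
Data <- data.frame(
  Item = c("Vacuum", "Vacuum", "Vacuum", "Boat", "Boat", "Tube"),
  stringsAsFactors = FALSE
)

# ave() applies seq_along() within each Item group, so every row gets a
# running counter that restarts at 1 for each new item.
Data$UniqueReview <- as.integer(ave(Data$Item, Data$Item, FUN = seq_along))

Data$UniqueReview
# [1] 1 2 3 1 2 1
```

This counter is what `reshape()` uses as the `.1`, `.2`, `.3` suffixes. It assumes each item has exactly one review; if one item could carry several distinct reviews, passing `Data$Review` as an additional grouping argument to `ave()` would restart the counter per item and review.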

Here are the "Category" and "Reason" columns of the resulting "wide" dataset (just so it fits on the screen).

out[, grep("Item|Category|Reason", names(out))]
#     Item  Category.1      Reason.1  Category.2          Reason.2 Category.3    Reason.3
# 1 Vacuum Performance   Bad Suction      Design Cord is too short      Color Wrong Color
# 4   Boat        Size       too big Performance              slow       <NA>        <NA>
# 6   Tube   Inflation low inflation        <NA>              <NA>       <NA>        <NA>

Also, library(reshape) does not refer to the built-in reshape function that you are trying to use. Rather, it loads the older version of the "reshape2" package.


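Re-reading your question and your comments: since you can assume that the "Review" column can be treated as its own ID column, just change the reshape command accordingly: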

reshape(Data, direction = "wide", idvar=c("Item", "Review"), timevar="UniqueReview")
#     Item
# 1 Vacuum
# 4   Boat
# 6   Tube
#                                                                                        Review
# 1 I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
# 4                                                     The boat was way too big, and was slow.
# 6                                                            The tube was not inflated enough
#    Category.1      Reason.1  Category.2          Reason.2 Category.3    Reason.3
# 1 Performance   Bad Suction      Design Cord is too short      Color Wrong Color
# 4        Size       too big Performance              slow       <NA>        <NA>
# 6   Inflation low inflation        <NA>              <NA>       <NA>        <NA>
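
The result above interleaves the columns (Category.1, Reason.1, Category.2, ...), whereas the question asks for all Category columns first, then all Reason columns, then Review. A possible way to get that layout is sketched below; the helper `cols_by_suffix()` and the name `wide` are hypothetical and only for illustration, and the counter is grouped by both Item and Review as an assumption about how reviews are identified.

```r
# Rebuild the data from the question's dput() output.
vacuum_review <- paste("I bought the vacuum. The suction was bad,",
                       "the cord is too short, and it is the wrong color.")
boat_review <- "The boat was way too big, and was slow."
Data <- data.frame(
  Item = c("Vacuum", "Vacuum", "Vacuum", "Boat", "Boat", "Tube"),
  Category = c("Performance", "Design", "Color", "Size", "Performance",
               "Inflation"),
  Reason = c("Bad Suction", "Cord is too short", "Wrong Color", "too big",
             "slow", "low inflation"),
  Review = c(rep(vacuum_review, 3), rep(boat_review, 2),
             "The tube was not inflated enough"),
  stringsAsFactors = FALSE
)

# Counter within each Item/Review pair, then reshape to wide as above.
Data$UniqueReview <- ave(Data$Item, Data$Item, Data$Review, FUN = seq_along)
wide <- reshape(Data, direction = "wide",
                idvar = c("Item", "Review"), timevar = "UniqueReview")

# Hypothetical helper: columns starting with `prefix.`, ordered by their
# numeric suffix (so that a suffix of 10 would sort after 2, not before).
cols_by_suffix <- function(df, prefix) {
  cols <- grep(paste0("^", prefix, "\\."), names(df), value = TRUE)
  cols[order(as.integer(sub(".*\\.", "", cols)))]
}

# Reorder to: Item, all Category columns, all Reason columns, Review.
wide <- wide[, c("Item", cols_by_suffix(wide, "Category"),
                 cols_by_suffix(wide, "Reason"), "Review")]
rownames(wide) <- NULL
names(wide)
```

Because the column names are discovered with `grep()`, the reordering does not need to know in advance how many reasons or categories each item has.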
answered 2013-10-23T02:20:29.640