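I have an R data frame of customer reviews. When a review has more than one reason code, the reviewer entered them by copying the entire review and putting each reason code on its own row. Here is what I have: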
Item    Category     Reason             Review
Vacuum  Performance  Bad Suction        I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Design       Cord is too short  I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Color        Wrong Color        I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size         too big            The boat was way too big, and was slow.
Boat    Performance  slow               The boat was way too big, and was slow.
Tube    Inflation    low inflation      The tube was not inflated enough
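I want to group by the shared columns (Item and Review) and spread the multiple categories and reasons into separate Category and Reason columns. Assume that I don't know in advance how many distinct reasons and categories each item has, since what I'm showing here is dummy data.
So what I want is this: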
Item    Category.1   Category.2   Category.3  Reason.1       Reason.2           Reason.3     Review
Vacuum  Performance  Design       Color       Bad Suction    Cord is too short  Wrong Color  I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size         Performance  NA          too big        slow               NA           The boat was way too big, and was slow.
Tube    Inflation    NA           NA          low inflation  NA                 NA           The tube was not inflated enough
I tried the following code, without success:
reshape(Data, direction = "wide",
        idvar = c("Item", "Review"),
        timevar = c("Category", "Reason"))
Here is the data:
dput(Data)
structure(list(Item = c("Vacuum", "Vacuum", "Vacuum", "Boat",
"Boat", "Tube"), Category = c("Performance", "Design",
"Color", "Size", "Performance", "Inflation"
), Reason = c("Bad Suction", "Cord is too short", "Wrong Color",
"too big", "slow", "low inflation"), Review = c("I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.",
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.",
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.",
"The boat was way too big, and was slow.", "The boat was way too big, and was slow.",
"The tube was not inflated enough")), .Names = c("Item", "Category",
"Reason", "Review"), class = "data.frame", row.names = c(NA,
-6L))
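For comparison, here is a minimal base-R sketch of one possible approach. It is not the asker's code, and it assumes that the order of rows within each Item/Review group is the order the Category.N/Reason.N columns should follow. As far as I can tell, the attempt above fails because `reshape()` expects `timevar` to be a single column that distinguishes the records within a group. This sketch therefore adds a helper column, `idx` (a name chosen here for illustration), that numbers the rows within each group. It then uses `idx` as `timevar` and passes both `Category` and `Reason` through `v.names`.

```r
# Same data as the dput() output above
Data <- data.frame(
  Item     = c("Vacuum", "Vacuum", "Vacuum", "Boat", "Boat", "Tube"),
  Category = c("Performance", "Design", "Color", "Size", "Performance", "Inflation"),
  Reason   = c("Bad Suction", "Cord is too short", "Wrong Color",
               "too big", "slow", "low inflation"),
  Review   = c(rep("I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 3),
               rep("The boat was way too big, and was slow.", 2),
               "The tube was not inflated enough"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: 1, 2, 3, ... within each Item/Review group
Data$idx <- ave(seq_len(nrow(Data)), Data$Item, Data$Review, FUN = seq_along)

# One row per Item/Review; Category and Reason are spread by idx.
# reshape() names the new columns Category.1, Reason.1, Category.2, ...
wide <- reshape(Data, direction = "wide",
                idvar   = c("Item", "Review"),
                timevar = "idx",
                v.names = c("Category", "Reason"))

# Put all Category.N columns before all Reason.N columns (grep keeps idx order)
cat_cols <- grep("^Category\\.", names(wide), value = TRUE)
rsn_cols <- grep("^Reason\\.", names(wide), value = TRUE)
wide <- wide[, c("Item", cat_cols, rsn_cols, "Review")]
rownames(wide) <- NULL
wide
```

Because `idx` is computed from the data, the number of Category.N/Reason.N columns follows the largest group, so it does not have to be known in advance. Groups with fewer reasons are padded with NA.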