r - An elegant way to merge data frames in R?

Question

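I want to take the unique rows of a data frame and then join them to another data frame of attributes. After that I would like to count the varieties, for example the number of distinct fruits of a particular type or origin.

The first data frame holds my list of fruits: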

fruits <- read.table(header=TRUE, text="shop    fruit
                    1   apple
                    2   orange
                    3   apple
                    4   pear
                    2   banana
                    1   banana
                    1   orange
                    3   banana")

The second data frame holds my attributes:

fruit_class <- read.table(header=TRUE, text="fruit  type    origin
apple   pome    asia
                      banana  berry   asia
                      orange  citrus  asia
                      pear    pome    newguinea")

Here is my clumsy solution to the problem:

library(plyr) # join() and count() with a quoted column name come from plyr
fruit <- as.data.frame(unique(fruits[,2])) #get a list of unique fruits
colnames(fruit)[1] <- "fruit" #this won't rename the column and I don't know why...
fruit_summary <- join(fruits, fruit_class, by="fruit") #create a data frame that I can query
count(fruit_summary, "origin") #for eg, summarise the number of fruits of each origin

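So my main question is: how can I express this more elegantly (i.e. in one line rather than three)? Secondly: why won't it let me rename the column?

Thanks in advance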

score 0 · Accepted Answer

Simply doing

table(fruit_class$fruit, fruit_class$origin)

gives you

       asia newguinea
apple     1         0
banana    1         0
orange    1         0
pear      0         1

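You can add up the counts per origin with colSums(). I can't think of a reason why the fruits data frame is needed here: if it contained a fruit that is not in fruit_class, there would be no origin data for that fruit anyway.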
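As a minimal sketch of the colSums() step, assuming the fruit_class data from the question (the names tab and per_origin are introduced here purely for illustration):

```r
# Rebuild the attribute table from the question
fruit_class <- read.table(header = TRUE, text = "fruit  type    origin
apple   pome    asia
banana  berry   asia
orange  citrus  asia
pear    pome    newguinea")

# Cross-tabulate fruit against origin (one row per fruit)
tab <- table(fruit_class$fruit, fruit_class$origin)

# Column sums give the number of fruits listed for each origin
per_origin <- colSums(tab)
per_origin
#      asia newguinea
#         3         1
```

Because each fruit appears exactly once in fruit_class, these column sums are also the number of distinct fruits per origin. Swapping origin for type in the table() call would give the per-type counts in the same way.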

By the way, in your code example, colnames(fruit)[1] <- "fruit" should work, but colnames(fruit) <- "fruit" is all you need, since colnames is only one element long anyway.
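The renaming remark can be checked with a short sketch, assuming the fruits data from the question. The name fruit_alt is hypothetical and only shows one way to skip the rename entirely:

```r
# Rebuild the shop/fruit table from the question
fruits <- read.table(header = TRUE, text = "shop fruit
1 apple
2 orange
3 apple
4 pear
2 banana
1 banana
1 orange
3 banana")

# One-column data frame of the distinct fruits
fruit <- as.data.frame(unique(fruits[, 2]))

# A single name is enough because there is only one column
colnames(fruit) <- "fruit"

# Naming the column at creation time avoids the rename altogether
fruit_alt <- data.frame(fruit = unique(fruits$fruit))
```

Both fruit and fruit_alt end up as a four-row data frame with a single column called fruit.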

score 0 · Accepted Answer

Here is a data.table solution.

library(data.table)
setDT(fruit_class)[, uniqueN(fruit), by=type]
#      type V1
# 1:   pome  2
# 2:  berry  1
# 3: citrus  1

setDT(fruit_class)[, uniqueN(fruit), by=origin]
#       origin V1
# 1:      asia  3
# 2: newguinea  1
