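I don't fully understand how to build certain groupings and summaries in R with the dplyr package.
In the reproducible example below, I first group by (PN, GOT, HID) to count the distinct values of PC1. I then regroup by (PN, GOT) and sum those distinct counts over the second grouping. This works for the sums, but not for the mean of TC: I want the mean within each (PN, GOT) group, and instead I get the mean of the entire data frame. What am I missing to get the means by (PN, GOT) without losing the PC1 sums I have already built? I'd appreciate an explanation of where I'm going wrong.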
PN <- c("Mazda","Mazda","Datsun","Hornet","Hornet","Valiant","Duster","Merc","Merc","Merc","Merc","Merc",
        "Merc","Merc","Fiat","Honda","Toyota","Toyota","Dodge","AMC","Fiat")
GOT <- c("A","A","B","C","C","A","D","B","B","B","B","B","B","B","A","D","B","B","C","E","A")
HID <- c("Mazda_H1","Mazda_H1","Datsus_H1","Hornet_H1","Hornet_H2","Valiant_H1","Duster_H1","Merc_H1","Merc_H1","Merc_H1",
         "Merc_H2","Merc_H2","Merc_H3","Merc_H4","Fiat_H1","Honda_H1","Toyota_H1","Toyota_H2","Dodge_H1","AMC_H1","Fiat_H1")
PIC <- c("BB","BB","BB","BB","AA","AA","AA","BA","BA","BA",
         "AA","BB","BB","BB","BB","AA","AA","AA","BA","BA","BA")
TC <- c(110,110,93,175,175,105,245,62,62,62,62,62,62,62,33,52,97,97,150,150,33)
Int <- c(16.46,17.02,18.61,19.44,17.02,20.22,15.84,20.00,22.90,18.30,18.90,
         17.40,17.60,18.00,19.47,18.52,19.90,20.01,16.87,17.30,18.90)
PC1 <- c("","","G1","C1","","G1","","G1","G1","C1","C1","","","","Z1","Z1","Z1","Z1","","","G1")
df <- data.frame(PN, GOT, HID, PIC, TC, Int, PC1)
df
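library(dplyr)

# My attempt: TOT_new comes out as expected, but meanTC is the mean
# over the whole data frame rather than per (PN, GOT) group.
df %>%
  filter(PC1 != "") %>%
  group_by(PN, GOT, HID) %>%
  summarize(new = n_distinct(PC1)) %>%
  group_by(PN, GOT) %>%
  mutate(TOT_new = sum(new),
         meanTC = mean(TC))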
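A possible explanation, offered as a sketch rather than a definitive diagnosis: summarize() returns only the grouping columns plus the newly created summary columns, so TC is no longer a column by the time mutate() runs. Because a standalone vector named TC also exists in the workspace, mean(TC) would then fall back to that vector, which would account for the overall mean. The snippet below checks the first half of that reasoning on a small made-up data frame (toy and counted are hypothetical names, not part of the example above) and assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Hypothetical stand-in data, much smaller than the example above
toy <- data.frame(
  PN  = c("Merc", "Merc", "Merc", "Fiat", "Fiat"),
  GOT = c("B", "B", "B", "A", "A"),
  HID = c("Merc_H1", "Merc_H1", "Merc_H2", "Fiat_H1", "Fiat_H1"),
  TC  = c(62, 62, 62, 33, 33),
  PC1 = c("G1", "C1", "C1", "Z1", "G1")
)

# Same first step as in the attempt: count distinct PC1 per (PN, GOT, HID)
counted <- toy %>%
  group_by(PN, GOT, HID) %>%
  summarize(new = n_distinct(PC1), .groups = "drop")

# Only the grouping columns and the new summary survive; TC is gone
names(counted)
"TC" %in% names(counted)
```

If this is indeed the cause, any later reference to TC in the pipeline cannot be using per-group values from the data frame.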
I think the answer I'm looking for is something like this:
       PN    GOT        HID TOT_new meanTC
   <fctr> <fctr>     <fctr>   <int>  <dbl>
1  Datsun      B  Datsus_H1       1     93
2    Fiat      A    Fiat_H1       2     33
3   Honda      D   Honda_H1       1     52
4  Hornet      C  Hornet_H1       1    175
5    Merc      B    Merc_H1       3     62
6  Toyota      B  Toyota_H1       2     97
7 Valiant      A Valiant_H1       1    105
Or at least something like this:
       PN    GOT        HID   new TOT_new meanTC
   <fctr> <fctr>     <fctr> <int>   <int>  <dbl>
1  Datsun      B  Datsus_H1     1       1     93
2    Fiat      A    Fiat_H1     2       2     33
3   Honda      D   Honda_H1     1       1     52
4  Hornet      C  Hornet_H1     1       1    175
5    Merc      B    Merc_H1     2       3     62
6    Merc      B    Merc_H2     1       3     62
7  Toyota      B  Toyota_H1     1       2     97
8  Toyota      B  Toyota_H2     1       2     97
9 Valiant      A Valiant_H1     1       1    105
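One way the second layout might be produced is sketched below. It is only one possible approach, not necessarily the intended one: the per-(PN, GOT) mean of TC is computed with mutate() before the first summarize(), while TC is still a column, and is then carried along as an extra grouping variable. The name result is hypothetical, the data is copied from the example above so the block runs on its own, and dplyr 1.0.0 or later is assumed for .groups. Note that meanTC here is the mean over the rows that remain after the PC1 filter.

```r
library(dplyr)

# Data copied from the example so this block is self-contained
PN <- c("Mazda","Mazda","Datsun","Hornet","Hornet","Valiant","Duster","Merc","Merc","Merc","Merc","Merc",
        "Merc","Merc","Fiat","Honda","Toyota","Toyota","Dodge","AMC","Fiat")
GOT <- c("A","A","B","C","C","A","D","B","B","B","B","B","B","B","A","D","B","B","C","E","A")
HID <- c("Mazda_H1","Mazda_H1","Datsus_H1","Hornet_H1","Hornet_H2","Valiant_H1","Duster_H1","Merc_H1","Merc_H1","Merc_H1",
         "Merc_H2","Merc_H2","Merc_H3","Merc_H4","Fiat_H1","Honda_H1","Toyota_H1","Toyota_H2","Dodge_H1","AMC_H1","Fiat_H1")
TC <- c(110,110,93,175,175,105,245,62,62,62,62,62,62,62,33,52,97,97,150,150,33)
PC1 <- c("","","G1","C1","","G1","","G1","G1","C1","C1","","","","Z1","Z1","Z1","Z1","","","G1")
df <- data.frame(PN, GOT, HID, TC, PC1)

result <- df %>%
  filter(PC1 != "") %>%
  # Mean of TC per (PN, GOT), computed while TC is still a column
  group_by(PN, GOT) %>%
  mutate(meanTC = mean(TC)) %>%
  # Carry meanTC through summarize() by making it a grouping variable
  group_by(PN, GOT, HID, meanTC) %>%
  summarize(new = n_distinct(PC1), .groups = "drop") %>%
  # Sum the distinct counts within each (PN, GOT)
  group_by(PN, GOT) %>%
  mutate(TOT_new = sum(new)) %>%
  ungroup() %>%
  select(PN, GOT, HID, new, TOT_new, meanTC)

result
```

The first layout, without the new column, would then presumably need one further rule for choosing which HID to keep per (PN, GOT), which the question does not specify.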