
Suppose I have a data frame df1 like this:

+------+------+--------+
| ID   | Type | Points |
+------+------+--------+
| DJ45 | A    |   69.2 |
| DJ45 | F    |   60.8 |
| DJ45 | C    |    2.9 |
| DJ46 | B    |   22.7 |
| DJ46 | D    |   18.7 |
| DJ46 | A    |   16.1 |
| DJ47 | E    |   67.2 |
| DJ47 | C    |   63.1 |
| DJ47 | F    |   16.7 |
| DJ48 | D    |    8.4 |
+------+------+--------+

I want a result that gives the top 2 Types for each ID (ranked by Points) in the following format:

Output:

+------+-------+-------+
| ID   | Type1 | Type2 |
+------+-------+-------+
| DJ45 | A     | F     |
| DJ46 | B     | D     |
| DJ47 | E     | C     |
| DJ48 | D     | NA    |
+------+-------+-------+

I tried:

df1 %>%
  group_by(ID) %>%
  top_n(2,wt=Points) %>%
  mutate(val = paste("Type", row_number())) %>% 
  filter(row_number()<=2) %>% 
  select(-Points) %>% 
  spread(val, Type)

But I get the following result instead:

Output:

+------+--------+-------+-------+
| ID   | Points | Type1 | Type2 |
+------+--------+-------+-------+
| DJ45 |   69.2 | A     | NA    |
| DJ45 |   60.8 | NA    | F     |
| DJ46 |   22.7 | B     | NA    |
| DJ46 |   18.7 | NA    | D     |
| DJ47 |   67.2 | E     | NA    |
| DJ47 |   63.1 | NA    | C     |
| DJ48 |    8.4 | D     | NA    |
+------+--------+-------+-------+
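A likely explanation for the split rows, assuming the Points column was still present when spread() ran (it does appear in the output above): spread() treats every column other than the key and the value as a row identifier, so each distinct ID/Points pair becomes its own row and the other Type cell is filled with NA. The sketch below rebuilds df1 from the table and compares spreading with and without Points; the intermediate names `long`, `with_points` and `without_points` are illustrative only, and paste0() is used so the column names come out as Type1/Type2 without a space.

```r
library(dplyr)
library(tidyr)

# df1 rebuilt from the table in the question
df1 <- data.frame(
  ID     = c("DJ45", "DJ45", "DJ45", "DJ46", "DJ46", "DJ46",
             "DJ47", "DJ47", "DJ47", "DJ48"),
  Type   = c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D"),
  Points = c(69.2, 60.8, 2.9, 22.7, 18.7, 16.1, 67.2, 63.1, 16.7, 8.4),
  stringsAsFactors = FALSE
)

# Top 2 rows per ID, labelled Type1/Type2 within each group
long <- df1 %>%
  group_by(ID) %>%
  top_n(2, wt = Points) %>%
  mutate(val = paste0("Type", row_number())) %>%
  ungroup()

# Points is still a column, so (ID, Points) identifies each output row:
# 7 rows, each with one Type filled in and the other NA
with_points <- spread(long, val, Type)
print(with_points)

# Without Points, ID is the only identifier, so each ID collapses to one row
without_points <- long %>%
  select(-Points) %>%
  spread(val, Type)
print(without_points)
```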

1 Answer

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Type Points
DJ45 A 69.2
DJ45 F 60.8
DJ45 C 2.9
DJ46 B 22.7
DJ46 D 18.7
DJ46 A 16.1
DJ47 E 67.2
DJ47 C 63.1
DJ47 F 16.7
DJ48 D 8.4
")

library(dplyr)
library(tidyr)

df %>%
  group_by(ID) %>%
  top_n(2, wt = Points) %>%
  arrange(-Points) %>% 
  mutate(Points = paste0('Type', row_number())) %>% 
  spread(Points, Type)
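  • top_n(2, wt = Points) keeps the two rows with the highest Points in each ID group
  • arrange(-Points) sorts them in descending order of Points
  • mutate(Points = paste0('Type', row_number())) overwrites Points with 'Type' followed by the row number within the ID group (1 or 2)
  • spread(Points, Type) creates one column for each unique value of Points (Type1, Type2) and puts the matching Type value in it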
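For newer package versions, dplyr 1.0 superseded top_n() with slice_max() and tidyr 1.0 superseded spread() with pivot_wider(). The sketch below is one way to express the same steps with those functions, assuming dplyr >= 1.0 and tidyr >= 1.0; it is not part of the original answer, and the helper column name `rank` is hypothetical. Note one behavioral difference: top_n() keeps tied rows (so a group can return more than 2), while slice_max(..., with_ties = FALSE) always returns at most 2 rows per group.

```r
library(dplyr)
library(tidyr)

# Same data as in the answer
df <- data.frame(
  ID     = c("DJ45", "DJ45", "DJ45", "DJ46", "DJ46", "DJ46",
             "DJ47", "DJ47", "DJ47", "DJ48"),
  Type   = c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D"),
  Points = c(69.2, 60.8, 2.9, 22.7, 18.7, 16.1, 67.2, 63.1, 16.7, 8.4),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(ID) %>%
  # At most two highest-Points rows per ID; ties broken by row order
  slice_max(Points, n = 2, with_ties = FALSE) %>%
  # Make sure the highest Points comes first within each ID
  arrange(desc(Points), .by_group = TRUE) %>%
  # Hypothetical helper column holding the new column names
  mutate(rank = paste0("Type", row_number())) %>%
  ungroup() %>%
  # Drop Points so ID alone identifies each output row
  select(-Points) %>%
  pivot_wider(names_from = rank, values_from = Type)

print(result)
```

Dropping Points before pivot_wider() matters for the same reason as with spread(): any leftover column becomes part of the row identifier. Passing `id_cols = ID` to pivot_wider() would be an alternative to the select() step.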
answered 2017-05-16 at 07:50:45