1

I have a data frame like this:

Year   Category1   Category2   Value
1990   A           X           5
1990   B           X           4
1990   A           Y           3
1990   B           Y           1
1990   A           Z           4
1990   B           Z           2
1991   A           X           3
1991   B           X           2
1991   A           Y           8
...

I want to combine the observations X and Y in Category2 into a single new observation by summing the Value column, while keeping the Year and Category1 groups:

Year   Category1   Category2   Value
1990   A           X+Y         8
1990   A           Z           4
1990   B           Z           2
1990   B           X+Y         5
1991   A           X+Y         11
...
4 Answers

3

Assuming these are the only unique values in Category2:

df <-data.frame(
  Year = c(rep(1990, 6), rep(1991,3)),
  Category1 = c("A","B", "A", "B", "A","B","A","B","A"),
  Category2 = c("X","X","Y","Y","Z","Z","X","X","Y"),
  Value = c(5,4,3,1,4,2,3,2,8)
)


library(dplyr)

# Relabel everything that is not "Z" as "X+Y", then sum within each group
df %>% 
  mutate(Category2 = ifelse(Category2 == "Z", "Z", "X+Y")) %>% 
  group_by(Year, Category1, Category2) %>% 
  summarise(Value = sum(Value))

# A tibble: 6 x 4
# Groups:   Year, Category1 [4]
   Year Category1 Category2 Value
  <dbl> <fct>     <chr>     <dbl>
1  1990 A         X+Y           8
2  1990 A         Z             4
3  1990 B         X+Y           5
4  1990 B         Z             2
5  1991 A         X+Y          11
6  1991 B         X+Y           2
answered 2019-09-26T13:27:38.027
1

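You can summarise by Year, Category1, and a temporary logical variable (whether Category2 equals X or Y). A bit of cleanup is needed afterwards, but it gives the result you need.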

library(dplyr)

df %>%
  group_by(Year, Category1, temp = Category2 %in% c("X", "Y")) %>%
  summarise(Category2 = paste(Category2, collapse = "+"),
            Value = sum(Value)) %>%
  select(-temp) %>%
  filter(!Category2 %in% c("X", "Y"))

# A tibble: 5 x 4
# Groups:   Year, Category1 [3]
   Year Category1 Category2 Value
  <int> <fct>     <chr>     <int>
1  1990 A         Z             4
2  1990 A         X+Y           8
3  1990 B         Z             2
4  1990 B         X+Y           5
5  1991 A         X+Y          11
answered 2019-09-26T13:25:28.957
1

Alternatively, this should also work:

library(dplyr)
library(tidyr)  # spread() and gather() come from tidyr, not dplyr
df %>%
  spread(Category2,Value, fill = 0) %>%
  mutate("X+Y" = X+Y) %>%
  select(-X,-Y) %>%
  gather(Category2,Value,-Year,-Category1) %>%
  group_by(Year,Category1,Category2) %>%
  summarise(Value = sum(Value, na.rm = TRUE))
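
In current tidyr, spread() and gather() are superseded by pivot_wider() and pivot_longer(). A minimal sketch of the same reshape-and-sum idea with the newer functions follows; it rebuilds the sample data from the question and is an illustration, not the answerer's own code. Note that filling missing combinations with 0 also produces Z rows with Value 0 for 1991, which the relabel-and-sum approaches do not.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  Year = c(rep(1990, 6), rep(1991, 3)),
  Category1 = c("A", "B", "A", "B", "A", "B", "A", "B", "A"),
  Category2 = c("X", "X", "Y", "Y", "Z", "Z", "X", "X", "Y"),
  Value = c(5, 4, 3, 1, 4, 2, 3, 2, 8)
)

result <- df %>%
  # One column per Category2 level; missing combinations become 0
  pivot_wider(names_from = Category2, values_from = Value,
              values_fill = list(Value = 0)) %>%
  # Add the combined column and drop its two components
  mutate(`X+Y` = X + Y) %>%
  select(-X, -Y) %>%
  # Back to long form: one row per Year / Category1 / Category2
  pivot_longer(cols = -c(Year, Category1),
               names_to = "Category2", values_to = "Value")

print(result)
```

If the zero-valued rows are unwanted, appending filter(Value != 0) would drop them, at the cost of also dropping any genuine zeros.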
answered 2019-09-26T13:32:31.123
1

I'll piggyback on mfidino's answer. If there are values other than X, Y, and Z, you can use

Category2 %in% c('X', 'Y')

It would look like this:

library(dplyr)

df <- tribble(
  ~Year,   ~Category1,   ~Category2,   ~Value,
  1990,   'A',           'X',           5,
  1990,   'B',           'X',           4,
  1990,   'A',           'Y',           3,
  1990,   'B',           'Y',           1,
  1990,   'A',           'Z',           4,
  1990,   'B',           'Z',           2,
  1991,   'A',           'X',           3,
  1991,   'B',           'X',           2,
  1991,   'A',           'Y',           8
)

df %>% 
  mutate(
    Category2 = if_else(Category2 %in% c('X', 'Y'), 'X+Y', Category2)
  ) %>% 
  group_by(Year, Category1, Category2) %>% 
  summarise(
    Value = sum(Value)
  )
# A tibble: 6 x 4
# Groups:   Year, Category1 [4]
   Year Category1 Category2 Value
  <dbl> <chr>     <chr>     <dbl>
1  1990 A         X+Y           8
2  1990 A         Z             4
3  1990 B         X+Y           5
4  1990 B         Z             2
5  1991 A         X+Y          11
6  1991 B         X+Y           2
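
For readers who would rather avoid a dplyr dependency, the same relabel-then-sum idea can be sketched in base R. This is only an illustration under the assumption that Category2 is stored as character (hence stringsAsFactors = FALSE); the names merged and result are made up for the example.

```r
# Sample data copied from the question, with character (not factor) columns
df <- data.frame(
  Year = c(rep(1990, 6), rep(1991, 3)),
  Category1 = c("A", "B", "A", "B", "A", "B", "A", "B", "A"),
  Category2 = c("X", "X", "Y", "Y", "Z", "Z", "X", "X", "Y"),
  Value = c(5, 4, 3, 1, 4, 2, 3, 2, 8),
  stringsAsFactors = FALSE
)

# Relabel X and Y as "X+Y"; every other level is left unchanged
merged <- df
merged$Category2 <- ifelse(merged$Category2 %in% c("X", "Y"),
                           "X+Y", merged$Category2)

# Sum Value within each Year / Category1 / Category2 combination
result <- aggregate(Value ~ Year + Category1 + Category2,
                    data = merged, FUN = sum)

print(result)
```

The row order returned by aggregate() may differ from the dplyr output above, so sort the result explicitly if the order matters.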
answered 2019-09-26T13:30:36.357