r - How to sum the IDs within a DF by area in R?

Question

I have a data frame of crash statistics called crash_TA. It looks like the one below, but is much larger; each row represents one crash.

TA_name	TA_code	fatal_count	serious_injury_count	minor_injury_count	ID
Grey	061	2	0	1	1
Buller	062	1	1	1	2
Grey	061	1	1	1	3
Clutha	063	0	1	1	4
Clutha	063	1	1	2	5
Otago	064	1	1	0	6

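I want to sum the fatal, serious and minor counts by TA_name and add a new column called casualties that holds their total. I also want to summarise ID to get the number of crashes in each area. That number differs from the casualty count because not every crash has casualties. This new column will be called crashes.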

My new data frame would look like this:

TA_name	TA_code	fatal_count	serious_injury_count	minor_injury_count	casualties	crashes
Grey	061	3	1	2	6	2
Buller	062	1	1	1	3	1
Clutha	063	1	2	3	6	2
Otago	064	1	1	0	2	1

This is the code I have tried so far:

crashes_stats_TA <- crashes_TA %>% 
  group_by(TA_code, TA_name) %>%
  summarise(across(contains("count"), ~sum(., na.rm = T)),
            across(Population, ~mean(., na.rm = T),
            across(contains("perc"), ~mean(., na.rm = T), .names = "{.col}_mean"))) %>%
  mutate(casualties = round(fatal_count + serious_injury_count + minor_injury_count), 
         crashes = round(ID = sum(ID, na.rm = T)))

However, when I run it, I get this error:

Error: Problem with `mutate()` column `Crashes`.
i `Crashes = round(ID = sum(ID, na.rm = T))`.
x object 'ID' not found

score 2 · Accepted Answer

We could do it this way:

library(dplyr)

df %>% 
  group_by(TA_name, TA_code) %>%
  add_count(name="crashes") %>% 
  summarise(across(contains("count"), sum),
            casualties = sum(fatal_count, serious_injury_count, minor_injury_count),
            crashes= unique(crashes))

  TA_name TA_code fatal_count serious_injury_count minor_injury_count casualties crashes
  <chr>     <int>       <int>                <int>              <int>      <int>   <int>
1 Buller       62           1                    1                  1          3       1
2 Clutha       63           1                    2                  3          6       2
3 Grey         61           3                    1                  2          6       2
4 Otago        64           1                    1                  0          2       1
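
The error in the question most likely comes from the fact that `summarise()` keeps only the grouping columns and the summaries it creates, so `ID` no longer exists by the time `mutate()` runs. Below is a minimal sketch of that behaviour, assuming the sample data used in the answers here. Counting with `n_distinct(ID)` inside `summarise()` is one possible alternative, not necessarily what the asker ended up using.

```r
library(dplyr)

# Sample data copied from the answers in this thread
df <- data.frame(
  TA_name = c("Grey", "Buller", "Grey", "Clutha", "Clutha", "Otago"),
  TA_code = c(61L, 62L, 61L, 63L, 63L, 64L),
  fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L),
  serious_injury_count = c(0L, 1L, 1L, 1L, 1L, 1L),
  minor_injury_count = c(1L, 1L, 1L, 1L, 2L, 0L),
  ID = 1:6
)

# After summarise(), only the grouping columns and the new summaries remain
summed <- df %>%
  group_by(TA_name, TA_code) %>%
  summarise(across(contains("count"), sum), .groups = "drop")
id_survives <- "ID" %in% names(summed)  # ID is not among the remaining columns

# Counting inside summarise() works because ID is still available there
fixed <- df %>%
  group_by(TA_name, TA_code) %>%
  summarise(across(contains("count"), sum),
            crashes = n_distinct(ID),
            .groups = "drop") %>%
  mutate(casualties = fatal_count + serious_injury_count + minor_injury_count)
```

`n_distinct(ID)` counts unique crash IDs per area, so it gives the same result as a plain row count only when each row has its own ID, as it does in the sample data.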

score 1 · Accepted Answer

Using base R

out <- aggregate(.~ TA_name + TA_code, df[setdiff(names(df), "ID")], sum)
out$casualties <- rowSums(out[, -(1:2)])

-output

> out
  TA_name TA_code fatal_count serious_injury_count minor_injury_count casualties
1    Grey      61           3                    1                  2          6
2  Buller      62           1                    1                  1          3
3  Clutha      63           1                    2                  3          6
4   Otago      64           1                    1                  0          2

data

df <- structure(list(TA_name = c("Grey", "Buller", "Grey", "Clutha", 
"Clutha", "Otago"), TA_code = c(61L, 62L, 61L, 63L, 63L, 64L), 
    fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L), serious_injury_count = c(0L, 
    1L, 1L, 1L, 1L, 1L), minor_injury_count = c(1L, 1L, 1L, 1L, 
    2L, 0L), ID = 1:6), row.names = c(NA, -6L), class = "data.frame")
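
The base R output above has no crashes column. One possible way to add it, sketched here under the assumption that every row is one crash, is to count rows per area with a second `aggregate()` call and merge the two results. The name `n_crashes` is only illustrative.

```r
# Sample data, same as the block above
df <- data.frame(
  TA_name = c("Grey", "Buller", "Grey", "Clutha", "Clutha", "Otago"),
  TA_code = c(61L, 62L, 61L, 63L, 63L, 64L),
  fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L),
  serious_injury_count = c(0L, 1L, 1L, 1L, 1L, 1L),
  minor_injury_count = c(1L, 1L, 1L, 1L, 2L, 0L),
  ID = 1:6
)

# Sum the count columns per area, leaving ID out of the sums
out <- aggregate(. ~ TA_name + TA_code, df[setdiff(names(df), "ID")], sum)
out$casualties <- rowSums(out[, -(1:2)])

# Count the rows (crashes) per area; length() counts the IDs in each group
n_crashes <- aggregate(ID ~ TA_name + TA_code, df, length)
names(n_crashes)[3] <- "crashes"

# Join the counts onto the sums by the two grouping columns
out <- merge(out, n_crashes, by = c("TA_name", "TA_code"))
```

Note that `merge()` sorts the result by the `by` columns, so the row order differs from the `aggregate()` output shown above.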

score 1 · Accepted Answer

You can use -

library(dplyr)

df %>%
  group_by(TA_name, TA_code) %>%
  summarise(across(fatal_count:minor_injury_count, ~ sum(.x, na.rm = TRUE)),
            crashes = n(), .groups = 'drop') %>%
  mutate(casualties = rowSums(select(., fatal_count:minor_injury_count)))

#  TA_name TA_code fatal_count serious_injury_count minor_injury_count crashes casualties
#  <chr>     <int>       <int>                <int>              <int>   <int>      <dbl>
#1 Buller       62           1                    1                  1       1          3
#2 Clutha       63           1                    2                  3       2          6
#3 Grey         61           3                    1                  2       2          6
#4 Otago        64           1                    1                  0       1          2

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(TA_name = c("Grey", "Buller", "Grey", "Clutha", 
"Clutha", "Otago"), TA_code = c(61L, 62L, 61L, 63L, 63L, 64L), 
    fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L), serious_injury_count = c(0L, 
    1L, 1L, 1L, 1L, 1L), minor_injury_count = c(1L, 1L, 1L, 1L, 
    2L, 0L), ID = 1:6), row.names = c(NA, -6L), class = "data.frame")
