I have a data frame of crash statistics called crash_TA. The data frame looks like the one below, but is much larger; each row represents one crash.

TA_name  TA_code  fatal_count  serious_injury_count  minor_injury_count  ID
Grey     061      2            0                     1                   1
Buller   062      1            1                     1                   2
Grey     061      1            1                     1                   3
Clutha   063      0            1                     1                   4
Clutha   063      1            1                     2                   5
Otago    064      1            1                     0                   6

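I want to summarise the fatal, serious and minor injury counts by TA_name and add a new column called casualties that holds their total. I also want to summarise ID, which represents the number of crashes in each area, because that value differs from the number of casualties: not every crash results in a casualty. This new column will be called crashes.

My new data frame would look like this: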

TA_name  TA_code  fatal_count  serious_injury_count  minor_injury_count  casualties  crashes
Grey     061      3            1                     2                   6           2
Buller   062      1            1                     1                   3           1
Clutha   063      1            2                     3                   6           2
Otago    064      1            1                     0                   2           1

This is the code I have tried so far:

crashes_stats_TA <- crashes_TA %>% 
  group_by(TA_code, TA_name) %>%
  summarise(across(contains("count"), ~sum(., na.rm = T)),
            across(Population, ~mean(., na.rm = T),
            across(contains("perc"), ~mean(., na.rm = T), .names = "{.col}_mean"))) %>%
  mutate(casualties = round(fatal_count + serious_injury_count + minor_injury_count), 
         crashes = round(ID = sum(ID, na.rm = T)))

However, when I run it, I get this error:

Error: Problem with `mutate()` column `Crashes`.
i `Crashes = round(ID = sum(ID, na.rm = T))`.
x object 'ID' not found

dataframe

3 Answers

We could do it this way:

library(dplyr)

df %>% 
  group_by(TA_name, TA_code) %>%
  add_count(name="crashes") %>% 
  summarise(across(contains("count"), sum),
            casualties = sum(fatal_count, serious_injury_count, minor_injury_count),
            crashes = unique(crashes))
  TA_name TA_code fatal_count serious_injury_count minor_injury_count casualties crashes
  <chr>     <int>       <int>                <int>              <int>      <int>   <int>
1 Buller       62           1                    1                  1          3       1
2 Clutha       63           1                    2                  3          6       2
3 Grey         61           3                    1                  2          6       2
4 Otago        64           1                    1                  0          2       1
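
As for the error in the question: summarise() returns only the grouping columns plus the columns it creates, so ID no longer exists by the time mutate() runs. The sketch below shows this on a cut-down version of the sample data (three rows and one count column, chosen here purely for illustration) and then computes the crash count inside summarise(), where ID is still available.

```r
library(dplyr)

# Cut-down illustrative data; not the asker's full data frame
df <- data.frame(
  TA_name = c("Grey", "Buller", "Grey"),
  fatal_count = c(2L, 1L, 1L),
  ID = 1:3
)

# summarise() keeps the grouping column and the summarised columns only
summarised <- df %>%
  group_by(TA_name) %>%
  summarise(across(contains("count"), sum), .groups = "drop")

"ID" %in% names(summarised)  # ID was not summarised, so it has been dropped

# Count the crashes inside summarise(), while ID can still be referenced
fixed <- df %>%
  group_by(TA_name) %>%
  summarise(across(contains("count"), sum),
            crashes = n_distinct(ID),
            .groups = "drop")
fixed
```

n_distinct(ID) assumes that each crash has its own ID, as in the sample data.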
answered 2021-10-10T08:31:47.603

Using base R

out <- aggregate(.~ TA_name + TA_code, df[setdiff(names(df), "ID")], sum)
out$casualties <- rowSums(out[, -(1:2)])

-output

> out
  TA_name TA_code fatal_count serious_injury_count minor_injury_count casualties
1    Grey      61           3                    1                  2          6
2  Buller      62           1                    1                  1          3
3  Clutha      63           1                    2                  3          6
4   Otago      64           1                    1                  0          2
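
The output above does not include the crashes column the question asks for. One possible way to add it while staying in base R is to count the rows per group with a second aggregate() call and merge the result back in. This is a sketch on the same sample data; the name n_crashes is introduced here for illustration only.

```r
# Same sample data as in this answer
df <- data.frame(
  TA_name = c("Grey", "Buller", "Grey", "Clutha", "Clutha", "Otago"),
  TA_code = c(61L, 62L, 61L, 63L, 63L, 64L),
  fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L),
  serious_injury_count = c(0L, 1L, 1L, 1L, 1L, 1L),
  minor_injury_count = c(1L, 1L, 1L, 1L, 2L, 0L),
  ID = 1:6
)

# Sum the three injury columns per territorial authority
out <- aggregate(. ~ TA_name + TA_code, df[setdiff(names(df), "ID")], sum)
out$casualties <- rowSums(out[, -(1:2)])

# Count the rows (crashes) in each group by applying length() to ID
n_crashes <- aggregate(ID ~ TA_name + TA_code, df, length)
names(n_crashes)[names(n_crashes) == "ID"] <- "crashes"

# Join the counts onto the summed table using the grouping keys
out <- merge(out, n_crashes, by = c("TA_name", "TA_code"))
out
```

Note that merge() sorts the result by the key columns by default, so the row order may differ from the output shown above.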

data

df <- structure(list(TA_name = c("Grey", "Buller", "Grey", "Clutha", 
"Clutha", "Otago"), TA_code = c(61L, 62L, 61L, 63L, 63L, 64L), 
    fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L), serious_injury_count = c(0L, 
    1L, 1L, 1L, 1L, 1L), minor_injury_count = c(1L, 1L, 1L, 1L, 
    2L, 0L), ID = 1:6), row.names = c(NA, -6L), class = "data.frame")
answered 2021-10-10T18:15:00.847

You can use -

library(dplyr)

df %>%
  group_by(TA_name, TA_code) %>%
  summarise(across(fatal_count:minor_injury_count, ~ sum(.x, na.rm = TRUE)),
            crashes = n(), .groups = 'drop') %>%
  mutate(casualties = rowSums(select(., fatal_count:minor_injury_count)))

#  TA_name TA_code fatal_count serious_injury_count minor_injury_count crashes casualties
#  <chr>     <int>       <int>                <int>              <int>   <int>      <dbl>
#1 Buller       62           1                    1                  1       1          3
#2 Clutha       63           1                    2                  3       2          6
#3 Grey         61           3                    1                  2       2          6
#4 Otago        64           1                    1                  0       1          2
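
The reason for using n() rather than sum(ID), as attempted in the question, is that summing adds up the ID values themselves instead of counting rows. A small sketch on the ID column of the sample data illustrates the difference; the column names id_sum and crashes_distinct are made up for this comparison.

```r
library(dplyr)

# Only the columns needed for the comparison
df <- data.frame(
  TA_name = c("Grey", "Buller", "Grey", "Clutha", "Clutha", "Otago"),
  ID = 1:6
)

counts <- df %>%
  group_by(TA_name) %>%
  summarise(
    id_sum = sum(ID),                   # adds the ID values: not a crash count
    crashes = n(),                      # number of rows in the group
    crashes_distinct = n_distinct(ID),  # number of unique IDs in the group
    .groups = "drop"
  )
counts
```

For Grey, whose crashes have IDs 1 and 3, id_sum is 4 while both counts are 2. n_distinct(ID) is only needed if the same crash could appear on more than one row.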

data

It is easier to help if you provide the data in a reproducible format

df <- structure(list(TA_name = c("Grey", "Buller", "Grey", "Clutha", 
"Clutha", "Otago"), TA_code = c(61L, 62L, 61L, 63L, 63L, 64L), 
    fatal_count = c(2L, 1L, 1L, 0L, 1L, 1L), serious_injury_count = c(0L, 
    1L, 1L, 1L, 1L, 1L), minor_injury_count = c(1L, 1L, 1L, 1L, 
    2L, 0L), ID = 1:6), row.names = c(NA, -6L), class = "data.frame")
answered 2021-10-10T08:08:05.180