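I have a tbl_df with several columns that each contain multiple values. I want to use the values in those columns to create several new columns, and then summarize the new columns.
One way I can solve this is to write multiple ifelse statements inside a mutate, but that seems inefficient. Is there a better way to go about this? I suspect there is a dplyr and/or tidyr based solution.
An example of what I am trying to do is below. It is only a sample of the data and columns, and it does not contain all the columns I want to create. The summary table will include some columns based on sum and some based on mean.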
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- tibble::tribble(
  ~type, ~bb_type,      ~description,
  "B",   NA,            "ball",
  "S",   NA,            "foul",
  "X",   "line_drive",  "hit_into_play_no_out",
  "S",   NA,            "swinging_strike",
  "S",   NA,            "foul",
  "X",   "ground_ball", "hit_into_play",
  "S",   NA,            "swinging_strike",
  "X",   "fly_ball",    "hit_into_play_score",
  "B",   NA,            "ball",
  "S",   NA,            "foul"
)
df <- df %>%
  mutate(ground_ball = ifelse(bb_type == "ground_ball", 1, 0),
         fly_ball = if_else(bb_type == "fly_ball", 1, 0),
         X = if_else(type == "X", 1, 0),
         # not sure if this is the best way to flag descriptions that start with "swinging" so they can be summed later
         swinging_strike = grepl("^swinging", description))
df
#> # A tibble: 10 x 7
#>    type  bb_type    description       ground_ball fly_ball     X swinging_strike
#>    <chr> <chr>      <chr>                   <dbl>    <dbl> <dbl> <lgl>
#>  1 B     <NA>       ball                       NA       NA     0 FALSE
#>  2 S     <NA>       foul                       NA       NA     0 FALSE
#>  3 X     line_drive hit_into_play_no…           0        0     1 FALSE
#>  4 S     <NA>       swinging_strike            NA       NA     0 TRUE
#>  5 S     <NA>       foul                       NA       NA     0 FALSE
#>  6 X     ground_ba… hit_into_play               1        0     1 FALSE
#>  7 S     <NA>       swinging_strike            NA       NA     0 TRUE
#>  8 X     fly_ball   hit_into_play_sc…           0        1     1 FALSE
#>  9 B     <NA>       ball                       NA       NA     0 FALSE
#> 10 S     <NA>       foul                       NA       NA     0 FALSE
summary_df <- df %>%
  summarize(n = n(),
            fly_ball = sum(fly_ball, na.rm = TRUE),
            ground_ball = sum(ground_ball, na.rm = TRUE))
summary_df
#> # A tibble: 1 x 3
#>       n fly_ball ground_ball
#>   <int>    <dbl>       <dbl>
#> 1    10        1           1
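In summary, I am looking to do the following:
1. Create new columns for all of the values in bb_type and type that count them.
2. Create a new column that counts the number of values in the description column that start with "swinging". As an additional example, I would also like to see how to pick another text string from that column (e.g. "ball") and create a new column with its count.
3. How would I choose my own names when doing what I want to achieve in 1 and 2? Do I simply have to use dplyr::rename after the fact?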
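
To make the three points concrete, here is a minimal sketch of one possible direction, not a definitive answer. It assumes the conditions can be counted directly inside summarize (summing a logical vector counts its TRUE values), which would skip the intermediate ifelse columns. The output names (fly_balls, in_play, swinging_rate, bb_counts, and so on) are made up for illustration, and the tidyr::pivot_wider step assumes tidyr 1.0.0 or later is installed.

```r
library(dplyr)

# Same sample data as in the question, rebuilt so the sketch runs on its own.
df <- tibble::tibble(
  type = c("B", "S", "X", "S", "S", "X", "S", "X", "B", "S"),
  bb_type = c(NA, NA, "line_drive", NA, NA, "ground_ball", NA, "fly_ball", NA, NA),
  description = c("ball", "foul", "hit_into_play_no_out", "swinging_strike", "foul",
                  "hit_into_play", "swinging_strike", "hit_into_play_score", "ball", "foul")
)

# Points 2 and 3: count conditions directly and name the outputs in place.
# The column names on the left-hand side are illustrative choices.
summary_df <- df %>%
  summarize(
    n = n(),
    fly_balls = sum(bb_type == "fly_ball", na.rm = TRUE),
    ground_balls = sum(bb_type == "ground_ball", na.rm = TRUE),
    in_play = sum(type == "X"),
    swinging_strikes = sum(grepl("^swinging", description)),
    balls = sum(description == "ball"),
    # mean of a logical vector gives the share of rows matching the pattern
    swinging_rate = mean(grepl("^swinging", description))
  )

# Point 1: one count column per distinct non-missing bb_type value,
# without naming each value by hand.
bb_counts <- df %>%
  filter(!is.na(bb_type)) %>%
  count(bb_type) %>%
  tidyr::pivot_wider(names_from = bb_type, values_from = n, names_prefix = "n_")
```

Because the names are assigned on the left-hand side of each summarize argument, no dplyr::rename step is needed in this sketch. For the pivoted counts, names_prefix controls the generated names.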