I have a data frame detailing the counts of names longer than 5 letters and names shorter than 5 letters, taken from the babynames package:
install.packages("babynames")
library(babynames)
After some filtering and an ifelse with str_length(name), I created a data frame that looks like this:
sum_greaterthan5.sum  sum_lessthan5.sum  total_n_names.total_names
             2109449            1436852                    3546301
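I want to gather the data so that one column names the criterion and another holds the numeric value, e.g. how many babies were given names longer than 5 letters, and so on: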
criteria                     count
sum_greaterthan5.sum       2109449
sum_lessthan5.sum          1436852
total_n_names.total_names  3546301
However, the gather function does not read my columns correctly:
> df_5letters <- df %>%
+ gather(key=criteria, value = count, c('sum_greaterthan5.sum', 'sum_lessthan5.sum', 'total_n_names.sum') )
Error: Can't subset columns that don't exist.
x Column `sum_greaterthan5.sum` doesn't exist.
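One possible reading of this error, sketched under the assumption that the summarise() results are one-row tibbles: passing a whole tibble as a named argument to tibble() creates a nested data-frame column, so the top-level column is named sum_greaterthan5 and the inner one sum. A name like sum_greaterthan5.sum would then exist only in the printed output. The values are copied from the output above; greater, less, and nested are hypothetical names used for illustration.

```r
library(tibble)

# Hypothetical one-row summaries standing in for the summarise() results.
greater <- tibble(sum = 2109449)
less    <- tibble(sum = 1436852)

# Passing whole tibbles as named arguments nests them as data-frame columns.
nested <- tibble(sum_greaterthan5 = greater, sum_lessthan5 = less)

top_level <- names(nested)                   # names that column selection can see
inner     <- names(nested$sum_greaterthan5)  # name of the column inside the nested tibble

print(top_level)
print(inner)
```

If this is what happened, selecting 'sum_greaterthan5.sum' fails because no top-level column carries that name.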
I tried using column indices instead, but got a type-related error. Is there another function I can use instead of gather, or some other way to modify my gather call?
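As a rough sketch of one alternative, tidyr's pivot_longer() produces the criteria/count layout described above once the columns are plain numeric vectors. The table wide below is a hypothetical flat stand-in for the summary data, with values copied from the question; it is not the nested df that triggered the error.

```r
library(tidyr)

# Hypothetical flat version of the summary table (one plain numeric column each).
wide <- tibble(
  sum_greaterthan5 = 2109449,
  sum_lessthan5    = 1436852,
  total_n_names    = 3546301
)

# Turn every column into a (criteria, count) row.
long <- pivot_longer(
  wide,
  cols      = everything(),
  names_to  = "criteria",
  values_to = "count"
)

print(long)
```

pivot_longer() is the successor to gather() in tidyr; either should work here as long as the selected names match the actual top-level columns.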
Below is the code I ran to get to this point:
babynames_2017_length_5 <- babynames_2017 %>%
  mutate(five_letters = ifelse(str_length(name) > 5, 1, 0)) %>%
  filter(five_letters == 1) %>%
  summarise(sum = sum(n))

babynames_2017_less_5 <- babynames_2017 %>%
  mutate(five_letters = ifelse(str_length(name) > 5, 1, 0)) %>%
  filter(five_letters == 0) %>%
  summarise(sum = sum(n))

df <- tibble(
  sum_greaterthan5 = babynames_2017_length_5,
  sum_lessthan5 = babynames_2017_less_5,
  # total_n was a variable that I got from a previous data frame that I did a sum aggregation on with:
  # total_n <- babynames_startwvowels[1,1]
  total_n_names = total_n
)
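A minimal sketch of how the same summary could be built with flat columns, so that gather() can select them by name. It assumes babynames_2017 has name and n columns, as the code above implies. The four-row table is a hypothetical stand-in, not real babynames data. Computing total_n_names as sum(n) is also an assumption, since the original total_n came from a different data frame.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical stand-in for babynames_2017 (same columns: name, n).
babynames_2017 <- tibble(
  name = c("Olivia", "Emma", "Liam", "Charlotte"),
  n    = c(10, 20, 30, 40)
)

# One summarise() call returns plain numeric columns instead of nested tibbles.
df <- babynames_2017 %>%
  summarise(
    sum_greaterthan5 = sum(n[str_length(name) > 5]),
    sum_lessthan5    = sum(n[str_length(name) <= 5]),
    total_n_names    = sum(n)  # assumption: total over the same table
  )

# With flat columns, gather() can refer to them by their top-level names.
df_5letters <- df %>%
  gather(key = criteria, value = count,
         sum_greaterthan5, sum_lessthan5, total_n_names)

print(df_5letters)
```

Note that the condition str_length(name) > 5 places names of exactly 5 letters in the "less than" group, matching the ifelse in the code above.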