
I have a data frame with the counts of names longer than 5 letters and of names with 5 letters or fewer, built from the babynames package:

install.packages("babynames")
library(babynames)

After some filtering and an ifelse() on str_length(name), I created a data frame that looks like this:

sum_greaterthan5.sum     sum_lessthan5.sum     total_n_names.total_names
2109449                  1436852               3546301

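I want to gather the data so that one column names each criterion and another holds the numeric count (how many babies were given a name longer than 5 letters, and so on):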

criteria                    count
sum_greaterthan5.sum        2109449                  
sum_lessthan5.sum           1436852               
total_n_names.total_names   3546301

However, gather isn't reading my columns correctly:

> df_5letters <- df %>%
+   gather(key=criteria, value = count, c('sum_greaterthan5.sum', 'sum_lessthan5.sum', 'total_n_names.sum') )

Error: Can't subset columns that don't exist.
x Column `sum_greaterthan5.sum` doesn't exist.

I tried using column indices instead, but got a type-related error. Is there another function I can use in place of gather, or some other way to fix my gather call?

Here is the code I ran up to this point:

babynames_2017_length_5 <- babynames_2017 %>%
  mutate(five_letters = ifelse(str_length(name)>5,1,0)) %>%
  filter(five_letters == 1) %>%
  summarise(sum = sum(n))

babynames_2017_less_5 <- babynames_2017 %>%
  mutate(five_letters = ifelse(str_length(name)>5,1,0)) %>%
  filter(five_letters == 0) %>%
  summarise(sum = sum(n))

df <- tibble(
  sum_greaterthan5 = babynames_2017_length_5,
  sum_lessthan5 = babynames_2017_less_5,
  total_n_names = total_n # total_n was a variable that I got from a previous dataframe that I did a sum aggregation on with:
# total_n <- babynames_startwvowels[1,1]
)

Answer


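babynames_2017_length_5 and babynames_2017_less_5 are data frames, so when you use them inside tibble(..) you create a nested data frame: each of those columns is itself a one-column data frame. The top-level column is called sum_greaterthan5, and .sum is only how its inner column is displayed, so gather can't find a column named sum_greaterthan5.sum.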
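A minimal sketch of what is going on, assuming tibble 3.x. The values are hard-coded stand-ins for the summarise() results (the names inner, nested, flat and totals are hypothetical), so it runs without the babynames data. It also shows that indexing a tibble with [1, 1], as in the question's total_n, returns a 1x1 tibble rather than a number:

```r
library(tibble)

# Hypothetical stand-in for babynames_2017_length_5: summarise() returns a 1x1 tibble
inner <- tibble(sum = 2109449)

# Passing the whole tibble as a named argument creates a data-frame column
nested <- tibble(sum_greaterthan5 = inner)
names(nested)                           # the only top-level name is "sum_greaterthan5"
is.data.frame(nested$sum_greaterthan5)  # the column is itself a data frame

# Passing the extracted vector instead gives an ordinary numeric column
flat <- tibble(sum_greaterthan5 = inner$sum)
is.numeric(flat$sum_greaterthan5)

# Indexing a tibble with [1, 1] keeps it a 1x1 tibble; [[1]] extracts the plain value
totals <- tibble(total_names = 3546301)
is.data.frame(totals[1, 1])
is.numeric(totals[[1]])
```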

Extract the column from each data frame and it should work fine:

df <- tibble(
  sum_greaterthan5 = babynames_2017_length_5$sum,
  sum_lessthan5 = babynames_2017_less_5$sum,
  total_n_names = total_n[[1]]  # [[1]] also unwraps total_n if it is a 1x1 tibble
)
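With plain numeric columns, the original gather call finds its columns, and pivot_longer (tidyr's recommended replacement for gather, available since tidyr 1.0.0) gives the same result. A minimal sketch, with the three totals from the question hard-coded in place of the babynames computation:

```r
library(tibble)
library(tidyr)

# Hard-coded stand-in for the corrected df: one row of plain numeric columns
df <- tibble(
  sum_greaterthan5 = 2109449,
  sum_lessthan5    = 1436852,
  total_n_names    = 3546301
)

# The question's gather() call works once these top-level names exist
df_5letters <- gather(df, key = criteria, value = count,
                      c("sum_greaterthan5", "sum_lessthan5", "total_n_names"))

# pivot_longer() equivalent: one row per original column
df_5letters_2 <- pivot_longer(df, everything(),
                              names_to = "criteria", values_to = "count")
```

Both produce a three-row tibble with a character criteria column and a numeric count column, matching the layout the question asks for.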

Also, instead of creating two separate data frames, combining them, and then using gather/pivot_longer, you can get the counts in one step:

library(dplyr)
library(stringr)

babynames %>%
  filter(year == 2017) %>%
  group_by(criteria = ifelse(str_length(name) > 5,
                             'sum_greaterthan5', 'sum_lessthan5')) %>%
  summarise(count = sum(n))
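If you also want the total row from the question's target layout, one option is to append it with bind_rows. A minimal sketch, using a small hypothetical toy tibble in place of the 2017 babynames data so the arithmetic is easy to check:

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the 2017 babynames rows: a few names with counts
toy <- tibble(
  name = c("Emma", "Olivia", "Liam", "Alexander", "Mia"),
  n    = c(100, 80, 90, 40, 60)
)

# One grouped summarise gives both length classes at once
by_length <- toy %>%
  group_by(criteria = ifelse(str_length(name) > 5,
                             "sum_greaterthan5", "sum_lessthan5")) %>%
  summarise(count = sum(n))

# Append a total row so the result has the three rows shown in the question
result <- bind_rows(
  by_length,
  summarise(by_length, criteria = "total_n_names", count = sum(count))
)
```

Note that ifelse() puts names of exactly 5 letters into the second group, so sum_lessthan5 really counts names of 5 letters or fewer.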
answered 2020-12-17T04:06:41.053