r - Row-wise column count in a data frame

Question

Suppose I have the following data frame:

library(tibble)
country_df <- tibble(
  population = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

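All I need is a concise way, inside a mutate call, to get for each row the number of columns (population through population3) whose value is greater than the value in the pop column.

So what I need is the following result (more specifically, the GreaterTotal column). Note: I can get the answer by handling each column individually, but that takes a while when there are more columns.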

  population population2 population3   pop GreaterThan0 GreaterThan1 GreaterThan2 GreaterTotal
       <dbl>       <dbl>       <dbl> <dbl> <lgl>        <lgl>        <lgl>               <int>
1        328         133          89    45 TRUE         TRUE         TRUE                    3
2         38          12          39    45 FALSE        FALSE        FALSE                   0
3         30          99          33    45 FALSE        TRUE         FALSE                   1
4         56          83          56    45 TRUE         TRUE         TRUE                    3
5       1393        1033         193    45 TRUE         TRUE         TRUE                    3
6        126         101         126    45 TRUE         TRUE         TRUE                    3
7         57          33          58    45 TRUE         FALSE        TRUE                    2

I have tried using apply with row indices, but I couldn't get it to work. Can someone point me in the right direction?

score 2 · Accepted Answer

You can select the "population" columns, compare them with pop, and use rowSums to count how many of them are greater in each row.

cols <- grep('population', names(country_df))
country_df$GreaterTotal <- rowSums(country_df[cols] > country_df$pop)

#  population population2 population3   pop GreaterTotal
#       <dbl>       <dbl>       <dbl> <dbl>        <dbl>
#1        328         133          89    45            3
#2         38          12          39    45            0
#3         30          99          33    45            1
#4         56          83          56    45            3
#5       1393        1033         193    45            3
#6        126         101         126    45            3
#7         57          33          58    45            2
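
The comparison above works because comparing a data frame with a vector of length nrow() compares every column element-wise with that vector and returns a logical matrix, and rowSums then counts the TRUE values in each row. A minimal base R sketch of that intermediate step follows; it assumes a plain data.frame in place of the tibble, and the names `greater` and `via_apply` are illustrative only. The apply() call is just a cross-check, not part of the approach above.

```r
# Base R only; data.frame() stands in for the tibble from the question.
country_df <- data.frame(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

cols <- grep("population", names(country_df))

# Each selected column is compared element-wise with pop,
# giving a 7 x 3 logical matrix.
greater <- country_df[cols] > country_df$pop

# TRUE counts as 1, so the row sums are the per-row counts.
country_df$GreaterTotal <- rowSums(greater)
unname(country_df$GreaterTotal)
# 3 0 1 3 3 3 2

# Cross-check with apply() over rows (slower, but explicit).
via_apply <- apply(country_df, 1, function(row) sum(row[cols] > row[["pop"]]))
```

Note that this relies on pop having one value per row; a shorter vector would be recycled down the columns, which is rarely what you want.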

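In dplyr 1.0.0 and later, you can do this with rowwise and c_across:

library(dplyr)
country_df %>%
  rowwise() %>%
  mutate(GreaterTotal = sum(c_across(population:population3) > pop)) %>%
  ungroup()  # drop the rowwise grouping so later verbs operate on whole columns again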

score 1 · Accepted Answer

Using the tidyverse, we can do

library(dplyr)
country_df %>%
  mutate(GreaterTotal = rowSums(select(., starts_with('population')) > .$pop))

-output

# A tibble: 7 x 5
#  population population2 population3   pop GreaterTotal
#       <dbl>       <dbl>       <dbl> <dbl>        <dbl>
#1        328         133          89    45            3
#2         38          12          39    45            0
#3         30          99          33    45            1
#4         56          83          56    45            3
#5       1393        1033         193    45            3
#6        126         101         126    45            3
#7         57          33          58    45            2
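
The `.` placeholder used above is specific to the magrittr pipe. As a hedged alternative, the same idea can probably be written without it by using pick(), which returns the selected columns as a data frame inside mutate. This sketch assumes dplyr 1.1.0 or later (where pick() was introduced) and R 4.1 or later for the native `|>` pipe; the name `result` is illustrative.

```r
library(tibble)
library(dplyr)  # assumes dplyr >= 1.1.0 for pick()

country_df <- tibble(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

# pick() selects the population columns as a tibble; comparing it with
# pop yields a logical matrix, and rowSums counts the TRUEs per row.
result <- country_df |>
  mutate(GreaterTotal = rowSums(pick(starts_with("population")) > pop))

result$GreaterTotal
# 3 0 1 3 3 3 2
```

Because the whole comparison is vectorised, no rowwise() step is needed here.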
