r - Filter a dataset to exclude the top rows by a column

Question

This is probably an easy one. (A tidyverse solution is preferred.)

Two questions. Q1. Why doesn't the code below give me the top 4 rows by largest Sepal.Length value?

library(tidyverse)
# Q1
iris %>% top_n(Sepal.Length, 4)

Q2. I want to do the opposite of top_n / slice_max: show the data frame without its top n rows.

library(tidyverse)
#something like below
iris %>% filter(!top_n(Sepal.Length,4))

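The output for Q1 should be 4 rows, and the output for Q2 should be 146 rows (150 - 4, i.e. everything except the 4 rows with the top Sepal.Length values, without ties).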

score 4 · Accepted Answer

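The slice family of functions supersedes top_n, which is slated for deprecation. In slice_max, specify the column to sort by with order_by and the number of rows with n.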

library(dplyr)
iris %>% 
      slice_max(order_by = Sepal.Length, n = 4)

By default it uses with_ties = TRUE, so rows tied at the cutoff are all returned. If we need to drop the ties, set it to FALSE.
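As a quick check of the with_ties behaviour described above, here is a minimal sketch on the same iris data. The variable names kept_ties and dropped_ties are only illustrative. Four rows in iris share the Sepal.Length value 7.7, which is why the default call can return more than the requested 4 rows.

```r
library(dplyr)

# Default (with_ties = TRUE): every row tied at the cutoff value is kept,
# so the result can have more than n rows.
kept_ties <- iris %>%
  slice_max(order_by = Sepal.Length, n = 4)

# with_ties = FALSE: exactly n rows are returned.
dropped_ties <- iris %>%
  slice_max(order_by = Sepal.Length, n = 4, with_ties = FALSE)

nrow(kept_ties)
nrow(dropped_ties)
```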

For the second case, setdiff can be used (dplyr provides a data.frame method for it)

iris %>% 
  slice_max(order_by = Sepal.Length, n = 4) %>% 
  setdiff(iris, .)

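Another option is to build a ranking with dense_rank and filter on it. This removes every row whose value is among the top 4 distinct values.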

iris %>%
     filter(!dense_rank(-Sepal.Length) %in% 1:4)

If we want to remove exactly 4 rows, use row_number instead

iris %>%
    filter(!row_number(-Sepal.Length) %in% 1:4)

Or with slice

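# row_number(-Sepal.Length) returns ranks, not row positions,
# so convert the ranks we want to keep into positions with which()
iris %>% 
    slice(which(row_number(-Sepal.Length) > 4))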
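The difference between the dense_rank and row_number filters above shows up in the row counts. The following is a small sketch for comparing them; the names by_dense_rank and by_row_number are illustrative only.

```r
library(dplyr)

# dense_rank gives tied values the same rank, so all rows holding one of
# the top 4 distinct Sepal.Length values are dropped.
by_dense_rank <- iris %>%
  filter(!dense_rank(-Sepal.Length) %in% 1:4)

# row_number breaks ties by order of appearance, so exactly 4 rows are dropped.
by_row_number <- iris %>%
  filter(!row_number(-Sepal.Length) %in% 1:4)

nrow(by_dense_rank)
nrow(by_row_number)  # 146
```

If the goal is the 146 rows asked for in the question, the row_number variant is the one that matches.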

score 2 · Accepted Answer

The main reason your first approach doesn't work is the argument order of top_n. Consider:

args(top_n)
#function (x, n, wt) 
#NULL

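Recall that the %>% operator passes its left-hand side as the first argument of the right-hand side. Sepal.Length therefore becomes the second argument (n) and 4 becomes the third (wt).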

So the column you want to filter by needs to go last or be named explicitly:

iris %>%
   top_n(wt=Sepal.Length,4)
#  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1          7.7         3.8          6.7         2.2 virginica
#2          7.7         2.6          6.9         2.3 virginica
#3          7.7         2.8          6.7         2.0 virginica
#4          7.9         3.8          6.4         2.0 virginica
#5          7.7         3.0          6.1         2.3 virginica

# Alternative
iris %>%
  top_n(4,Sepal.Length)
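The argument matching described above can be made visible with a small helper. This is only a sketch: show_matching is a hypothetical function, not part of dplyr, that copies the argument order of top_n (x, n, wt) and reports which expression ended up in which argument.

```r
library(dplyr)  # also makes the %>% operator available

# Hypothetical helper with the same argument order as top_n(x, n, wt).
# It does no filtering; it only reports what was matched to n and wt.
show_matching <- function(x, n, wt) {
  c(n = deparse(substitute(n)), wt = deparse(substitute(wt)))
}

# Positional call, as in the question: the column lands in n, 4 lands in wt.
positional <- iris %>% show_matching(Sepal.Length, 4)

# Naming wt frees the remaining positional value to fill n.
named <- iris %>% show_matching(wt = Sepal.Length, 4)

positional
named
```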

For the second question, please refer to @akrun's answer.
