r - Filter data for the same date, but keep only the highest-frequency value

Question

Date_Time              wind_cardinal_direction_set_1d weather_condition_set_1d     n
   <dttm>              <chr>                          <chr>                    <int>
 1 2015-01-01 01:00:00 N                              Fog                          1
 2 2015-01-01 01:00:00 N                              Mist                         2
 3 2015-01-01 02:00:00 N                              Fog                          2
 4 2015-01-01 02:00:00 N                              Mist                         1
 5 2015-01-01 03:00:00 N                              Fog                          3
 6 2015-01-01 04:00:00 N                              Mist                         3
 7 2015-01-01 05:00:00 N                              Mist                         3
 8 2015-01-01 06:00:00 N                              Mist                         3
 9 2015-01-01 07:00:00 N                              Fog                          2
10 2015-01-01 07:00:00 N                              Mist                         1
# ... with 6,798 more rows

For each Date_Time, I want to keep only the combination with the largest value of n.

df_cat %>%  filter(df_cat$n>df_cat$n,)

score 0 · Accepted Answer

Welcome to SO! It looks like you like dplyr, so here is a solution:

library(dplyr)
df_cat %>%   
  group_by(Date_Time) %>%    # group by date
  summarise(n = max(n)) %>%  # get the max values
  left_join(df_cat) %>%      # fetch the other columns
  # order them
  select(Date_Time,wind_cardinal_direction_set_1d,weather_condition_set_1d, n)
Joining, by = c("Date_Time", "n")
# A tibble: 7 x 4
  Date_Time           wind_cardinal_direction_set_1d weather_condition_set_1d     n
  <fct>               <fct>                          <fct>                    <int>
1 2015-01-01 01:00:00 N                              Mist                         2
2 2015-01-01 02:00:00 N                              Fog                          2
3 2015-01-01 03:00:00 N                              Fog                          3
4 2015-01-01 04:00:00 N                              Mist                         3
5 2015-01-01 05:00:00 N                              Mist                         3
6 2015-01-01 06:00:00 N                              Mist                         3
7 2015-01-01 07:00:00 N                              Fog                          2
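
For comparison with the filter() attempt in the question: `df_cat$n > df_cat$n` compares the column with itself, so it is never TRUE and no row passes. A minimal sketch of a grouped filter that should select the same rows as the join above, assuming dplyr is installed. The `toy` data frame and the name `top_rows` are made up for illustration and only mirror the first four rows of the question's data:

```r
library(dplyr)

# Hypothetical miniature of df_cat (first four rows of the question's data)
toy <- data.frame(
  Date_Time = c("2015-01-01 01:00:00", "2015-01-01 01:00:00",
                "2015-01-01 02:00:00", "2015-01-01 02:00:00"),
  weather_condition_set_1d = c("Fog", "Mist", "Fog", "Mist"),
  n = c(1L, 2L, 2L, 1L),
  stringsAsFactors = FALSE
)

top_rows <- toy %>%
  group_by(Date_Time) %>%
  filter(n == max(n)) %>%   # max(n) is computed within each Date_Time group
  ungroup()
```

Because the comparison happens inside each group, no join is needed to get the other columns back.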

Alternatively, thanks to Ronak Shah, you can do it this way:

df_cat %>% group_by(Date_Time) %>% slice(which.max(n))

With the data:

df_cat <- read.table(text ='Date_Time              wind_cardinal_direction_set_1d weather_condition_set_1d     n
 1 "2015-01-01 01:00:00" N                              Fog                          1
 2 "2015-01-01 01:00:00" N                              Mist                         2
 3 "2015-01-01 02:00:00" N                              Fog                          2
 4 "2015-01-01 02:00:00" N                              Mist                         1
 5 "2015-01-01 03:00:00" N                              Fog                          3
 6 "2015-01-01 04:00:00" N                              Mist                         3
 7 "2015-01-01 05:00:00" N                              Mist                         3
 8 "2015-01-01 06:00:00" N                              Mist                         3
 9 "2015-01-01 07:00:00" N                              Fog                          2
10 "2015-01-01 07:00:00" N                              Mist                         1', header = TRUE)
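
One difference between the two approaches only shows up when several rows of the same Date_Time share the maximum n: the summarise/left_join version returns all of them, while which.max() returns the position of the first maximum only. The sample data has no such ties, so the sketch below uses a made-up data frame `tied` to illustrate it. It also uses slice_max(), which is available in dplyr 1.0.0 and later; that version requirement is an assumption about your setup.

```r
library(dplyr)

# Hypothetical data with a tie: both rows of the same hour have n = 2
tied <- data.frame(
  Date_Time = c("2015-01-01 01:00:00", "2015-01-01 01:00:00"),
  weather_condition_set_1d = c("Fog", "Mist"),
  n = c(2L, 2L),
  stringsAsFactors = FALSE
)

# which.max() gives the index of the first maximum, so one row survives
first_only <- tied %>%
  group_by(Date_Time) %>%
  slice(which.max(n)) %>%
  ungroup()

# slice_max() keeps tied rows by default ...
all_ties <- tied %>%
  group_by(Date_Time) %>%
  slice_max(order_by = n, n = 1) %>%
  ungroup()

# ... and with_ties = FALSE limits it to a single row per group
one_row <- tied %>%
  group_by(Date_Time) %>%
  slice_max(order_by = n, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Which variant fits depends on whether you want exactly one row per Date_Time or every weather condition that reaches the top count.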
