
I have a data frame that I want to sort by one column and then by the next (using the tidyverse, if possible).

I checked the following question, but the solutions there don't seem to work:

Order a "mixed" vector (numbers with letters)

Example code:

library(dplyr)

variable <- c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded")
level <- c("DIR", "EA", "IA", "500", "750", "1000")
df <- as_tibble(cbind(variable, level))

This doesn't give me what I want:

df <- df %>% arrange(variable, level)

The level column ends up in this order:

variable level
channel  DIR
channel  EA
channel  IA
comp_ded 1000
comp_ded 500
comp_ded 750

I need them in this order:

variable level
channel  DIR
channel  EA
channel  IA
comp_ded 500
comp_ded 750
comp_ded 1000

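The real dataset has many different "variables": about half of them need their levels sorted numerically and the other half alphabetically. Does anyone know how to do this?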
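The unwanted order in the question comes from level being a character column: arrange() compares strings character by character, so "1000" sorts before "500" because "1" comes before "5". A minimal base-R illustration (the vector name lv is only for the example):

```r
# Character comparison looks at the first differing character,
# so "1000" comes before "500" and "750"
lv <- c("500", "750", "1000")
sorted_chr <- sort(lv)

# Converting to numbers first gives the numeric order
sorted_num <- lv[order(as.numeric(lv))]

print(sorted_chr)  # "1000" "500"  "750"
print(sorted_num)  # "500"  "750"  "1000"
```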


6 Answers


The simplest solution is to use dplyr::group_by.

library(dplyr)

variable <- c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded")
level <- c("DIR", "EA", "IA", "500", "750", "1000")
df <- as_tibble(cbind(variable, level))

df %>%
  group_by(variable, level) %>%
  arrange()

# A tibble: 6 x 2
  variable  level
     <chr> <fctr>
1 comp_ded    DIR
2 comp_ded     EA
3 comp_ded     IA
4  channel    500
5  channel    750
6  channel   1000
answered 2018-04-05T21:57:28.913

A slightly shorter solution using mixedorder() from gtools:

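library(gtools)

# mixedorder() returns a permutation, not a per-row key; order() of that
# permutation turns it into ranks that order() can use as a secondary key
sorteddf <- df[with(df, order(variable, order(mixedorder(level)))), ]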

Output:

  variable level
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000
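A caveat when feeding mixedorder() into order(): like base order(), mixedorder() returns the row positions in sorted order (a permutation), not a per-row sort key. Applying order() to a permutation inverts it into per-row ranks, which is what a secondary key needs. The sketch below shows the difference with base order() on plain numbers (x and grp are made-up illustration data); the same reasoning applies to mixedorder(), since it also returns a permutation.

```r
x <- c(30, 10, 20)
grp <- c("a", "a", "a")   # a single group, standing in for "variable"

# order() gives positions: the smallest value is at index 2, then 3, then 1
perm <- order(x)

# order() of a permutation is its inverse: each element's rank
ranks <- order(perm)

# Ranks work as a secondary key within groups...
by_rank <- x[order(grp, ranks)]
# ...while the raw permutation scrambles the order
by_perm <- x[order(grp, perm)]

print(by_rank)  # 10 20 30
print(by_perm)  # 20 30 10
```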
answered 2018-04-05T20:53:39.650

It's a bit ugly, but you can use filter() to split the data frame into two parts, arrange each part separately, and then bind them back together:

df <- bind_rows(df %>%
              filter(!is.na(as.numeric(level))) %>%
              arrange(variable, as.numeric(level)),
          df %>%
              filter(is.na(as.numeric(level))) %>%
              arrange(variable, level))

Which gives you:

# A tibble: 6 x 2
  variable level
  <chr>    <chr>
1 comp_ded 500  
2 comp_ded 750  
3 comp_ded 1000 
4 channel  DIR  
5 channel  EA   
6 channel  IA   
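The filter() conditions above rely on as.numeric() returning NA for strings that are not numbers. as.numeric() also emits an "NAs introduced by coercion" warning in that case. A sketch that wraps the test in suppressWarnings() to keep the output quiet (is_numeric_string is a hypothetical helper name, not part of dplyr):

```r
# Hypothetical helper: TRUE where a string parses as a number
is_numeric_string <- function(x) {
  # as.numeric() returns NA (with a warning) for non-numeric strings;
  # suppressWarnings() silences that expected warning
  !is.na(suppressWarnings(as.numeric(x)))
}

level <- c("DIR", "EA", "IA", "500", "750", "1000")
flags <- is_numeric_string(level)
print(flags)  # FALSE FALSE FALSE TRUE TRUE TRUE
```

Inside the pipeline this would read filter(is_numeric_string(level)) and filter(!is_numeric_string(level)).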
answered 2018-04-05T20:28:47.993

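You can create a temporary variable to sort on. Once the rows are in the order you want, you can also make that order permanent by converting to a factor (as in @Vio's answer). Maybe something like this: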

df = df %>% 
  mutate(tmp = as.numeric(level)) %>% 
  arrange(variable, tmp, level) %>% 
  select(-tmp) %>% 
  mutate(level = factor(level, levels=unique(level)))

  variable level
  <chr>    <fct>
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000

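I think you can also shorten this by not creating the temporary variable explicitly and instead using an "anonymous" expression inside arrange():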

df = df %>% 
  arrange(variable, as.numeric(level), level) %>% 
  mutate(level = factor(level, levels=unique(level)))
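A self-contained check of the shorter version. It rebuilds the question's data with tibble() (an assumption; the question uses as_tibble(cbind(...)), which yields the same two character columns) and wraps as.numeric() in suppressWarnings() to hide the expected coercion warning. After the mutate(), the factor's levels carry the sorted order, so later sorts or plots keep it:

```r
library(dplyr)

df <- tibble(
  variable = c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded"),
  level    = c("DIR", "EA", "IA", "500", "750", "1000")
)

df <- df %>%
  # numeric key first (NA for text, which arrange() puts last), then the text
  arrange(variable, suppressWarnings(as.numeric(level)), level) %>%
  # freeze the current row order into the factor levels
  mutate(level = factor(level, levels = unique(level)))

print(levels(df$level))
```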
answered 2018-04-05T20:34:49.457

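Convert to a factor and change the order of its levels. This is even easier with forcats::fct_relevel().

# Convert to factor
df <- as_tibble(cbind(variable, level)) %>%
  mutate(level = as.factor(level))

# Change the order of the levels. Rebuilding the factor with factor(levels = ...)
# reorders the levels while keeping every value attached to its row;
# assigning a reordered vector to levels(df$level) would relabel the values instead.
df$level <- factor(df$level, levels = c("DIR", "EA", "IA", "500", "750", "1000"))

df %>% arrange(level)

# A tibble: 6 x 2
  variable  level
     <chr> <fctr>
1  channel    DIR
2  channel     EA
3  channel     IA
4 comp_ded    500
5 comp_ded    750
6 comp_ded   1000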
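The answer mentions forcats::fct_relevel() as the easier route. A sketch, assuming the forcats package is installed: fct_relevel() moves the named levels to the front in the order given, reordering the levels without changing which value belongs to which row. The data is rebuilt with tibble() so the block runs on its own.

```r
library(dplyr)
library(forcats)

df <- tibble(
  variable = c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded"),
  level    = c("DIR", "EA", "IA", "500", "750", "1000")
)

df <- df %>%
  # list the levels in the desired order; values stay on their rows
  mutate(level = fct_relevel(factor(level), "DIR", "EA", "IA", "500", "750", "1000"))

# arrange() on a factor sorts by level order, not alphabetically
sorted <- df %>% arrange(level)
print(sorted)
```

Any levels not named in the call stay after the named ones, so for a long list only the levels that need moving have to be spelled out.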
answered 2018-04-05T20:35:44.803

I think it's much easier to sort by as.numeric(level) first and then by level:

df %>% arrange(variable, as.numeric(level), level)

Gives:

# A tibble: 6 x 2
  variable level
  <chr>    <chr>
1 channel  DIR
2 channel  EA
3 channel  IA
4 comp_ded 500
5 comp_ded 750
6 comp_ded 1000 
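A sketch of how this ordering behaves when a single variable mixes numeric and text levels (the variable "mixed" and its levels are made up for illustration). arrange() always places NA keys last, so the numeric levels come first in numeric order, followed by the text levels sorted by the third key. suppressWarnings() only hides the expected coercion warning.

```r
library(dplyr)

df <- tibble(
  variable = c("mixed", "mixed", "mixed", "mixed"),
  level    = c("B", "10", "A", "2")
)

out <- df %>%
  # primary: variable; secondary: numeric value (NA for text, sorted last);
  # tertiary: the string itself, which orders the text levels
  arrange(variable, suppressWarnings(as.numeric(level)), level)

print(out$level)  # "2" "10" "A" "B"
```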
answered 2021-03-16T11:59:52.210