r - How to do a mixed sort of values in R

Question

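I have a data frame that I want to sort by one column and then by the next (using the tidyverse if possible).

I checked the following link, but the solutions there did not seem to work.

Sample code: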

variable <- c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded")
level <- c("DIR", "EA", "IA", "500", "750", "1000")
df <- as_tibble(cbind(variable, level))

This does not give me what I want:

df <- df %>% arrange(variable, level)

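The level column comes out in this order:

variable level
channel  DIR
channel  EA
channel  IA
comp_ded 1000
comp_ded 500
comp_ded 750

I need the rows like this:

variable level
channel  DIR
channel  EA
channel  IA
comp_ded 500
comp_ded 750
comp_ded 1000

The real data set has many different "variable" values; about half of them need their levels sorted numerically and half alphabetically. Does anyone know how to do this?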

score 3 · Accepted Answer

The simplest solution is to use dplyr::group_by.

library(dplyr)

variable <- c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded")
level <- c("DIR", "EA", "IA", "500", "750", "1000")
df <- as_tibble(cbind(variable, level))

df %>%
  group_by(variable, level) %>%
  arrange()  # arrange() with no columns leaves the row order unchanged

# A tibble: 6 x 2
# Groups:   variable, level [6]
  variable level
  <chr>    <chr>
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000 

score 2 · Accepted Answer

A slightly shorter solution uses mixedorder from gtools:

library(gtools)
sorteddf <- df[with(df, order(variable, mixedorder(level))),]

Output:

  variable level
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000
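
To see why mixedorder helps here, this is a small sketch (assuming gtools is installed) that compares plain character sorting with gtools::mixedsort, the sorting counterpart of mixedorder, on the numeric-looking levels from the question. The vector name lvl is made up for the illustration.

```r
library(gtools)

# Numeric-looking levels from the question, stored as character strings
lvl <- c("1000", "500", "750")

# Plain character sorting compares digit by digit, so "1000" comes first
plain_sorted <- sort(lvl)

# mixedsort() compares embedded numbers by value instead
mixed_sorted <- mixedsort(lvl)

print(plain_sorted)
print(mixed_sorted)
```

mixedorder returns the permutation rather than the sorted values, which is why it can be passed to order() as a tie-breaker after variable.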

score 2 · Accepted Answer

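It's a bit ugly, but you can use filter statements to split the data frame into two parts, arrange each part separately, and then bind them back together: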

df <- bind_rows(df %>%
              filter(!is.na(as.numeric(level))) %>%
              arrange(variable, as.numeric(level)),
          df %>%
              filter(is.na(as.numeric(level))) %>%
              arrange(variable, level))

This gives you:

# A tibble: 6 x 2
  variable level
  <chr>    <chr>
1 comp_ded 500  
2 comp_ded 750  
3 comp_ded 1000 
4 channel  DIR  
5 channel  EA   
6 channel  IA

score 1 · Accepted Answer

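You can create a temporary variable to sort by. Once the rows are in the desired order, you can also make that order permanent by converting to a factor (as in @Vio's answer). Perhaps something like this: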

df = df %>% 
  mutate(tmp = as.numeric(level)) %>% 
  arrange(variable, tmp, level) %>% 
  select(-tmp) %>% 
  mutate(level = factor(level, levels=unique(level)))

  variable level
  <chr>    <fct>
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000

I think you can also shorten this by not creating the temporary variable explicitly and instead using an "anonymous" variable inside arrange:

df = df %>% 
  arrange(variable, as.numeric(level), level) %>% 
  mutate(level = factor(level, levels=unique(level)))
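
As a rough check of what "permanent" means here, the sketch below (assuming dplyr is installed) rebuilds the sample data, applies the same arrange/factor steps, then shuffles the rows and sorts by level alone. The names df_fct, shuffled and resorted are made up for the illustration, and suppressWarnings() is only there to silence the expected coercion warning.

```r
library(dplyr)

df <- tibble(
  variable = c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded"),
  level    = c("DIR", "EA", "IA", "500", "750", "1000")
)

# Sort numerically where possible, then freeze that order in the factor levels
df_fct <- df %>%
  arrange(variable, suppressWarnings(as.numeric(level)), level) %>%
  mutate(level = factor(level, levels = unique(level)))

# Shuffle the rows, then sort on the factor alone: the level order is restored
shuffled <- df_fct %>% slice(c(6, 2, 4, 1, 5, 3))
resorted <- shuffled %>% arrange(level)

print(levels(df_fct$level))
print(resorted)
```

Because arrange() sorts a factor by its level order rather than alphabetically, later sorts and plots keep the mixed order without repeating the as.numeric step.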

score 1 · Accepted Answer

Convert to a factor and change the levels. This is easier with forcats::fct_relevel().

# Convert to factor
df <- as_tibble(cbind(variable, level)) %>%
  mutate(level = as.factor(level))

# Change the order of the levels by rebuilding the factor
# (assigning to levels() would relabel the values instead of reordering them)
df$level <- factor(df$level, levels = c("DIR", "EA", "IA", "500", "750", "1000"))

df %>% arrange(level)

# A tibble: 6 x 2
  variable level
  <chr>    <fct>
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000 

score 0 · Accepted Answer

I think it is much easier to sort first by as.numeric(level) and then by level:

df %>% arrange(variable, as.numeric(level), level)

Which gives:

# A tibble: 6 x 2
variable level
<chr>    <chr>
1 channel  DIR
2 channel  EA
3 channel  IA
4 comp_ded 500
5 comp_ded 750
6 comp_ded 1000
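
For anyone who wants to see why this works without dplyr, here is a minimal base R sketch of the same idea. The input rows are deliberately scrambled, and the names level_num and sorted are made up for the illustration. It relies on as.numeric() turning non-numeric strings into NA and on order() placing NA last and breaking ties with the next key.

```r
# Sample data from the question, with the levels deliberately out of order
variable <- c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded")
level <- c("DIR", "IA", "EA", "1000", "500", "750")
df <- data.frame(variable, level, stringsAsFactors = FALSE)

# Non-numeric strings become NA; the coercion warning is expected here
level_num <- suppressWarnings(as.numeric(df$level))

# Within each variable: numeric levels sort by value, and rows whose
# numeric key is NA are tied, so they fall back to alphabetical order
sorted <- df[order(df$variable, level_num, df$level), ]
rownames(sorted) <- NULL

print(sorted)
```

Note that within a single variable that mixes numbers and text, the numeric levels would come first and the text levels after them, since NA sorts last.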
