
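I have survey data from a variety of sources. Most of it consists of factor variables with differing levels. After merging, I end up with variables of the same length, each holding informative values in some rows and NA in all the others. I would like to combine these variables into a single one that keeps the informative value from each row of the full df, ignores the NAs, and has the same length.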

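I have tried the forcats package, since it contains functions for manipulating factor levels, but I have not found a solution there that combines the different factors with their respective levels while removing the NAs.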

v1 <- as.factor(c("a","b","c","x","x",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
v2 <- as.factor(c(NA,NA,NA,NA,NA,"c","c","c","b","a",NA,NA,NA,NA,NA))
v3 <- as.factor(c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"f","c","c","b","a"))
df <- data.frame(v1, v2, v3)

The merged variable should be a factor containing the following:

("a","b","c","x","x","c","c","c","b","a","f","c","c","b","a")
4 Answers

library(magrittr)

lapply(df, function(x){
  x[!is.na(x)] %>%
    t %>%
    as.character
  }) %>%
  unlist %>%
  as.factor %>%
  `names<-`(NULL)

 [1] a b c x x c c c b a f c c b a
Levels: a b c f x
answered 2019-08-07T20:21:14.307
library(tidyverse)

map(df, ~na.omit(.x)) %>% unlist %>% unname
 [1] a b c x x c c c b a f c c b a
Levels: a b c x f
answered 2019-08-07T22:38:52.947

We can use coalesce:

library(dplyr)
df %>% 
   transmute(v = coalesce(!!! .)) %>% 
   pull(v)
#[1] "a" "b" "c" "x" "x" "c" "c" "c" "b" "a" "f" "c" "c" "b" "a"

Or, more compactly:

library(purrr)
reduce(df, coalesce)
#[1] "a" "b" "c" "x" "x" "c" "c" "c" "b" "a" "f" "c" "c" "b" "a"

Or in base R:

do.call(pmin, c(lapply(df, as.character), na.rm = TRUE))
#[1] "a" "b" "c" "x" "x" "c" "c" "c" "b" "a" "f" "c" "c" "b" "a"
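
One point worth noting: coalesce and pmin work row by row, whereas the unlist-based answers in this thread stack the columns and then drop the NAs. On the question's data the two give the same values because the non-NA blocks happen to appear in column order. A minimal base R sketch of the difference, using made-up data (df2 and its values are hypothetical, not from the question):

```r
# Hypothetical data: the non-NA blocks are out of column order
# (v1 fills rows 4-5, v2 fills rows 1-3).
v1 <- factor(c(NA, NA, NA, "x", "y"))
v2 <- factor(c("a", "b", "c", NA, NA))
df2 <- data.frame(v1, v2)

chars <- lapply(df2, as.character)

# Row-wise: for each row, keep the first non-NA value
# (the same idea as coalesce, written with base R only).
row_wise <- Reduce(function(x, y) ifelse(is.na(x), y, x), chars)

# Column-wise: stack the columns end to end, then drop the NAs.
stacked <- unlist(chars, use.names = FALSE)
col_wise <- stacked[!is.na(stacked)]

print(row_wise)  # values stay in their original row positions
print(col_wise)  # values follow column order instead
```

If the result has to line up with other columns of the same data frame, the row-wise form is probably the safer choice.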
answered 2019-08-08T02:37:25.973

In base R, we can use unlist and then Filter to omit the NA values.

Filter(function(x) !is.na(x) , unlist(df, use.names = FALSE))
#[1] a b c x x c c c b a f c c b a
#Levels: a b c x f
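
The same idea can be wrapped in a small helper that also checks the "same length" requirement from the question. This is only a sketch: merge_factors is a hypothetical name, and the length check assumes every row has exactly one non-NA value, as in the example data.

```r
# Example data from the question.
v1 <- as.factor(c("a","b","c","x","x",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
v2 <- as.factor(c(NA,NA,NA,NA,NA,"c","c","c","b","a",NA,NA,NA,NA,NA))
v3 <- as.factor(c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"f","c","c","b","a"))
df <- data.frame(v1, v2, v3)

# Hypothetical helper: stack all columns as character, drop the NAs,
# and rebuild a factor (factor() sorts the levels alphabetically).
merge_factors <- function(data) {
  values <- unlist(lapply(data, as.character), use.names = FALSE)
  factor(values[!is.na(values)])
}

merged <- merge_factors(df)

# Holds only because each row contributes exactly one non-NA value.
stopifnot(length(merged) == nrow(df))

print(merged)
```

Going through as.character first means the result does not depend on how the integer codes of the individual factors line up.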
answered 2019-08-08T01:07:26.250