r - "Selective" join in a data.frame?

Question

I have two vectors of the same length. Here is a simple example with four rows:

[1] green  
[2] black, yellow  
[3] orange, white, purple  
[4] NA  

[1] red  
[2] black  
[3] NA  
[4] blue

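Either vector can contain NA, but in every row at least one of them has a value. The first vector can contain one or more values per row, while the second can have only one. I want to join these two vectors "selectively", row by row, so that the output looks like this: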

[1] green, red  
[2] black, yellow  
[3] orange, white, purple  
[4] blue

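This means the contents of the first vector must always be present in the output. If a row of the first vector is NA, it is replaced by the value in the same row of the second vector.
If the second vector's value is not already present in the same row of the first vector, it is appended. NAs in the second vector are ignored.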

I have tried:

merge(A,B)
merge(A, B, all=TRUE)
merge(A, B, all.x=TRUE)
merge(A, B, all.y=TRUE)

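But they all produce completely different results.

How can I achieve this "selective" join as described above?

Thank you very much for your consideration!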

score 3 · Accepted Answer

You are essentially trying to do a "union, then throw away any NAs", so how about this one-liner?

A = list('green', c('black', 'yellow'), c('orange', 'white', 'purple'), NA)

B = list('red', 'black', NA, 'blue')

> sapply(mapply(union, A, B), setdiff, NA)
[[1]]
[1] "green" "red"

[[2]]
[1] "black"  "yellow"

[[3]]
[1] "orange" "white"  "purple"

[[4]]
[1] "blue"
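
One caveat with the one-liner: `mapply` and `sapply` simplify their result when every row happens to end up with the same number of values, in which case you would get a matrix or vector back instead of a list. Below is a minimal sketch of a more defensive variant, assuming the same `A` and `B` as above. The helper name `selective_union` is made up for illustration.

```r
A <- list("green", c("black", "yellow"), c("orange", "white", "purple"), NA)
B <- list("red", "black", NA, "blue")

# Hypothetical helper: row-wise union of two lists, with NAs dropped.
selective_union <- function(a, b) {
  # Map() never simplifies, so the result is always a list
  Map(function(x, y) setdiff(union(x, y), NA), a, b)
}

res <- selective_union(A, B)

# Collapse each row back into a single comma-separated string
joined <- vapply(res, paste, character(1), collapse = ", ")
print(joined)
```

`vapply` with `character(1)` is used for the final step so that an unexpected result shape raises an error rather than silently changing type.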

score 2 · Accepted Answer

I'm not sure how you have this data input into a data.frame but if you put the data into 2 lists then I could see a method for doing it. Here is my attempt (with credit to the comment suggestions below):

# get the data
a <- c("green","black, yellow","orange, white, purple",NA)
b <- c("red","black",NA,"blue")

# strip any spaces first
a <- gsub("[[:space:]]+","",a)
b <- gsub("[[:space:]]+","",b)

# convert to lists
alist <- strsplit(a,",")
blist <- strsplit(b,",")

# join the lists (SIMPLIFY = FALSE keeps the result a list even if all rows have equal length)
abjoin <- mapply(c, alist, blist, SIMPLIFY = FALSE)
# remove any duplicates and NAs
abjoin <- lapply(abjoin, function(x) unique(x[!is.na(x)]))

# result
> abjoin
[[1]]
[1] "green" "red"  

[[2]]
[1] "black"  "yellow"

[[3]]
[1] "orange" "white"  "purple"

[[4]]
[1] "blue"

And to convert into a vector with each colour set split by commas:

sapply(abjoin,paste,collapse=",")
#[1] "green,red"           "black,yellow"        "orange,white,purple"
#[4] "blue"
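
Since the question mentions a data.frame, the steps above could also be wrapped into a single function and applied to two columns. This is only a sketch, assuming both columns are character vectors holding comma-separated values. The function name `join_colours` and the column names `a`, `b` and `joined` are hypothetical.

```r
# Hypothetical helper: split, combine, de-duplicate, drop NAs, re-collapse.
join_colours <- function(a, b, sep = ", ") {
  # split on commas and trim surrounding whitespace; NA stays NA
  split_trim <- function(v) lapply(strsplit(v, ","), trimws)
  # Map() always returns a list, one element per row
  parts <- Map(c, split_trim(a), split_trim(b))
  vapply(
    parts,
    function(x) paste(unique(x[!is.na(x)]), collapse = sep),
    character(1)
  )
}

df <- data.frame(
  a = c("green", "black, yellow", "orange, white, purple", NA),
  b = c("red", "black", NA, "blue"),
  stringsAsFactors = FALSE
)

df$joined <- join_colours(df$a, df$b)
print(df$joined)
```

Unlike the `gsub` step above, `trimws` only removes whitespace at the ends of each value, so a multi-word colour name would keep its internal space.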
