r - Paste the elements of two vectors in alphabetical order

Question

Suppose I have two vectors:

a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

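What I want to do is paste together the first pair, the second pair, and so on. However, I want the two elements of each pair pasted in alphabetical order. In the example above, the first 2 pairs are already in alphabetical order, but the 3rd pair, 'harry' and 'chris', is not. For this pair I want to return "chris harry".

I have worked out how to do this in a two-step process, but was wondering whether there is a quick (one-line) way to do it using just paste?

My solution: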

x <- apply(mapply(c, a, b, USE.NAMES = FALSE), 2, sort)
paste(x[1,], x[2,])

It gives the pairs in alphabetical order... but is there a one-line way?

[1] "george harry" "harry steve"  "chris harry"  "chris harry"  "harry steve"  "george steve" "chris steve"  "george harry"

score 6 · Accepted Answer

Slightly redundant, since it compares each pair twice, but it is vectorised:

paste(pmin(a,b), pmax(a,b))

Edit: an alternative using ifelse:

ifelse(a < b, paste(a, b), paste(b, a))
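
A minimal sketch of why the pmin/pmax version works, assuming two plain character vectors of equal length: pmin() and pmax() compare element-wise, so each pair comes out ordered without an explicit sort. The wrapper name paste_sorted and its sep argument are hypothetical and only there for illustration.

```r
# Hypothetical helper wrapping the pmin()/pmax() idea; the function name
# and the `sep` argument are illustrative, not part of the answer above.
paste_sorted <- function(x, y, sep = " ") {
  # pmin() picks the element-wise smaller string, pmax() the larger one
  paste(pmin(x, y), pmax(x, y), sep = sep)
}

# Vectors from the question
a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

res <- paste_sorted(a, b)
print(res)
```

String comparison in R generally follows the collation rules of the current locale (see ?Comparison), so mixed-case or accented strings may be ordered differently on different systems.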

score 5 · Accepted Answer

Here is one approach:

apply(cbind(a, b), 1, function(x) paste(sort(x), collapse=" "))

## [1] "george harry" "harry steve"  "chris harry"  "chris harry"  
## [5] "harry steve" "george steve" "chris steve"  "george harry"

Building on your initial attempt, you could also do either of the following, but both require more typing (not sure about speed):

unlist(Map(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b),,FALSE)
mapply(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b, USE.NAMES = FALSE)
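
If a type-stable result matters, the same per-pair sort can also be written with vapply(), which checks that every pair yields exactly one string. This is a sketch and not one of the variants above; it assumes a and b have the same length.

```r
# Vectors from the question
a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

# Iterate over positions, sort each pair, and join it with a space;
# character(1) makes vapply() fail loudly if a pair does not give one string
res_vapply <- vapply(
  seq_along(a),
  function(i) paste(sort(c(a[i], b[i])), collapse = " "),
  character(1)
)
print(res_vapply)
```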

score 1 · Accepted Answer

Here is a similar approach to Tyler's, but using Map. Technically, it's a one-liner...

unlist(Map(function(x,y) {
    paste(sort(c(x,y)), collapse = " ")
    }, a, b, USE.NAMES = FALSE))
# [1] "george harry" "harry steve"  "chris harry"  "chris harry" 
# [5] "harry steve"  "george steve" "chris steve"  "george harry"

score 1 · Accepted Answer

A one-liner based on your own code:

apply(data.frame(apply(mapply(c, a, b, USE.NAMES = FALSE),1,paste)),1,function(x) paste(x[1],x[2]))
[1] "george harry" "harry steve"  "harry chris"  "chris harry"  "steve harry"  "steve george" "steve chris"  "harry george"


apply(apply(mapply(c, a, b, USE.NAMES = FALSE),2,sort),1,paste)

     [,1]     [,2]   
[1,] "george" "harry"
[2,] "harry"  "steve"
[3,] "chris"  "harry"
[4,] "chris"  "harry"
[5,] "harry"  "steve"
[6,] "george" "steve"
[7,] "chris"  "steve"
[8,] "george" "harry"
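
The second expression stops at a two-column matrix of sorted pairs. One possible way to collapse it into strings in a single call, sketched here with the vectors from the question, is to paste within each sorted column rather than across rows:

```r
# Vectors from the question
a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

res_cols <- apply(
  apply(mapply(c, a, b, USE.NAMES = FALSE), 2, sort),  # 2 x n matrix, each column a sorted pair
  2, paste, collapse = " "                              # join the two entries of each column
)
print(res_cols)
```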

score 1 · Accepted Answer

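Here is a speed comparison of the answers above...

I took data from my own dataset of all English soccer games played in the top 4 tiers of the football league, available here: https://github.com/jalapic/engsoccerdata

The dataset is "engsoccerdata", and I pasted together columns 3 and 4 (home team and away team). I converted each column to a character vector. Each vector has 188,060 elements: 188,060 games were played in the top 4 tiers of English football between 1888 and 2014.

Here is the comparison: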

library(engsoccerdata)  # assumes the package from the link above is installed
df <- engsoccerdata

a <- as.character(df[, 3])  # home team
b <- as.character(df[, 4])  # away team

# tyler1
system.time(apply(cbind(a, b), 1, function(x) paste(sort(x), collapse = " ")))

# tyler2
system.time(unlist(Map(function(x, y) paste(sort(c(x, y)), collapse = " "), a, b), , FALSE))

# tyler3
system.time(mapply(function(x, y) paste(sort(c(x, y)), collapse = " "), a, b, USE.NAMES = FALSE))

# baptiste1
system.time(paste(pmin(a, b), pmax(a, b)))

# baptiste2
system.time(ifelse(a < b, paste(a, b), paste(b, a)))

# RichardS
system.time(unlist(Map(function(x, y) {
  paste(sort(c(x, y)), collapse = " ")
}, a, b, USE.NAMES = FALSE)))

# rnso1
system.time(apply(data.frame(apply(mapply(c, a, b, USE.NAMES = FALSE), 1, paste)), 1, function(x) paste(x[1], x[2])))

# rnso2
system.time(apply(apply(mapply(c, a, b, USE.NAMES = FALSE), 2, sort), 1, paste))

system.time() results:

#              user  system elapsed 
#tyler1       42.92    0.02   43.73 
#tyler2       14.68    0.03   15.04
#tyler3       14.78    0.00   14.88 
#baptiste1     0.79    0.00    0.84 
#baptiste2     1.25    0.00    1.28 
#RichardS     15.40    0.01   15.64
#rnso1         6.22    0.10    6.41
#rnso2        13.07    0.00   13.15

Very interesting. baptiste's approach is lightning fast!
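
For readers without the dataset, a comparison of the same kind can be sketched on synthetic data. The sample size, the seed, and the variable names below are assumptions made for illustration; timings will differ from the table above and from machine to machine.

```r
set.seed(1)  # reproducible synthetic data
pool <- c("george", "harry", "chris", "steve")
n <- 10000   # arbitrary size, far smaller than the 188,060 rows used above
a_big <- sample(pool, n, replace = TRUE)
b_big <- sample(pool, n, replace = TRUE)

# Vectorised approach: element-wise min/max of the two vectors
t_vec <- system.time(
  res_vec <- paste(pmin(a_big, b_big), pmax(a_big, b_big))
)

# Per-pair approach: sort and paste each pair separately
t_loop <- system.time(
  res_loop <- mapply(
    function(x, y) paste(sort(c(x, y)), collapse = " "),
    a_big, b_big, USE.NAMES = FALSE
  )
)

# Both approaches should agree before their timings are compared
same <- identical(res_vec, res_loop)
print(same)
print(c(vectorised = t_vec[["elapsed"]], per_pair = t_loop[["elapsed"]]))
```

Checking that the results are identical first guards against comparing the speed of approaches that do not compute the same thing.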
