
I have two data frames:

df1 = data.frame(Sites=c("A","B","C"),total=c(12,6,35))

df2 = data.frame(Site.1=c("A","A","B"),Site.2=c("B","C","C"), Score=c(60,70,80))

I need to merge them to produce this data frame:

df3=data.frame(Site.1=c("A","A","B"),Site.2=c("B","C","C"),
Score=c(60,70,80),Site.1.total=c(12,12,6),Site.2.total=c(6,35,35))

Any suggestions on the simplest way to do this double merge? Thanks.


2 Answers


Just call merge twice, once for each site column:

x <- merge(df2, df1, all.x=TRUE, by.x="Site.2", by.y="Sites", sort=FALSE)
merge(x, df1, all.x=TRUE, by.x="Site.1", by.y="Sites", sort=FALSE)

  Site.1 Site.2 Score total.x total.y
1      A      B    60       6      12
2      A      C    70      35      12
3      B      C    80      35       6
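In the output above, total.x holds the Site.2 totals and total.y holds the Site.1 totals, which is easy to misread. As a rough sketch of one way to get the column names asked for in the question, the lookup table can be renamed before each merge. The names lookup1 and lookup2 are made up for this illustration, and the explicit reordering at the end is an assumption about the layout the asker wants:

```r
df1 <- data.frame(Sites = c("A", "B", "C"), total = c(12, 6, 35))
df2 <- data.frame(Site.1 = c("A", "A", "B"), Site.2 = c("B", "C", "C"),
                  Score = c(60, 70, 80))

# Rename df1's columns before each merge so that the key column matches
# and the total column already carries the desired name.
# lookup1 and lookup2 are hypothetical helper names.
lookup1 <- setNames(df1, c("Site.1", "Site.1.total"))
lookup2 <- setNames(df1, c("Site.2", "Site.2.total"))

x   <- merge(df2, lookup1, by = "Site.1", all.x = TRUE)
df3 <- merge(x,   lookup2, by = "Site.2", all.x = TRUE)

# merge() moves the key column to the front, so restore the column order
# from the question, then sort the rows explicitly.
df3 <- df3[, c("Site.1", "Site.2", "Score", "Site.1.total", "Site.2.total")]
df3 <- df3[order(df3$Site.1, df3$Site.2), ]
df3
```

Because each lookup table has its own column names, no .x/.y suffixes appear and there is nothing to rename afterwards.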
Answered 2012-07-12T08:58:04.943

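Here are a couple of sqldf solutions.

First, let's rename the columns whose names contain a dot so that the dot is removed, because the dot is an SQL operator. (If we would rather not do this, we can refer to those columns in the SQL statement as Site_1 and Site_2, and sqldf will understand that we mean Site.1 and Site.2. This underscore translation depends on the sqldf version, so check it against the version you have installed.)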

library(sqldf)
df1 = data.frame(Sites = c("A","B","C"), total = c(12,6,35))
df2 = data.frame(Site1 = c("A","A","B"), Site2 = c("B","C","C"), 
           Score = c(60,70,80))
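If the data frames already exist with dotted names, retyping them is not necessary. A minimal sketch, assuming the original dotted data frame is held in a variable called df2_dotted (a name invented here), strips the dots from the column names in base R:

```r
# df2_dotted is a hypothetical name for the data frame from the question.
df2_dotted <- data.frame(Site.1 = c("A", "A", "B"), Site.2 = c("B", "C", "C"),
                         Score = c(60, 70, 80))

# fixed = TRUE treats "." as a literal dot rather than a regex wildcard.
df2_nodots <- df2_dotted
names(df2_nodots) <- gsub(".", "", names(df2_nodots), fixed = TRUE)
names(df2_nodots)
```

Without fixed = TRUE the pattern "." would match every character and wipe the names entirely.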

Now that we have the inputs, let's try a few approaches with sqldf:

sqldf with three SQL statements

temp1 <- sqldf("SELECT * FROM df1 as a, df2 as b WHERE a.Sites = b.Site1 ")  
temp2 <- sqldf("SELECT * FROM df1 as a, df2 as b WHERE a.Sites = b.Site2 ")

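# Join on both site columns so that each row of df2 is matched exactly once.
# Joining on Site1 alone would need a GROUP BY that picks Score arbitrarily.
sqldf("SELECT 
    Site1,
    Site2,
    a.Score, 
    a.total AS Site1Total, 
    b.total AS Site2Total 
FROM temp1 AS a 
JOIN temp2 AS b USING (Site1, Site2)")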

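sqldf simplified to a triple join

We can simplify the above further into a triple join, which may make the nature of the computation clearer. That is, the three SQL statements above reduce to this single statement: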

> sqldf("SELECT Site1, Site2, Score, a1.total AS total1, a2.total AS total2
+ FROM df1 AS a1, df1 a2, df2 AS b
+ WHERE a1.Sites = Site1 AND a2.Sites = Site2")
  Site1 Site2 Score total1 total2
1      A      B    60      12       6
2      A      C    70      12      35
3      B      C    80       6      35
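The same triple join can also be written with explicit JOIN clauses. The sketch below is a variation rather than the statement used above: it assumes sqldf is running on its default SQLite backend, uses LEFT JOIN so that a site missing from df1 would yield NA instead of dropping the row (comparable to all.x=TRUE in merge), and adds an ORDER BY so the row order is deterministic. The name res is chosen here for illustration.

```r
library(sqldf)

df1 <- data.frame(Sites = c("A", "B", "C"), total = c(12, 6, 35))
df2 <- data.frame(Site1 = c("A", "A", "B"), Site2 = c("B", "C", "C"),
                  Score = c(60, 70, 80))

# df1 is joined twice under two aliases: a1 looks up Site1, a2 looks up Site2.
res <- sqldf("SELECT b.Site1, b.Site2, b.Score,
                     a1.total AS total1, a2.total AS total2
              FROM df2 AS b
              LEFT JOIN df1 AS a1 ON a1.Sites = b.Site1
              LEFT JOIN df1 AS a2 ON a2.Sites = b.Site2
              ORDER BY b.Site1, b.Site2")
res
```

Writing the join conditions next to each alias makes it easier to see which copy of df1 supplies which total.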
Answered 2012-07-12T09:12:06.943