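Here are a couple of sqldf solutions.
First, let's rename the columns whose names contain a dot so that the dot is removed, since the dot is an SQL operator. (If we did not want to do this, we could instead refer to those columns in the SQL statements as Site_1 and Site_2, and sqldf would understand that we mean Site.1 and Site.2.)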
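If the data frames still carry the dotted names, the rename can be done in base R before calling sqldf. A minimal sketch, assuming a hypothetical raw data frame `df2_raw` (not part of the original answer) that holds the same values as `df2` but with columns named `Site.1` and `Site.2`:

```r
# Hypothetical raw input whose column names still contain dots
df2_raw <- data.frame(Site.1 = c("A", "A", "B"),
                      Site.2 = c("B", "C", "C"),
                      Score  = c(60, 70, 80))

# Remove literal dots from the column names.
# fixed = TRUE makes gsub treat "." as a plain character, not a regex wildcard.
names(df2_raw) <- gsub(".", "", names(df2_raw), fixed = TRUE)

names(df2_raw)
```

After this step the columns are `Site1`, `Site2` and `Score`, so they can be used in SQL without any special quoting.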
library(sqldf)
# Input data; the site columns are already named Site1 and Site2 (no dots)
df1 <- data.frame(Sites = c("A","B","C"), total = c(12,6,35))
df2 <- data.frame(Site1 = c("A","A","B"), Site2 = c("B","C","C"),
                  Score = c(60,70,80))
Now that we have the input, let's try a couple of approaches with sqldf:
sqldf with three SQL statements
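# Join df1 onto df2 twice: once through Site1, once through Site2
temp1 <- sqldf("SELECT * FROM df1 AS a, df2 AS b WHERE a.Sites = b.Site1")
temp2 <- sqldf("SELECT * FROM df1 AS a, df2 AS b WHERE a.Sites = b.Site2")
# Match the two intermediate results on the (Site1, Site2) pair.
# Joining on Site1 alone and then grouping by the totals would let
# SQLite pick Score from an arbitrary row within each group.
sqldf("SELECT
         Site1,
         Site2,
         a.Score,
         a.total AS Site1Total,
         b.total AS Site2Total
       FROM temp1 AS a
       JOIN temp2 AS b USING (Site1, Site2)")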
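sqldf simplified to a triple join
We can simplify the above further into a triple join, which may make the nature of the computation clearer. That is, the three SQL statements above reduce to this single statement: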
> sqldf("SELECT Site1, Site2, Score, a1.total AS total1, a2.total AS total2
+ FROM df1 AS a1, df1 a2, df2 AS b
+ WHERE a1.Sites = Site1 AND a2.Sites = Site2")
  Site1 Site2 Score total1 total2
1     A     B    60     12      6
2     A     C    70     12     35
3     B     C    80      6     35
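The same triple join can also be written with explicit JOIN ... ON clauses, which some readers find easier to follow than the comma-separated FROM list. This is only a sketch of an equivalent formulation, not part of the original answer; the result name `res` and the ORDER BY clause (added so the row order is deterministic) are my own additions:

```r
library(sqldf)

df1 <- data.frame(Sites = c("A", "B", "C"), total = c(12, 6, 35))
df2 <- data.frame(Site1 = c("A", "A", "B"), Site2 = c("B", "C", "C"),
                  Score = c(60, 70, 80))

# df1 is joined twice under different aliases:
# a1 supplies the total for Site1, a2 supplies the total for Site2.
res <- sqldf("SELECT b.Site1, b.Site2, b.Score,
                     a1.total AS total1,
                     a2.total AS total2
              FROM df2 AS b
              JOIN df1 AS a1 ON a1.Sites = b.Site1
              JOIN df1 AS a2 ON a2.Sites = b.Site2
              ORDER BY b.Site1, b.Site2")
res
```

Each row of `df2` is looked up in `df1` once per site column, so the result has one row per `df2` row with both totals attached.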