r - How to LEFT JOIN on either matching clause in R?

Question

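Can you help me with this problem:

I have a data frame (df1) containing an index of all articles published in a website's CMS. It has a column for the current URL (URL) and a column for the original URL, in case the URL was changed after publication (column name Origin):

URL	Origin	ArticleID	Author	Category	Cost
https://example.com/article1	https://example.com/article	001	AuthorName	Politics	$120
https://example.com/article2	https://example.com/article2	002	AuthorName	Finance	$68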

Next, I have a huge data frame (df2) containing a web analytics export over a period of time. It has a date, a single URL column, and pageviews.

PageviewDate	URL	Pageviews
2019-01-01	https://example.com/article	224544
2019-01-01	https://example.com/article1	656565

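How can I left join the first data frame so that a row matches when URL = URL OR Origin = URL?

The end result would look like this:

PageviewDate	Pageviews	ArticleID	Author	Category
2019-01-01	881109	001	AuthorName	Politics

That is, 881109 is the sum of 224544 and 656565, both of which belong to the same article.

I think what I'm looking for is the equivalent of SQL syntax like: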

LEFT JOIN ...
ON URL = URL
OR Origin = URL

score 2 · Accepted Answer

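You can reshape the first data frame (df1) into long format so that both Origin and URL end up in the same column, and then join it with the second data frame (df2).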

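library(dplyr)
library(tidyr)

df1 %>%
  # Stack URL and Origin into a single URL column (one row per article per URL)
  pivot_longer(cols = c(URL, Origin), values_to = 'URL') %>%
  # Keep only the rows whose URL also appears in the analytics export
  inner_join(df2, by = 'URL') %>%
  # Drop the helper column that records which source column the URL came from
  select(-name)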

# Result of the join, shown before select(-name) drops the name column:
#  ArticleID Author     Category name   URL                          PageviewDate Pageviews
#      <int> <chr>      <chr>    <chr>  <chr>                        <chr>            <int>
#1         1 AuthorName Politics URL    https://example.com/article1 2019-01-01      656565
#2         1 AuthorName Politics Origin https://example.com/article  2019-01-01      224544
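
The join above returns one row per matched URL, while the question asks for a single total per article. A minimal sketch of that last step, assuming the same column names as in the answer: the `distinct()` call is an addition not in the original answer, there so that an article whose URL and Origin are identical (such as article2) is not counted twice, and the name `totals` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer, built with data.frame() for brevity
df1 <- data.frame(
  URL = c("https://example.com/article1", "https://example.com/article2"),
  Origin = c("https://example.com/article", "https://example.com/article2"),
  ArticleID = 1:2,
  Author = c("AuthorName", "AuthorName"),
  Category = c("Politics", "Finance")
)

df2 <- data.frame(
  PageviewDate = c("2019-01-01", "2019-01-01"),
  URL = c("https://example.com/article", "https://example.com/article1"),
  Pageviews = c(224544L, 656565L)
)

totals <- df1 %>%
  # One row per article per known URL (current or original)
  pivot_longer(cols = c(URL, Origin), values_to = "URL") %>%
  select(-name) %>%
  # Remove duplicates created when URL and Origin are the same
  distinct() %>%
  inner_join(df2, by = "URL") %>%
  # Add up the pageviews of all URLs that belong to the same article and date
  group_by(PageviewDate, ArticleID, Author, Category) %>%
  summarise(Pageviews = sum(Pageviews), .groups = "drop")

totals
```

With the sample data this yields a single row for ArticleID 1 with 881109 pageviews, which is the sum given in the question.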

Data

df1 <- structure(list(
  URL = c("https://example.com/article1", "https://example.com/article2"),
  Origin = c("https://example.com/article", "https://example.com/article2"),
  ArticleID = 1:2,
  Author = c("AuthorName", "AuthorName"),
  Category = c("Politics", "Finance")
), class = "data.frame", row.names = c(NA, -2L))

df2 <- structure(list(
  PageviewDate = c("2019-01-01", "2019-01-01"),
  URL = c("https://example.com/article", "https://example.com/article1"),
  Pageviews = c(224544L, 656565L)
), class = "data.frame", row.names = c(NA, -2L))
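
The answer uses an inner join, so analytics rows whose URL matches no article are dropped. If the SQL-style LEFT JOIN from the question is wanted instead, one possible sketch is to build a URL lookup from df1 and left join it onto the analytics data. The names `lookup`, `df2_extra`, and `joined`, and the extra `https://example.com/unknown` row, are hypothetical and only there to show an unmatched URL.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  URL = c("https://example.com/article1", "https://example.com/article2"),
  Origin = c("https://example.com/article", "https://example.com/article2"),
  ArticleID = 1:2,
  Author = c("AuthorName", "AuthorName"),
  Category = c("Politics", "Finance")
)

# Analytics data plus one made-up row whose URL is not in df1
df2_extra <- data.frame(
  PageviewDate = c("2019-01-01", "2019-01-01", "2019-01-01"),
  URL = c("https://example.com/article",
          "https://example.com/article1",
          "https://example.com/unknown"),
  Pageviews = c(224544L, 656565L, 10L)
)

# One row per distinct URL an article is known by (current or original)
lookup <- df1 %>%
  pivot_longer(cols = c(URL, Origin), values_to = "URL") %>%
  select(-name) %>%
  distinct()

# Every analytics row is kept; article columns are NA where no URL matches
joined <- df2_extra %>%
  left_join(lookup, by = "URL")

joined
```

Because each URL appears only once in `lookup` for this data, the left join does not multiply analytics rows. If two different articles could share a URL, that assumption would no longer hold.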
