r - Matching words (character strings) against reference values in R

Question

This is my data (A).

    keyword
[1] shoes
[2] childrenshoes
[3] nikeshoes
[4] sportsshiirts
[5] nikeshirts
[6] shirts
...

I also have a second data set (B), which serves as reference data.

   keyword  value
[1] shoes    1
[2] shirts   2
...

I need to match these two data sets.

So this is the result I want:

    keyword        value
[1] shoes          1
[2] childrenshoes  1     (because this keyword includes 'shoes')
[3] nikeshoes      1     (because this keyword includes 'shoes')
[4] sportsshiirts  2     (because this keyword includes 'shirts')
[5] nikeshirts     2     (because this keyword includes 'shirts')
[6] shirts         2
...

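If I use merge(), the data sets won't match, because the keywords in data (B) do not exactly match the entries in data (A).

I could handle the keywords one by one with regexpr() or gregexpr(), but data (B) contains many reference keywords.

So how should I approach this problem?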

score 6 · Accepted Answer

Here is one approach:

First, your data:

temp <- c("shoes", "childrenshoes", "nikeshoes", 
          "sportsshiirts", "nikeshirts", "shirts")

matchme <- structure(list(keyword = c("shoes", "shirts"), value = 1:2), 
                     .Names = c("keyword", "value"), 
                     class = "data.frame", row.names = c(NA, -2L))

Second, the output, all in one go:

data.frame(
  keyword = temp, 
  value = rowSums(sapply(seq_along(matchme[[1]]), function(x) {
    temp[grepl(matchme[x, 1], temp)] <- matchme[x, 2]
    suppressWarnings(as.numeric(temp))
  }), na.rm = TRUE))
#         keyword value
# 1         shoes     1
# 2 childrenshoes     1
# 3     nikeshoes     1
# 4 sportsshiirts     0
# 5    nikeshirts     2
# 6        shirts     2

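grepl tests each keyword in the "matchme" data.frame against the source vector "temp". Where a match is found, the element is replaced with the corresponding entry from the "value" column of "matchme"; otherwise it keeps its original value. This works out well, because when we convert the resulting vector with as.numeric, anything that cannot be coerced to a number becomes NA.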

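The sapply step gives you a matrix with one column per reference keyword. If we can assume that each item has only one match, we can safely use rowSums with the argument na.rm = TRUE to "collapse" that matrix into a single vector, which can then be combined with our "temp" data to create the resulting data.frame.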
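To make the intermediate step visible, here is a small sketch that stores the sapply result before collapsing it. The names m and out are only illustrative, and matchme is rebuilt with data.frame() for brevity:

```r
temp <- c("shoes", "childrenshoes", "nikeshoes",
          "sportsshiirts", "nikeshirts", "shirts")
matchme <- data.frame(keyword = c("shoes", "shirts"), value = 1:2,
                      stringsAsFactors = FALSE)

# One column per reference keyword; NA wherever that keyword was not found
m <- sapply(seq_along(matchme[[1]]), function(x) {
  out <- temp                                      # local working copy
  out[grepl(matchme[x, 1], out)] <- matchme[x, 2]  # overwrite matches with the value
  suppressWarnings(as.numeric(out))                # non-numeric strings become NA
})

dim(m)                    # 6 rows (items) by 2 columns (keywords)
rowSums(m, na.rm = TRUE)  # 1 1 1 0 2 2
```

If an item matched more than one keyword, rowSums would add the values together, which is why the one-match assumption matters.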

I added a suppressWarnings in there because I know I would otherwise get a lot of "NAs introduced by coercion" warnings, which would not tell me anything I don't already know.

Note the 0 for "sportsshiirts". If you need approximate matching, you may want to look at agrep and see whether you can adapt this approach.
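As a rough sketch of that idea (not a tested drop-in replacement), agrepl, the logical counterpart of agrep, could stand in for grepl. The names hits and idx are illustrative, and max.distance = 0.1 is an assumed tolerance that works out to one edit for keywords of this length; real data may need a different setting:

```r
temp <- c("shoes", "childrenshoes", "nikeshoes",
          "sportsshiirts", "nikeshirts", "shirts")
matchme <- data.frame(keyword = c("shoes", "shirts"), value = 1:2,
                      stringsAsFactors = FALSE)

# hits[i, j] is TRUE when keyword j approximately occurs in temp[i]
hits <- sapply(matchme$keyword, function(k) {
  agrepl(k, temp, max.distance = 0.1)  # assumed tolerance, tune for your data
})

# Index of the first matching keyword per item; NA when nothing matches
idx <- apply(hits, 1, function(r) which(r)[1])

data.frame(keyword = temp, value = matchme$value[idx])
```

With this tolerance "sportsshiirts" is one insertion away from containing "shirts", so it should pick up the value 2. A looser max.distance raises the risk of false matches between similar keywords.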
