r - Why does agrep in R not find the best match?

Question

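I am trying to match strings in R using the agrep command. However, I am concerned that it stops as soon as it finds a good match rather than optimizing to find the best one, although my understanding of how it works may be wrong. The example below reproduces the issue, albeit crudely.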

example1 <- c("height","weight")
example2 <- c("height","weight")

y <- c("","")
for (i in 1:2) {
  x <- agrep(example1[i], example2, max.distance = 1, ignore.case = TRUE, value = TRUE, useBytes = TRUE)
  x <- paste0(x, "")
  y[i] <- x
}

As you can hopefully see, agrep has matched "weight" to "height", even though "weight" is a better match and is also present in example2.

Why is this?

score 1 · Accepted Answer

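You can try adist (generalized Levenshtein (edit) distance) instead. It gives the following result, where "height" in example1 is best matched by "height" in example2, and likewise for "weight":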

adist(example1, example2)
     [,1] [,2]
[1,]    0    1
[2,]    1    0

example2[apply(adist(example1, example2), 1, which.min)]
# [1] "height" "weight"
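
As for why the loop in the question ends up with "height" twice: as far as I can tell, agrep returns every element that lies within max.distance, in the order of the input vector, and does not rank them by closeness. Assigning that multi-element result to y[i] then keeps only the first element (with a warning). Below is a minimal sketch of this, together with a small wrapper around the adist approach. The helper name best_match is hypothetical and not part of base R.

```r
example1 <- c("height", "weight")
example2 <- c("height", "weight")

# With max.distance = 1, both strings are within one edit of "weight",
# so agrep() returns both candidates, in input order, not just the closest.
candidates <- agrep("weight", example2, max.distance = 1, value = TRUE)
print(candidates)

# Hypothetical helper: for each pattern, pick the candidate with the
# smallest generalized Levenshtein distance as computed by adist().
best_match <- function(patterns, candidates, ignore.case = TRUE) {
  d <- adist(patterns, candidates, ignore.case = ignore.case)
  # which.min() returns the first minimum, so ties go to the earlier candidate
  candidates[apply(d, 1, which.min)]
}

y <- best_match(example1, example2)
print(y)
```

Note that adist compares whole strings, whereas agrep also accepts approximate matches inside a longer string, so the two can disagree when the candidates are much longer than the pattern.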
