r - agrep: only return the best match(es)

Question

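I'm using the 'agrep' function in R, which returns a vector of matches. I would like a function similar to agrep that returns only the best match, or all of the best matches if there is a tie. Currently, I do this by calling the 'sdists()' function from the 'cba' package on each element of the result vector, but this seems very redundant.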

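/edit: here is the function I'm currently using. I'd like to speed it up, since computing the distance twice (once inside agrep, once in sdists) seems redundant.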

library(cba)
word <- 'test'
words <- c('Teest','teeeest','New York City','yeast','text','Test')
ClosestMatch <- function(string,StringVector) {
  # first pass: approximate matches found by agrep
  matches <- agrep(string,StringVector,value=TRUE)
  # second pass: operation-weighted edit distance to each candidate
  distance <- sdists(string,matches,method = "ow",weight = c(1, 0, 2))
  matches <- data.frame(matches,as.numeric(distance))
  # keep only the candidates tied for the smallest distance
  matches <- subset(matches,distance==min(distance))
  as.character(matches$matches)
}

ClosestMatch(word,words)

score 29 · Accepted Answer

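The agrep function uses the Levenshtein edit distance to match strings. The RecordLinkage package has a C function for computing the Levenshtein distance, which can be used directly to speed up your computation. Here is a reworked ClosestMatch function that is roughly 10 times faster. Note that levenshteinSim returns a similarity score rather than a raw distance, so the best match is the one with the highest value.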

library(RecordLinkage)

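ClosestMatch2 <- function(string, stringVector) {
  # levenshteinSim gives a similarity, so higher means closer
  similarity <- levenshteinSim(string, stringVector)
  # keep every element tied for the highest similarity
  stringVector[similarity == max(similarity)]
}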
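
If installing an extra package is not an option, the same idea can be sketched with base R's adist() from the utils package, which computes the generalized Levenshtein distance in a single pass. This is only a minimal sketch: the name ClosestMatchBase and the ignore.case argument passed through to adist() are illustrative choices, not part of the answer above, and no speed comparison is implied.

```r
# Hypothetical helper (not from the original answer): closest matches with base R only.
ClosestMatchBase <- function(string, stringVector, ignore.case = FALSE) {
  # adist() returns a 1 x length(stringVector) matrix of edit distances
  d <- as.vector(utils::adist(string, stringVector, ignore.case = ignore.case))
  # keep every candidate tied for the smallest distance
  stringVector[d == min(d)]
}

word  <- "test"
words <- c("Teest", "teeeest", "New York City", "yeast", "text", "Test")

ClosestMatchBase(word, words)                      # "text" "Test"
ClosestMatchBase(word, words, ignore.case = TRUE)  # "Test"
```

Because the distances are computed once and filtered directly, there is no second distance pass over the agrep results.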

score 14 · Accepted Answer

The RecordLinkage package had been removed from CRAN when this answer was written; use stringdist instead:

library(stringdist)

ClosestMatch2 <- function(string, stringVector) {
  # amatch() returns the index of the closest element; on ties, only the first one
  stringVector[amatch(string, stringVector, maxDist = Inf)]
}
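
Since amatch() returns a single position, tied best matches are not reported. If ties matter, as in the question, one possible variant computes all distances with stringdist() and keeps every minimum. This is a sketch under assumptions: the name ClosestMatches is made up for illustration, and method = "lv" (plain Levenshtein) is an assumed choice; stringdist's default method is "osa".

```r
library(stringdist)

# Hypothetical helper (not from the original answer): return all tied best matches.
ClosestMatches <- function(string, stringVector, method = "lv") {
  # one distance per element of stringVector
  d <- stringdist(string, stringVector, method = method)
  # keep every candidate tied for the smallest distance
  stringVector[d == min(d)]
}

words <- c("Teest", "teeeest", "New York City", "yeast", "text", "Test")
ClosestMatches("test", words)  # "text" "Test"
```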
