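I have a string and a character vector. I want to find all strings in the character vector that match as many characters as possible from the start of my string. For example: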

s <- "abs"
vc <- c("ab","bb","abc","acbd","dert")

result <- c("ab","abc")

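The first K characters of s should match exactly, and I want K to be as large as possible (maximal K <= nchar(s)). Here there is no match for "abs" (grep("abs",vc)), but there are two matches for "ab" (result <- grep("ab",vc)).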


3 Answers

2

Another interpretation:

s <- "abs"
# Updated vc
vc <- c("ab","bb","abc","acbd","dert","abwabsabs")

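st <- strsplit(s, "")[[1]]
# For each candidate, count how many leading characters agree with s
mtc <- sapply(strsplit(substr(vc, 1, nchar(s)), ""),
              function(i) {
                m <- i == st[seq_along(i)]
                # cumprod() drops to 0 at the first mismatch, so only the
                # leading run of matches is counted
                sum(cumprod(m))})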

vc[mtc == max(mtc)]
#[1] "ab"        "abc"       "abwabsabs"

# Another vector vc
vc <- c("ab","bb","abc","acbd","dert","absq","abab")
....
vc[mtc == max(mtc)]
#[1] "absq"

Since we only consider the start of each string, the longest match in the first case is "ab", even though "abwabsabs" contains "abs" further along.
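
To illustrate why only the leading run of matches should count, here is a minimal sketch that compares two strings position by position. The helper name leading_matches is hypothetical and not part of the answer above; it assumes single-byte-style character splitting with strsplit().

```r
# Hypothetical helper: number of leading characters of x that agree with s.
leading_matches <- function(x, s) {
  a <- strsplit(x, "")[[1]]
  b <- strsplit(s, "")[[1]]
  n <- min(length(a), length(b))
  if (n == 0) return(0L)
  m <- a[seq_len(n)] == b[seq_len(n)]
  # cumprod() becomes 0 at the first mismatch and stays 0 afterwards,
  # so matches after a mismatch are ignored
  as.integer(sum(cumprod(m)))
}

leading_matches("abwabsabs", "abs")  # 2: "ab" matches, "w" breaks the run
leading_matches("axs", "abs")        # 1: the later "s" does not count
leading_matches("absq", "abs")       # 3: all of s is matched
```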

Edit: Here is a "single pattern" solution. It could probably be more concise, but here we go...

vc <- c("ab","bb","abc","acbd","dert","abwabsabs")
(auxOne <- sapply((nchar(s)-1):1, function(i) substr(s, 1, i)))
#[1] "ab"   "a"
(auxTwo <- sapply(nchar(s):2, function(i) substring(s, i)))
#[1] "s" "bs" 
l <- attr(regexpr(
  paste0("^((",s,")|",paste0("(",auxOne,"(?!",auxTwo,"))",collapse="|"),")"),
  vc, perl = TRUE), "match.length")
vc[l == max(l)]
#[1] "ab"        "abc"       "abwabsabs"
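
It may help to look at the pattern this builds. The sketch below wraps the same construction in a function; the name build_pattern is hypothetical, and it assumes s has at least two characters and contains no regex metacharacters.

```r
# Hypothetical wrapper around the pattern construction shown above.
build_pattern <- function(s) {
  stopifnot(nchar(s) >= 2)
  # Proper prefixes of s, longest first: "ab", "a"
  heads <- sapply((nchar(s) - 1):1, function(i) substr(s, 1, i))
  # The remainder of s after each prefix: "s", "bs"
  tails <- sapply(nchar(s):2, function(i) substring(s, i))
  # Each prefix may match only if the rest of s does NOT follow it
  paste0("^((", s, ")|",
         paste0("(", heads, "(?!", tails, "))", collapse = "|"), ")")
}

build_pattern("abs")
#[1] "^((abs)|(ab(?!s))|(a(?!bs)))"

vc <- c("ab","bb","abc","acbd","dert","abwabsabs")
attr(regexpr(build_pattern("abs"), vc, perl = TRUE), "match.length")
#[1]  2 -1  2  1 -1  2
```

The negative lookaheads are what keep a shorter prefix from matching when a longer one would, so each element gets exactly one match length and -1 marks elements with no common prefix.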
answered 2012-12-04T13:00:12.647
1

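Here is a function that uses grep to check whether the given string s matches the start of any string in vc, repeatedly dropping one character from the end of s until it does.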

myfun <- function(s, vc) {
  notDone <- TRUE
  ans <- character(0)        # returned if s is empty or nothing matches
  maxChar <- max(nchar(vc))  # EDIT: these two lines truncate s to
  s <- substr(s, 1, maxChar) # the maximum number of chars in vc
  subN <- nchar(s)
  while (notDone && subN > 0) {
    ss <- substr(s, 1, subN)
    # Note: ss is interpreted as a regular expression here
    ans <- grep(sprintf("^%s", ss), vc, value = TRUE)
    if (length(ans)) {
      notDone <- FALSE
    } else {
      subN <- subN - 1
    }
  }
  return(ans)
}

s <- "abs"
# Updated vc from @Julius's answer
vc <- c("ab","bb","abc","acbd","dert","absq","abab")

> myfun(s, vc)
[1] "absq"

# And there's no infinite loop if there's no match
> myfun("q", "a")
character(0)
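
Because sprintf("^%s", ss) treats ss as a regular expression, a query containing characters such as "." or "(" can match more than intended. A minimal regex-free sketch of the same loop follows; it assumes R >= 3.3.0 for startsWith(), and the name myfun_fixed is hypothetical.

```r
# Hypothetical variant of myfun that compares literal prefixes instead of
# building a regular expression.
myfun_fixed <- function(s, vc) {
  # Try the longest prefix of s first, then shorter ones
  for (k in rev(seq_len(nchar(s)))) {
    hit <- startsWith(vc, substr(s, 1, k))
    if (any(hit)) return(vc[hit])
  }
  character(0)  # no common prefix at all
}

myfun_fixed("abs", c("ab","bb","abc","acbd","dert","absq","abab"))
#[1] "absq"

# With a regex, "^a.c" would also match "abc"; the literal version does not
myfun_fixed("a.c", c("abc", "a.cd"))
#[1] "a.cd"
```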
answered 2012-12-04T14:18:46.420
0

Just to add, long after the fact: the triebeard package now exists. For finding longest or partial matches it is very efficient and user-friendly.

answered 2016-07-17T23:18:11.653