r - Avoiding a for loop in string replacement?

Question

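I have three things: my data, which is a character vector (I will eventually collapse it, so I don't care whether it stays a vector or is treated as a single string); a vector of patterns; and a vector of replacements. I want each pattern in the data to be replaced by its corresponding replacement. I got it working with stringr and a for loop, but is there a more R-like way to do it?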

library(stringr)
start_string <- sample(letters[1:10], 10)
my_pattern <- c("a", "b", "c", "z")
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]")
str_replace(start_string, pattern = my_pattern, replacement = my_replacement)
# bad lengths, doesn't work

str_replace(paste0(start_string, collapse = ""),
    pattern = my_pattern, replacement = my_replacement)
# vector output, not what I want in this case

my_result <- start_string
for (i in seq_along(my_pattern)) {
    my_result <- str_replace(my_result,
        pattern = my_pattern[i], replacement = my_replacement[i])
}
> my_result
 [1] "[this was a c]"  "[this was an a]" "e"               "g"               "h"               "[this was a b]" 
 [7] "d"               "j"               "f"               "i"   

# This is what I want, but is there a better way?
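For comparison, here is a minimal sketch of the same loop written as a fold with base R's Reduce(). This is one common way to express "apply each pattern/replacement pair in turn" without an explicit for loop. The helper name replace_pairs and its all argument are hypothetical. sub() replaces only the first match in each string (like str_replace), and gsub() replaces every match (like str_replace_all). It is still a sequential replacement, so it remains order-dependent.

```r
# Hypothetical helper: fold each (pattern, replacement) pair over x.
# all = FALSE uses sub() (first match only); all = TRUE uses gsub() (every match).
replace_pairs <- function(patterns, replacements, x, all = FALSE) {
  stopifnot(length(patterns) == length(replacements))
  FUN <- if (all) gsub else sub
  # Reduce(f, seq, init) calls f(init, 1), then f(result, 2), and so on
  Reduce(function(s, i) FUN(patterns[i], replacements[i], s),
         seq_along(patterns), x)
}

my_pattern <- c("a", "b", "c", "z")
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]")

# Fixed input instead of sample() so the result is reproducible
replace_pairs(my_pattern, my_replacement, c("c", "a", "e", "b"))
# returns c("[this was a c]", "[this was an a]", "e", "[this was a b]")
```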

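In my case, I know each pattern will occur at most once, but not every pattern will occur. I know I could use str_replace_all if a pattern might occur more than once, and I'd like the solution to offer that option too. I also want a solution that uses my_pattern and my_replacement, so that it can be part of a function that takes those vectors as arguments.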
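One option worth trying first is stringr itself. str_replace_all() documents a form where you pass a named vector as the pattern: the names are the patterns and the values are the replacements. As I understand it, stringr applies these pairs one after another to every string, so it is also order-dependent. I am only aware of this named-vector form for str_replace_all, which replaces every occurrence, not only the first.

```r
library(stringr)

my_pattern <- c("a", "b", "c", "z")
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]")

# Named vector: names are the patterns, values are the replacements.
# Built from the two vectors, so it works inside a function taking them as arguments.
str_replace_all(c("c", "a", "q", "ab"), setNames(my_replacement, my_pattern))
# returns c("[this was a c]", "[this was an a]", "q", "[this was an a][this was a b]")
```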

score 3 · Accepted Answer

I bet there's another way to do this, but my first thought was gsubfn:

library(gsubfn)

# switch() maps each matched letter to its replacement text
my_repl <- function(x){
    switch(x, a = "[this was an a]",
              b = "[this was a b]",
              c = "[this was a c]",
              z = "[this was a z]")
}

start_string <- sample(letters[1:10], 10)
gsubfn("a|b|c|z", my_repl, x = start_string)

This also works if the patterns you're searching for are acceptable, valid names for list elements:

names(my_replacement) <- my_pattern
gsubfn("a|b|c|z",as.list(my_replacement),start_string)
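Building on the list version, here is a sketch that constructs both the alternation pattern and the lookup list from my_pattern and my_replacement, so that the vectors can be passed as function arguments as the question asks. The wrapper name replace_with_lookup is hypothetical, and it assumes the patterns are literal strings with no regex metacharacters. gsubfn replaces every match (like str_replace_all) in a single pass over each string, so text produced by one replacement is not scanned again. For these single-letter patterns, reversing the two vectors therefore should not change the result.

```r
library(gsubfn)

# Hypothetical wrapper: build "a|b|c|z" and a named lookup list from the vectors.
# Assumes patterns are plain literal strings (no regex metacharacters).
replace_with_lookup <- function(patterns, replacements, x) {
  lookup <- as.list(setNames(replacements, patterns))
  gsubfn(paste(patterns, collapse = "|"), lookup, x)
}

my_pattern <- c("a", "b", "c", "z")
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]")

replace_with_lookup(my_pattern, my_replacement, c("c", "a", "e", "b"))
```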

Edit

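But frankly, if I really had to do this a lot in my own code, I would probably just use the for loop, wrapped in a function. Here is a simple version that uses sub and gsub instead of the stringr functions: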

vsub <- function(pattern,replacement,x,all = TRUE,...){
  FUN <- if (all) gsub else sub
  for (i in seq_len(min(length(pattern),length(replacement)))){
    x <- FUN(pattern = pattern[i],replacement = replacement[i],x,...)
  }
  x
}

vsub(my_pattern,my_replacement,start_string)

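But of course, one reason there is no well-known built-in function for this may be that sequential replacements like this can't help but be fragile, because they depend so heavily on order: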

vsub(rev(my_pattern),rev(my_replacement),start_string)
 [1] "i"                                          "[this w[this was an a]s [this was an a] c]"
 [3] "[this was an a]"                            "g"                                         
 [5] "j"                                          "d"                                         
 [7] "f"                                          "[this w[this was an a]s [this was an a] b]"
 [9] "h"                                          "e"

score 1 · Accepted Answer

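Here is an option based on gregexpr, regmatches, and regmatches<-. Note that there is a limit on the length of a regular expression, so this won't work if you try to match too many long patterns.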

replaceSubstrings <- function(patterns, replacements, X) {
    pat <- paste(patterns, collapse="|")
    m <- gregexpr(pat, X)
    regmatches(X, m) <- 
        lapply(regmatches(X,m),
               function(XX) replacements[match(XX, patterns)])
    X
}

## Try it out
patterns <- c("cat", "dog")
replacements <- c("tiger", "coyote")
sentences <- c("A cat", "Two dogs", "Raining cats and dogs")
replaceSubstrings(patterns, replacements, sentences)
## [1] "A tiger"                    "Two coyotes"               
## [3] "Raining tigers and coyotes"
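One caveat with replaceSubstrings() is that the patterns are pasted straight into a regular expression, while match(XX, patterns) looks the matched text up literally. A pattern such as "a.b" would therefore also match "axb", and match() would then return NA for that piece of text. Below is a sketch of a variant that escapes regex metacharacters first so that each pattern is matched literally. The names escape_regex and replaceLiteralSubstrings are hypothetical; the escaping regex is a common idiom, not something from the original answer.

```r
# Hypothetical helper: prefix regex metacharacters with a backslash
escape_regex <- function(s) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", s, perl = TRUE)
}

# Same approach as replaceSubstrings(), but with the patterns escaped
replaceLiteralSubstrings <- function(patterns, replacements, X) {
  pat <- paste(escape_regex(patterns), collapse = "|")
  m <- gregexpr(pat, X)
  regmatches(X, m) <-
    lapply(regmatches(X, m),
           function(XX) replacements[match(XX, patterns)])
  X
}

replaceLiteralSubstrings(c("a.b", "c+"), c("X", "Y"), c("a.b axb", "c+ cc"))
# returns c("X axb", "Y cc")
```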
