regex - Replace multiple places in a string matching a single pattern with different replacements in order

Question

Using the stringr package, it is easy to perform regex replacement in a vectorized manner.

Question: How can I do the following:

Replace every word in

hello,world??your,make|[]world,hello,pos

with different replacements, e.g. increasing numbers

1,2??3,4|[]5,6,7

Note that simple separators cannot be assumed; the actual use case is more complicated.

stringr::str_replace_all does not seem to work, because

str_replace_all(x, "(\\w+)", 1:7)

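produces a vector in which each replacement is applied to all of the words; alternatively, the input has unknown and/or duplicated entries, so that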

str_replace_all(x, c("hello" = "1", "world" = "2", ...))

will not serve the purpose.
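To make the duplicate-entry problem concrete, here is a small base R sketch. It uses nested gsub calls as a stand-in for the named-vector form of str_replace_all (an assumption made for illustration, not the call shown above). Replacing by word identity gives every occurrence of a word the same replacement, regardless of its position:

```r
x <- "hello,world??your,make|[]world,hello,pos"

# Replace by word identity: each "hello" becomes "1" and each "world" becomes "2",
# so the later occurrences cannot receive 5 and 6.
by_name <- gsub("world", "2", gsub("hello", "1", x))
by_name
# [1] "1,2??your,make|[]2,1,pos"
```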

score 8 · Accepted Answer

Here is another approach, using gsubfn. The pre function is run before the substitutions and the fun function is run for each substitution:

library(gsubfn)
x <- "hello,world??your,make|[]world,hello,pos"
p <- proto(pre = function(t) t$v <- 0, # runs once before matching: initialise the counter v to 0
           fun = function(t, x) t$v <- v + 1) # runs for each match: increment v and return it as the replacement
gsubfn("\\w+", p, x)

This gives:

[1] "1,2??3,4|[]5,6,7"

This variation gives the same answer, since gsubfn maintains a count variable for use in proto functions:

pp <- proto(fun = function(...) count)
gsubfn("\\w+", pp, x)

See the gsubfn documentation for more on using count.

score 3 · Accepted Answer

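I would suggest the "ore" package for something like this. Of particular note are ore.search and ore.subst, the latter of which can accept a function as the replacement value.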

Example:

library(ore)

x <- "hello,world??your,make|[]world,hello,pos"

## Match all and replace with the sequence in which they are found
ore.subst("(\\w+)", function(i) seq_along(i), x, all = TRUE)
# [1] "1,2??3,4|[]5,6,7"

## Create a cool ore object with details about what was extracted
ore.search("(\\w+)", x, all = TRUE)
#   match: hello world  your make   world hello pos
# context:      ,     ??    ,    |[]     ,     ,   
#  number: 1==== 2====  3=== 4===   5==== 6==== 7==

score 1 · Accepted Answer

Here is a base R solution. As written it handles a single string, but it should also be possible to vectorize it.

x="hello,world??your,make|[]world,hello,pos"
#split x into single chars
x_split=strsplit(x,"")[[1]]
#find all char positions and replace them with "a"
x_split[gregexpr("\\w", x)[[1]]]="a"
#find all runs of "a"
rle_res=rle(x_split)
#replace run lengths by 1
rle_res$lengths[rle_res$values=="a"]=1
#replace run values by increasing number
rle_res$values[rle_res$values=="a"]=1:sum(rle_res$values=="a")
#use inverse.rle on the modified rle object and collapse string
paste0(inverse.rle(rle_res),collapse="")

#[1] "1,2??3,4|[]5,6,7"
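A more compact base R variant, offered as a sketch rather than as part of the answer above: gregexpr locates the matches and the replacement form of regmatches writes new values back, which works on a whole character vector. The helper name replace_in_order and its make_values argument are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper (not from the answers above): replace the matches of
# `pattern` in each element of `x` with the values returned by `make_values`,
# which is called once per string with that string's matches.
replace_in_order <- function(x, pattern, make_values = seq_along) {
  m <- gregexpr(pattern, x, perl = TRUE)
  found <- regmatches(x, m)  # list: one character vector of matches per string
  regmatches(x, m) <- lapply(found, function(w) as.character(make_values(w)))
  x
}

x <- "hello,world??your,make|[]world,hello,pos"
replace_in_order(x, "\\w+")
# [1] "1,2??3,4|[]5,6,7"

# The counter restarts for each element of the input vector
replace_in_order(c("a-b", "c d e"), "\\w+")
# [1] "1-2"   "1 2 3"
```

Because make_values receives the matched words, it can return any vector of the same length, not only a running count.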
