r - How to replace matches in a string and index each match

Question

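A given string can contain multiple instances of the pattern I am trying to match. For example, if my pattern is <N(.+?)N> and my string is "My name is <N Timon N> and his name is <N Pumba N>", then there are two matches. I want to replace each match with a replacement that includes the index of the match being replaced.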

So for my string "My name is <N Timon N> and his name is <N Pumba N>", I want to change the string to read "My name is [Name #1] and his name is [Name #2]".

How can I accomplish this, preferably with a single function? And preferably using functions from stringr or stringi?

score 3 · Accepted Answer

You can do this in base R with gregexpr and regmatches:

my_string = "My name is <N Timon N> and his name is <N Pumba N>"

# Get the positions of the matches in the string
m = gregexpr("<N(.+?)N>", my_string, perl = TRUE)

# Index each match and replace text using the indices
match_indices = 1:length(unlist(m))

regmatches(my_string, m) = list(paste0("[Name #", match_indices, "]"))

Result:

> my_string
# [1] "My name is [Name #1] and his name is [Name #2]"
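
Since the question asks for a single function, the steps above could be wrapped up as follows. This is only a sketch: the name index_matches and its label argument are made up for illustration, and it assumes the input is a single string.

```r
# Hypothetical helper (not part of any package): replaces every match of
# `pattern` in one string with "[<label> #<index>]".
index_matches <- function(x, pattern, label = "Name") {
  stopifnot(is.character(x), length(x) == 1)

  # Positions of all matches; gregexpr() returns -1 when nothing matches
  m <- gregexpr(pattern, x, perl = TRUE)
  n_matches <- sum(m[[1]] > 0)

  # Nothing to replace: return the input unchanged
  if (n_matches == 0) return(x)

  # Number the matches in order of appearance and substitute them in place
  regmatches(x, m) <- list(paste0("[", label, " #", seq_len(n_matches), "]"))
  x
}

index_matches("My name is <N Timon N> and his name is <N Pumba N>", "<N(.+?)N>")
# [1] "My name is [Name #1] and his name is [Name #2]"
```

Using seq_len() on an explicit match count avoids the 1:length(...) pitfall when a string contains no matches at all.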

Note:

This solution treats identical matches as different "names" when they occur more than once. For example:

my_string = "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again"


m = gregexpr("<N(.+?)N>", my_string, perl = TRUE)

match_indices = 1:length(unlist(m))

regmatches(my_string, m) = list(paste0("[Name #", match_indices, "]"))

Output:

> my_string
[1] "My name is [Name #1] and his name is [Name #2], [Name #3] again"
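
If repeated names should share an index instead, one possible variation (a sketch, still base R only) is to number each match by the position of its first occurrence. The variable names matched and shared_indices are introduced here purely for illustration.

```r
my_string <- "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again"

m <- gregexpr("<N(.+?)N>", my_string, perl = TRUE)

# The matched substrings themselves, in order of appearance
matched <- regmatches(my_string, m)[[1]]

# match() against unique() maps every repeat back to its first occurrence,
# giving 1, 2, 1 for Timon, Pumba, Timon
shared_indices <- match(matched, unique(matched))

regmatches(my_string, m) <- list(paste0("[Name #", shared_indices, "]"))

my_string
# [1] "My name is [Name #1] and his name is [Name #2], [Name #1] again"
```

This compares the full matched text, so "<N Timon N>" and "<N Timon  N>" (with different spacing) would still count as different names.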

score 2 · Accepted Answer

Here is a solution that relies on the gsubfn and proto packages.

# Define the string to which the function will be applied
my_string <- "My name is <N Timon N> and his name is <N Pumba N>"

# Define the replacement function
replacement_fn <- function(x) {

  # gsubfn keeps a `count` variable in the proto object, updated for each match
  replacement_proto_fn <- proto::proto(fun = function(this, x) {
      paste0("[Name #", count, "]")
  })

  gsubfn::gsubfn(pattern = "<N(.+?)N>",
                 replacement = replacement_proto_fn,
                 x = x)
}

# Use the function on the string
replacement_fn(my_string)

score 1 · Accepted Answer

Here is another approach using dplyr + stringr:

library(dplyr)
library(stringr)

# Input string from the question
string = "My name is <N Timon N> and his name is <N Pumba N>"

string %>%
  str_extract_all("<N(.+?)N>") %>%
  unlist() %>%
  setNames(paste0("[Name #", 1:length(.), "]"), .) %>%
  str_replace_all(string, .)

# [1] "My name is [Name #1] and his name is [Name #2]"

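Note:

This solution extracts the matches with str_extract_all, uses them to build a named vector of replacements, and finally passes that vector to str_replace_all to search and replace accordingly.

As the OP pointed out, in some cases this solution gives a different result from the gregexpr + regmatches approach. For example: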

string = "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again"

string %>%
  str_extract_all("<N(.+?)N>") %>%
  unlist() %>%
  setNames(paste0("[Name #", 1:length(.), "]"), .) %>%
  str_replace_all(string, .)

Output:

[1] "My name is [Name #1] and his name is [Name #2], [Name #1] again"
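
The repeated name gets #1 again because the named vector is applied pattern by pattern, and the first pattern already replaces every "<N Timon N>". If each occurrence should get its own number while staying within stringr, one possible sketch is to replace only the first remaining match on each pass. The names pattern and n_matches are illustrative, not from the answer above.

```r
library(stringr)

string <- "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again"
pattern <- "<N(.+?)N>"

# How many replacements are needed in total
n_matches <- str_count(string, pattern)

# On pass i, str_replace() swaps only the first match still present,
# so the matches are numbered left to right
result <- Reduce(
  function(s, i) str_replace(s, pattern, paste0("[Name #", i, "]")),
  seq_len(n_matches),
  string
)

result
# [1] "My name is [Name #1] and his name is [Name #2], [Name #3] again"
```

This rescans the string once per match, so it is likely slower than the regmatches approach on long inputs.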

score 0 · Accepted Answer

Simple, perhaps slow, but it should work:

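library(stringi)

ct <- 1
while (TRUE) {
  old_string <- my_string
  # Replace only the first remaining match, numbering it with the counter
  my_string <- stri_replace_first_regex(my_string, '\\<N.*?N\\>',
                                        paste0('[name', ct, ']'))
  # Stop once a pass leaves the string unchanged (no matches left)
  if (old_string == my_string) break
  ct <- ct + 1
}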
