
I'm trying to come up with a regular expression in R that matches strings in which two different characters are each repeated.

x <- c("aaaaaaah" ,"aaaah","ahhhh","cooee","helloee","mmmm","noooo","ohhhh","oooaaah","ooooh","sshh","ummmmm","vroomm","whoopee","yippee")

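This regex matches every element above, including strings such as "mmmm" and "ohhhh", where the letter in the second repetition is the same as the letter in the first: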

grep(".*([a-z])\\1.*([a-z])\\2", x, value = T)
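A quick way to confirm that the original pattern filters nothing out is to run it through grepl(), which returns one TRUE/FALSE per element. This is only a sanity check of the claim above, not part of a fix; the name matched is illustrative.

```r
x <- c("aaaaaaah", "aaaah", "ahhhh", "cooee", "helloee", "mmmm", "noooo",
       "ohhhh", "oooaaah", "ooooh", "sshh", "ummmmm", "vroomm", "whoopee",
       "yippee")

# One logical per element of x: does the original pattern match it?
matched <- grepl(".*([a-z])\\1.*([a-z])\\2", x)

# Every element matches, so the pattern does not discriminate at all
all(matched)
# [1] TRUE
```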

The elements of x I actually want to match are the ones where the repeated letters differ:

"cooee","helloee","oooaaah","sshh","vroomm","whoopee","yippee"

How can I adjust the regex so that the second repeated character is guaranteed to differ from the first?


2 Answers


You can use a negative lookahead to restrict the second character pattern:

grep(".*([a-z])\\1.*(?!\\1)([a-z])\\2", x, value=TRUE, perl=TRUE)
#                   ^^^^^^^


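(?!\\1)([a-z]) matches and captures into Group 2 any lowercase ASCII letter, provided it is not the letter captured in Group 1. The lookahead inspects only one character, but that is enough: \\2 then requires that same character to repeat, so the second pair can never be another run of the first letter. perl=TRUE is required because R's default (TRE) regex engine does not support lookarounds, while PCRE does.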
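To see the lookahead at work on individual strings, the pattern can be wrapped in a small helper. This is a sketch: two_distinct_pairs is a hypothetical name, and the leading .* is dropped because grepl() searches unanchored anyway, so it does not change which strings match. "aabaa" shows that two runs of the same letter are still rejected.

```r
# Hypothetical helper: TRUE when a string contains a doubled lowercase letter
# followed, anywhere later, by a doubled lowercase letter that differs from it.
two_distinct_pairs <- function(s) {
  # perl = TRUE selects PCRE, which supports the negative lookahead (?!...)
  grepl("([a-z])\\1.*(?!\\1)([a-z])\\2", s, perl = TRUE)
}

two_distinct_pairs(c("sshh", "mmmm", "aabaa", "aabbaa", "ohhhh"))
# [1]  TRUE FALSE FALSE  TRUE FALSE
```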

R demo:

x <- c("aaaaaaah" ,"aaaah","ahhhh","cooee","helloee","mmmm","noooo","ohhhh","oooaaah","ooooh","sshh","ummmmm","vroomm","whoopee","yippee")
grep(".*([a-z])\\1.*(?!\\1)([a-z])\\2", x, value=TRUE, perl=TRUE)
# => "cooee"   "helloee" "oooaaah" "sshh"    "vroomm"  "whoopee" "yippee" 
answered 2020-06-24T09:19:24.510

If you can avoid regular expressions altogether, I think that's the way to go. A rough example:

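nrep <- sapply(
  strsplit(x, ""),  # split each string into single characters
  function(y) {
    run_lengths <- rle(y)  # runs of identical consecutive characters
    # count the distinct characters that occur in a run of length 2 or more
    length(unique(run_lengths$values[run_lengths$lengths >= 2]))
  }
)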
x[nrep > 1]
# [1] "cooee"   "helloee" "oooaaah" "sshh"    "vroomm"  "whoopee" "yippee"
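As a cross-check, both answers can be run side by side on the same x. This sketch swaps sapply() for vapply() so each result is pinned to exactly one integer; n_distinct_repeats, by_regex and by_rle are illustrative names. The two methods agree on this x, but they are not equivalent in general: rle() counts runs of any character (digits, uppercase, punctuation), while the regex only considers [a-z].

```r
x <- c("aaaaaaah", "aaaah", "ahhhh", "cooee", "helloee", "mmmm", "noooo",
       "ohhhh", "oooaaah", "ooooh", "sshh", "ummmmm", "vroomm", "whoopee",
       "yippee")

# Regex approach from the first answer (PCRE needed for the lookahead)
by_regex <- grepl(".*([a-z])\\1.*(?!\\1)([a-z])\\2", x, perl = TRUE)

# Run-length approach; vapply() guarantees one integer per string
n_distinct_repeats <- vapply(
  strsplit(x, ""),
  function(chars) {
    runs <- rle(chars)
    # distinct characters that appear in a run of length >= 2
    length(unique(runs$values[runs$lengths >= 2]))
  },
  integer(1)
)
by_rle <- n_distinct_repeats > 1

identical(by_regex, by_rle)
# [1] TRUE
```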
answered 2020-06-24T09:29:58.737