ruby - How do I match something with a regex that is not between two special characters?

Question

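I have a string like this:

a b c a b " a b " b a " a "

How do I match every a that is not part of a string delimited by "? I want to match everything marked with brackets here (bold in the original post):

[a] b c [a] b " a b " b [a] " a "

I want to replace those matches (or rather remove them by replacing them with an empty string), so removing the quoted parts before matching won't work, because I want them to stay in the string. I'm using Ruby.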

score 25 · Accepted Answer

Assuming the quotes are correctly balanced and there are no escaped quotes, it's easy:

result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')

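This replaces every a with the empty string if and only if the matched a is followed by an even number of quotes. With balanced quotes, an even number of quotes after the a means it lies outside any quoted part.

Explanation: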

a        # Match a
(?=      # only if it's followed by...
 (?:     # ...the following:
  [^"]*" #  any number of non-quotes, followed by one quote
  [^"]*" #  the same again, ensuring an even number
 )*      # any number of times (0, 2, 4 etc. quotes)
 [^"]*   # followed by only non-quotes until
 \Z      # the end of the string.
)        # End of lookahead assertion
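
As a quick check of the explanation above, here is a minimal sketch that applies the regex to the sample string from the question. The method name strip_unquoted_a is only an illustrative assumption, not something from the original answer.

```ruby
# Hypothetical helper: removes every "a" that sits outside double quotes.
# Assumes balanced quotes and no escaped quotes, as stated in the answer.
def strip_unquoted_a(subject)
  subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')
end

sample = 'a b c a b " a b " b a " a "'
p strip_unquoted_a(sample)
# The a's inside " a b " and " a " are kept; the three outside are removed.
```

Note that the lookahead rescans the rest of the string for every candidate a, so this can get slow on long inputs.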

If you can have escaped quotes within quotes (a "length: 2\""), it's still possible, but it gets more complicated:

result = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')

This is essentially the same regex as above, only with [^"] replaced by (?:\\.|[^"\\]):

(?:     # Match either...
 \\.    # an escaped character
|       # or
 [^"\\] # any character except backslash or quote
)       # End of alternation
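
To see the escape-aware variant in action, the following sketch runs it on a small input containing a backslash-escaped quote. The method name and the test string are assumptions made for illustration.

```ruby
# Hypothetical helper: like the simple version, but a backslash followed by
# any character is consumed as one unit, so \" does not count as a quote.
def strip_unquoted_a_escaped(subject)
  subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')
end

# In a single-quoted Ruby literal, \" stays as a backslash plus a quote.
sample = 'a "a \" a" a'
p strip_unquoted_a_escaped(sample)
# Only the first and last a are outside the quoted part, so only they go.
```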

score 10 · Answer

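js-coder, I'm resurrecting this ancient question because it has a simple solution that wasn't mentioned. (I found your question while doing some research for a regex bounty quest.)

As you can see, the regex is really small compared with the one in the accepted answer: ("[^"]*")|a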

subject = 'a b c a b " a b " b a " a "'
regex = /("[^"]*")|a/
replaced = subject.gsub(regex) {|m|$1}
puts replaced
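
The snippet above works because capture group 1 is set only when the left side of the alternation matched a whole quoted chunk. For a bare a, $1 is nil, and gsub turns the block's nil result into an empty string. A minimal sketch of the same idea with an arbitrary replacement (the method name and the replacement parameter are assumptions, not part of the original answer):

```ruby
# Hypothetical helper: quoted chunks are matched first and put back
# unchanged via group 1; any "a" matched on its own is replaced.
def replace_unquoted_a(subject, replacement)
  subject.gsub(/("[^"]*")|a/) { $1 || replacement }
end

sample = 'a b c a b " a b " b a " a "'
puts replace_unquoted_a(sample, 'X')
puts replace_unquoted_a(sample, '')
```

Because the regex engine consumes each quoted chunk in one step, it never gets a chance to match an a inside the quotes.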


References

How to match a pattern except in situations s1, s2, s3

How to match a pattern unless...

score 0 · Answer

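A full-blown regex solution for regex lovers, with no regard for performance or code readability.

This solution assumes there is no escape syntax (with escape syntax, the a in "sbd\"a" is counted as inside the string).

Pseudocode: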

processedString =
    inputString.replaceAll("\".*?\"", "") // Remove all quoted strings
               .replaceFirst("\".*", "")  // Treat text after a lone quote as inside quotes

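Then you can match the text you want in processedString. If you consider the text after a lone quote to be outside the quotes, you can drop the second replace.

EDIT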

In Ruby, the regexes in the code above are

/\".*?\"/

with gsub, and

/\".*/

with sub.
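
Put together, the two calls might look like the sketch below. The method name text_outside_quotes is an assumption for illustration. Note that . does not match a newline in Ruby unless the m flag is given, so this sketch assumes quoted parts do not span lines.

```ruby
# Hypothetical helper: returns only the text that lies outside double quotes.
def text_outside_quotes(input)
  input.gsub(/".*?"/, '') # remove every complete quoted string
       .sub(/".*/, '')    # treat text after a lone quote as quoted too
end

sample = 'a b c a b " a b " b a " a "'
p text_outside_quotes(sample)
p text_outside_quotes(sample).count('a') # number of a's outside quotes
```

This only helps for finding or counting matches. It does not solve the replacement problem from the question, because the quoted parts are gone from the result.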

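To address the replacement problem, I'm not sure whether this works, but it is worth trying:

Declare a counter.
Use the regex /(\"|a)/ with gsub and supply a function (a block in Ruby).
In the function, if the match is ", increment the counter and return " as the replacement (basically no change). If the match is a, check whether the counter is even: if it is, supply your replacement string; otherwise, supply whatever was matched.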
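
The steps above can be sketched as follows. The method and variable names are assumptions for illustration. The sketch relies on gsub calling the block for matches in left-to-right order, so the counter always reflects the number of quotes seen so far.

```ruby
# Hypothetical helper implementing the counter idea: an "a" is replaced
# only while an even number of quotes has been seen, i.e. outside quotes.
# Assumes no escaped quotes.
def replace_unquoted_a_with_counter(subject, replacement)
  quote_count = 0
  subject.gsub(/"|a/) do |match|
    if match == '"'
      quote_count += 1
      match            # keep the quote unchanged
    elsif quote_count.even?
      replacement      # outside quotes: replace
    else
      match            # inside quotes: keep the a
    end
  end
end

puts replace_unquoted_a_with_counter('a b c a b " a b " b a " a "', 'X')
```

Unlike the lookahead version, this makes a single pass over the string.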
