8

假设我有一个支持文字、正负字符类、有序交替和贪婪量词的正则表达式语言?,*+. (这本质上是 PCRE 的一个子集,没有反向引用、环顾断言或其他一些更奇特的位。)添加非贪婪量词??,*?+?增加这种形式的表达能力吗?

换句话说,给定一个包含非贪婪量词的模式 S,该模式是否可以重写为不包含非贪婪量词的等效模式 T?

如果文献中已经考虑过这个问题,我将不胜感激任何人都可以提供的任何参考资料。我几乎没有找到关于扩展正则表达式形式的表达能力的理论工作(除了关于反向引用如何将你从常规语言转移到无上下文语法的常见事情)。

4

1 回答 1

2

When you say "Regex", you're refering to a several techniques - this isn't just an issue of the underlying theory. Consider the question "does this string match my given Regex?" For such a question, the notion of "greedy" is merely an implementation detail - if you're using one of the common (but inefficient) backtracking implementations, this can affect performance but not output. Similarly, the question "does this string contain a match?" is not affected by greedy vs. non-greedy quantifiers. This first type of regex is concerning with the abstract notion of set-membership: defining the language of matching strings.

So why do non-greedy quantifiers even exist? Regexes are not merely used for matching; common implementations can locate where the match is and which parts of the regex match which parts of the output. By doing this, a user depends on the intricacies of the implementation, which isn't trivial. This second type of regex is concerned with getting a few bits of text into a more practical representation withing the context of an otherwise turing-complete language.

Generally, when you talk of the strength of the regular expression formalism, you're talking about the first world - in which the computer answers with a simple yes or no. It's easy to talk about because the specification is clear. When you talk of a greedy vs. non-greedy quantifier, you're talking about the second world - used lots in practice, but with a specification that's mostly grown without much planning to solve real problems and is a standard by virtue of backwards compatibility. This second world is solving entirely different problems, and it's not clear to me what "expressive power" should even mean here. Certainly, non-greedy can be practical; and that's kind of the point...

Non-greedy quantifiers don't do anything for the first type of expressiveness, and they do for the second, although it's not quite clear what "expressive power" means here anyhow.

于 2011-07-20T19:28:16.607 回答