regex - （简化的）正则表达式匹配的复杂性

Question

我只是想知道这个正则表达式匹配问题的复杂性：给定一个小写字母字符串和一个匹配规则，确定该规则是否可以匹配整个字符串。该规则是一个简化的正则表达式，仅包含较小的字母和/或“。” （句点）和/或“*”（星号）。句点可以匹配任何小写字母，其中星号可以匹配零个或多个前面的元素。

这里有些例子：

isMatch("aa","a") 为假
isMatch("aa","aa") 为真
isMatch("aaa","aa") 为假
isMatch("aa", "a*") 为真
isMatch("aa", ".*") 为真
isMatch("ab", ".*") 为真
isMatch("aab", "c*a*b") 为真

据说这个问题可以在多项式时间内解决。我只是想知道如何。凭直觉，将“aaaaaaaaaa”与“.*a.*”之类的正则表达式匹配会使与有限确定性机器匹配时很难确定状态转换。任何意见？

谢谢你。

score 0 · Accepted Answer

You can solve this in polynomial time by using a dynamic programming algorithm. The idea is to answer queries of the following form:

Can you match the last m characters of the string using the last n characters of the regular expression?

The idea is to use a recursive algorithm, then either memoize the results or use dynamic programming to cache the results. The recursive algorithm works as follows:

If the regular expression is empty, it only matches the empty string.
If the regular expression's second character isn't *, then the regular expression matches the string iff the first character of the string matches the regex and the rest of the string matches the rest of the regex.
If the regular expression's second character is *, then the regular expression matches the string iff one of the following is true:
- The first character of the regular expression matches the first character of the string, and the same regular expression matches the remainder of the string.
- The first character of the regular expression matches the first character of the string, and the regular expression with the *'ed expression removed matches the rest of the string.
- The first character of the regular expression doesn't match the first character of the string, but the regex formed by removing the *'ed expression matches the string.

Each of these cases makes a recursive call where either the string or the regex is shorter and does O(1) work aside from the recursive calls. Since there are only Θ(mn) possible subproblems (one for each combination of a suffix of the regex and a suffix of the original string), using memoization this problem can be solved in Θ(mn) time.

Hope this helps!

regex - （简化的）正则表达式匹配的复杂性

1 回答 1

Related

Reference