
I have a regular expression, read from an XML file, that is used by two different tools: one written in Java and one in C++.

[…!\?\.](\)|\]|“|'|"|’|”|‘|´|''|»)*

Trying to match the following string:

!!!!''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''??

The input data comes from some "big data" stored on HDFS.

In Java, the match goes on backtracking seemingly forever, while the C++ version works fine. The problem is that I cannot change the regular expression, since it is also used by other external modules, and it is hard to justify a change when it works fine from C++.

Is there a way to avoid this issue without changing the regex? I tried appending a "$" to it, with no luck.


1 Answer


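The problem comes from the regex containing both "'" and "''" (one apostrophe or two apostrophes). A simple fix is to remove the extra "|''" alternative (two apostrophes): the regex already looks for a single one ("|'"), and the group is wrapped in ()*, so everything inside the parentheses is matched zero or more times anyway. Removing it has no effect on the logic of the regex, but it solves the problem. Thanks for the input.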
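To illustrate why removing the alternative helps: with both "'" and "''" inside the repeated group, every pair of adjacent apostrophes can be consumed either as two single matches or as one double match, so when the rest of the pattern fails, a backtracking engine may have to try a number of splits that grows exponentially with the length of the run. The sketch below is only an illustration under a few assumptions: the class and method names are hypothetical, the "$" suffix mirrors what was tried in the question, and the non-ASCII punctuation is written as Unicode escapes so the source compiles regardless of file encoding. The original pattern is deliberately run only on a short input, since on the long one it may not finish in reasonable time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApostropheRegexDemo {
    // Pattern from the question; "''" is a separate alternative inside the group.
    static final String ORIGINAL =
        "[\u2026!\\?\\.](\\)|\\]|\u201C|'|\"|\u2019|\u201D|\u2018|\u00B4|''|\u00BB)*";

    // Same pattern with the redundant "|''" alternative removed.
    static final String FIXED =
        "[\u2026!\\?\\.](\\)|\\]|\u201C|'|\"|\u2019|\u201D|\u2018|\u00B4|\u00BB)*";

    // Builds an input shaped like the one in the question:
    // "!!!!" + the given number of apostrophes + "??".
    static String sample(int apostrophes) {
        StringBuilder sb = new StringBuilder("!!!!");
        for (int i = 0; i < apostrophes; i++) {
            sb.append('\'');
        }
        return sb.append("??").toString();
    }

    // Returns the first match of (regex + "$") in the input, or null if none.
    static String firstAnchoredMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex + "$").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Short run of apostrophes: both variants are cheap and agree.
        String small = sample(6);
        System.out.println(firstAnchoredMatch(ORIGINAL, small));
        System.out.println(firstAnchoredMatch(FIXED, small));

        // Long run (88 apostrophes): only the fixed variant is tried here,
        // because the original one may backtrack for a very long time.
        System.out.println(firstAnchoredMatch(FIXED, sample(88)));
    }
}
```

Because the group is repeated with `*`, any run that "''" could match is also matched by two iterations of "'", which is why the two variants accept the same text and differ only in how much backtracking is possible.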

answered 2013-11-28T13:15:48.937