5

If I write

(?<=\()\w+(?=\))

for this string: (Test) (Test2) (Test3)

I get: Test Test2 Test3

That makes sense.

If I write

\w+ (?<=\()\w+(?=\))

for this string: LTE (Test)

it returns nothing. What's wrong here?

Please explain your regex clearly, since it can be hard to read.

4

2 Answers

5

Lookarounds do not consume characters!

Here's a step-by-step way to see it (might not be the best, but that's how I interpret it anyway):

First character is L, the regex engine compares it with \w+ and agrees that it's a match. Same happens for T, then E.

At the space, the regex engine sees a space in the regular expression, that's fine as well.

Next up is the opening paren, but what does the regex expect here? Remember that lookarounds do not consume characters: the \( in (?<=\() only checks what sits just before the current position and is never consumed, so the next token that has to consume something is \w+, and \w does not match (. The lookbehind itself also fails at this point, because the character just before it is the space, not an opening paren.

You can think of the regex as consuming only \w+ \w+, with a condition on the second \w+ that it must sit between parens. The condition might be satisfiable on its own, but the expression itself never consumes any parentheses!
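To check this outside of a regex tester, here is a minimal sketch using Python's re module (an assumption on my part, since the question doesn't say which engine is used; the variable names are just for illustration). It runs both patterns from the question against the question's strings:

```python
import re

# Pattern 1 from the question: only lookarounds around \w+.
lookaround_only = re.compile(r"(?<=\()\w+(?=\))")
# Pattern 2 from the question: a word and a space in front of it.
with_prefix = re.compile(r"\w+ (?<=\()\w+(?=\))")

first = lookaround_only.findall("(Test) (Test2) (Test3)")
second = with_prefix.search("LTE (Test)")

print(first)   # ['Test', 'Test2', 'Test3']
print(second)  # None: nothing in the pattern consumes the "("
```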

To make it match, you should add the parens:

\w+ \((?<=\()\w+(?=\))\)

After seeing and matching the space, the regex engine sees (, which agrees with the \( in the expression, so it moves forward.

The engine then sees T. First, does it match the next token, \w+? Yes. Second, is there an opening paren right before it? Yes.

Before moving forward, it sees a positive lookahead. Is there a closing paren just ahead? No, there's e, but \w+ can still be satisfied, so it matches e with another \w. This goes on until the final t. Is there a closing paren after t? Yes, so it proceeds to the next check.

It encounters a closing paren, which is matched by the closing paren in the expression (note that the literal closing paren could be dropped here, and you will be matching LTE (Test instead).
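The corrected pattern, and the variant without the trailing literal paren, can be compared with a short sketch (again assuming Python's re; other engines with lookbehind support should behave the same way on this input):

```python
import re

text = "LTE (Test)"

# Literal parens added around the lookaround-guarded word.
with_parens = re.search(r"\w+ \((?<=\()\w+(?=\))\)", text)
# Same pattern with the final literal \) dropped: the lookahead still
# requires a ")" to follow, but it is not part of the match.
without_closing = re.search(r"\w+ \((?<=\()\w+(?=\))", text)

print(with_parens.group())      # LTE (Test)
print(without_closing.group())  # LTE (Test
```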

But with all this, it might be just as good to have dropped the lookarounds:

\w+ \(\w+\)

The lookarounds only add extra work for the engine here. That's not really visible at small scale, but it can become significant on a larger string.
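If you want to measure this yourself rather than take my word for it, a rough sketch along these lines should do (Python's re and timeit are my assumption, and the repeated input string is made up purely as a stress test; timings will vary by machine and engine, so I'm not quoting any numbers):

```python
import re
import timeit

text = "LTE (Test)"
with_lookarounds = re.compile(r"\w+ \((?<=\()\w+(?=\))\)")
plain = re.compile(r"\w+ \(\w+\)")

# Both patterns find exactly the same text on the question's input.
same_match = with_lookarounds.search(text).group() == plain.search(text).group()
print(same_match)

# Hypothetical larger input, only to give the engine more work to do.
big = "LTE (Test) " * 10_000
for name, pattern in (("lookarounds", with_lookarounds), ("plain", plain)):
    seconds = timeit.timeit(lambda: pattern.findall(big), number=20)
    print(name, round(seconds, 4))
```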

Hopefully, it helps, even if it's a little bit!

answered 2013-08-14T17:15:26.757
2

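Lookahead and lookbehind are "zero-width assertions": they do not consume characters in the string, they only assert whether a match is possible. Your second pattern tries to find a <word1><space><word2> structure, but it also expects <word2> to be surrounded by parentheses. It won't match anything, because the only character it accepts before <word2> is <space>! I would simply write the parentheses directly into the pattern: (\w+) \((\w+)\). I tried it, and it gave me LTE and Test.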
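For reference, a minimal sketch of that pattern in Python's re module (the choice of Python is an assumption; the question doesn't name an engine), showing the two captured groups:

```python
import re

# Literal parens in the pattern; the two groups capture the words only.
match = re.search(r"(\w+) \((\w+)\)", "LTE (Test)")

print(match.group(0))  # LTE (Test)
print(match.groups())  # ('LTE', 'Test')
```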

answered 2013-08-14T17:19:33.183