I'm using wrappers from Byte Comb (http://bytecomb.com/regular-expressions-in-vba/). They seem to be working very well. I need help formulating robust patterns.

I get unexpected results when combining a lookahead "(?=...)" with alternation (or, "|").

Input Text String             Pattern                            RxMatch
----------------------------  ---------------------------------  -------
iraq                          q(?!u)                             q
quit                          q(?!u)                             0
iraq                          q(?=u)                             0
quit                          q(?=u)                             q
sta.23.5  .1 words 67.89  ch  \d+\.?\d*|\.\d+(?=\s*ch)           23.5
sta.23.5  .1 words 67.89  ch  (\d+\.?\d*)|(\.\d+)(?=\s*ch)       23.5
sta.23.5  .1 words 67.89  ch  \d+\.?\d*(?=\s*ch)                 67.89
sta.23.5  .1 words 67.89  ch  \d+\.?\d*(?=\s*ch)|\.\d+(?=\s*ch)  67.89
sta.23.5  .1 words .89  ch    \d+\.?\d*|\.\d+(?=\s*ch)           23.5
sta.23.5  .1 words .89  ch    (\d+\.?\d*)|(\.\d+)(?=\s*ch)       23.5
sta.23.5  .1 words .89  ch    \d+\.?\d*(?=\s*ch)                 89
sta.23.5  .1 words .89  ch    \d+\.?\d*(?=\s*ch)|\.\d+(?=\s*ch)  .89

"iraq" and "quit" work as expected. From the second set of input strings I want to extract "67.89", and from the third, ".89". Initially, I wrote \d+\.?\d*|\.\d+ as a floating-point number pattern to cover both cases. Adding parentheses did not help. Removing the or worked for 67.89. Finally, I found a working solution (the last pattern in each set). But is there something better? Can you help me understand the order of precedence? If possible, I'd like to keep the two parts of the or together.

Thanks, Not-a-programmer!

1 Answer

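\d+\.?\d*|\.\d+(?=\s*ch) applied to "sta.23.5  .1 words 67.89  ch" matches 23.5 first, because 23.5 is the leftmost text that satisfies the first alternative, \d+\.?\d*, which carries no lookahead.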

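The or "|" is applied last (it has the lowest precedence of the regex operators), so it breaks the whole pattern into two possible matches: \d+\.?\d* and \.\d+(?=\s*ch). The lookahead therefore belongs to the second alternative only.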
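As a rough illustration of that split, here is a small sketch in Python rather than VBA. That is an assumption on my part: Python's `re` module is not the VBScript RegExp class, although both are backtracking engines that return the leftmost match, and both should accept the `(?:...)` and `(?=...)` syntax used here. It compares the original pattern with a variant that wraps the two alternatives in a non-capturing group, so that the lookahead applies to whichever alternative matched. The names `UNGROUPED`, `GROUPED`, and `first_match` are made up for this example.

```python
import re

# Input strings taken from the table in the question.
TEXT_A = "sta.23.5  .1 words 67.89  ch"
TEXT_B = "sta.23.5  .1 words .89  ch"

# Original pattern: the lookahead is part of the second alternative only.
UNGROUPED = r"\d+\.?\d*|\.\d+(?=\s*ch)"

# Hypothetical variant: a non-capturing group keeps both alternatives
# together, so the single lookahead applies to whichever one matched.
GROUPED = r"(?:\d+\.?\d*|\.\d+)(?=\s*ch)"


def first_match(pattern, text):
    """Return the text of the first (leftmost) match, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


for text in (TEXT_A, TEXT_B):
    # Print the input, the ungrouped result, and the grouped result.
    print(repr(text), first_match(UNGROUPED, text), first_match(GROUPED, text))
```

If the grouped form behaves the same way in VBA, it would keep both halves of the or together with a single lookahead, which is what the question asks for.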

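If you want to stop \d+\.?\d* from matching 23.5, you have to add an extra condition, for example requiring a whitespace character before it, and use capturing parentheses to get the number as a submatch: \s(\d+\.?\d*)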

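You can match both cases with the pattern \s(\d+\.?\d*)|\.\d+(?=\s*ch), but keep in mind that when the first half matches, the full match includes the leading whitespace, so you have to read the actual value from the submatch.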
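To show what reading the submatch looks like, here is a hedged sketch in Python rather than VBA (Python's `re` module stands in for the VBScript RegExp class, which is an assumption). `extract_number` is a hypothetical helper, not part of the Byte Comb wrappers. In VBScript the corresponding value would come from the match's SubMatches collection instead of `group(1)`.

```python
import re

# Pattern from the answer: group 1 captures the number when the first
# alternative (whitespace + number) is the one that matched.
PATTERN = r"\s(\d+\.?\d*)|\.\d+(?=\s*ch)"


def extract_number(text):
    """Return the number found by PATTERN, or None if nothing matches."""
    m = re.search(PATTERN, text)
    if m is None:
        return None
    # If the first alternative matched, the whole match starts with the
    # whitespace character, so the clean value is in group 1.
    if m.group(1) is not None:
        return m.group(1)
    # Otherwise the second alternative matched and the whole match is the value.
    return m.group(0)


print(extract_number("sta.23.5  .1 words 67.89  ch"))
print(extract_number("sta.23.5  .1 words .89  ch"))
```

Note that the first alternative has no lookahead of its own, so this pattern relies on the wanted number being the first one preceded by whitespace.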

The real problem here is that VBScript's RegExp class does not support lookbehind, only lookahead.

Answered 2013-11-08T00:17:45.473