First, here is your regex in free-spacing mode:
import re

tidied = re.compile(r"""
    (                      # $1: ...
      (                    # $2: One ... from 3 alternatives.
        13th               # Either a1of3.
      | (                  # Or a2of3 $3: One ... from 2 alternatives.
          Executive[ ]     # Either a1of2.
        | Residential      # Or a2of2.
        )                  # End $3: One ... from 2 alternatives.
      | (                  # Or a3of3 $4: Last match from 1 to 3 ...
          (\w+)            # $5: ...
          [ ]              # A literal space.
        ){1,3}             # End $4: Last match from 1 to 3 ...
      )                    # End $2: One ... from 3 alternatives.
      Floor                # The literal word Floor.
    )                      # End $1: ...
    """, re.VERBOSE)
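Note that the pattern above contains unnecessary extra parentheses. Here is a simplified expression that matches the same text (only the capture group numbering changes):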
tidied = re.compile(r"""
    (                      # $1: One ... from 4 alternatives.
      13th                 # Either a1of4.
    | Executive[ ]         # Or a2of4.
    | Residential          # Or a3of4.
    | (                    # Or a4of4 $2: Last match from 1 to 3 ...
        (\w+)              # $3: ...
        [ ]                # A literal space.
      ){1,3}               # End $2: Last match from 1 to 3 ...
    )                      # End $1: One ... from 4 alternatives.
    Floor                  # The literal word Floor.
    """, re.VERBOSE)
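The leftmost match wins

Four alternatives may precede the required word Floor. The first three each match one fixed word, but the fourth matches one to three arbitrary words. A backtracking NFA regex engine, such as the one behind Python's re module, works from left to right and reports the match that starts at the leftmost possible position. Strictly speaking it is not a POSIX-style leftmost-longest engine: at a given starting position it tries the alternatives in the order written, and the greedy {1,3} quantifier takes as many words as it can. As the engine advances through the text one character at a time, it tests all four alternatives at each starting position. Because the fourth alternative can begin matching up to two words earlier than the other three, it always matches first (assuming Floor is preceded by three words in the given text). One of the first three alternatives gets its chance only when the fourth cannot start earlier, for example when Executive is the first word of the text.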
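The behaviour described above can be checked with a short sketch. The compact one-line spellings of the two patterns and the sample texts are assumptions made for illustration; they are not taken from the question. For each text, the sketch reports the matched text, the start offset, and whether the fourth alternative (the generic word group) took part in the match.

```python
import re

# Compact, non-verbose spellings of the two patterns (assumed equivalent
# to the free-spacing versions; a literal space replaces [ ]).
ORIGINAL = re.compile(r"((13th|(Executive |Residential)|((\w+) ){1,3})Floor)")
SIMPLIFIED = re.compile(r"(13th|Executive |Residential|((\w+) ){1,3})Floor")

# Hypothetical sample texts, chosen only to exercise each case.
SAMPLES = [
    "Executive Floor",
    "the Executive Floor",
    "a b c d Executive Floor",
    "x, Executive Floor",
]


def describe(text, pattern=SIMPLIFIED):
    """Return (matched text, start offset, used_fourth), or None if no match."""
    m = pattern.search(text)
    if m is None:
        return None
    # The generic word group is the second-to-last group in both patterns;
    # it is None when one of the fixed-word alternatives matched instead.
    used_fourth = m.group(pattern.groups - 1) is not None
    return m.group(0), m.start(), used_fourth


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), "->", describe(sample))
```

With "a b c d Executive Floor" the match starts at c, two words before Executive, because that is the leftmost position from which the fourth alternative can reach Floor. Passing pattern=ORIGINAL gives the same matched text and offsets, which is what the simplification relies on.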
Also note that the 13th and Residential alternatives have no required space after them, so they match only when the text is run together: ResidentialFloor or 13thFloor.
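A minimal sketch of the spacing issue follows, again using an assumed compact spelling of the simplified pattern and hypothetical inputs. It uses fullmatch so that each input is tested as a whole, and reports which kind of alternative produced the match.

```python
import re

# Assumed compact spelling of the simplified pattern.
PATTERN = re.compile(r"(13th|Executive |Residential|((\w+) ){1,3})Floor")


def which_alternative(text):
    """Classify how the whole text matched: 'fixed', 'generic', or 'none'."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return "none"
    # Group 2 is the repeated word-plus-space group of the fourth alternative.
    return "generic" if m.group(2) is not None else "fixed"


if __name__ == "__main__":
    for text in ["13thFloor", "13th Floor", "ResidentialFloor",
                 "Residential Floor", "Executive Floor", "ExecutiveFloor"]:
        print(repr(text), "->", which_alternative(text))
```

The spaced forms "13th Floor" and "Residential Floor" still match, but only through the generic fourth alternative. If the fixed alternatives are meant to handle them, adding [ ] after 13th and Residential would be one way to do it.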