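Well, once you give up on the idea that parsing nested expressions has to work at unlimited depth, you can use regular expressions just fine by specifying a maximum depth in advance. Here's how: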
def nested_matcher(n):
    # poor man's matched paren scanning, gives up after n+1 levels.
    # Matches any string with balanced parens or brackets inside; add
    # the outer parens yourself if needed. Nongreedy. Does not
    # distinguish parens and brackets as that would cause the
    # expression to grow exponentially rather than linearly in size.
    return "[^][()]*?(?:[([]"*n+"[^][()]*?"+"[])][^][()]*?)*?"*n
import re
p = re.compile('[^][()]+|[([]' + nested_matcher(10) + '[])]')
print(p.findall('a(b[c]d)e'))
print(p.findall('a[b[c]d]e'))
print(p.findall('[hello [world]] abc [123] [xyz jkl]'))
This prints:
['a', '(b[c]d)', 'e']
['a', '[b[c]d]', 'e']
['[hello [world]]', ' abc ', '[123]', ' ', '[xyz jkl]']
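To make the depth limit and the linear growth concrete, here is a minimal sketch. The helper `balanced_within` is a hypothetical name introduced only for illustration, and it assumes Python 3.4 or later for `re.fullmatch`. It checks whether a whole string is balanced with at most `n` levels of nesting, and the last line prints the pattern length for two values of `n`.

```python
import re

def nested_matcher(n):
    # Same construction as above: n levels of nesting,
    # parens and brackets are not distinguished.
    return "[^][()]*?(?:[([]" * n + "[^][()]*?" + "[])][^][()]*?)*?" * n

def balanced_within(text, n):
    # Hypothetical helper (not part of the original answer): True if the
    # whole of `text` is balanced using at most n levels of nesting.
    return re.fullmatch(nested_matcher(n), text) is not None

print(balanced_within("a(b[c]d)e", 2))  # True: two levels fit within n=2
print(balanced_within("a(b[c]d)e", 1))  # False: nesting is deeper than n=1
print(balanced_within("(a]", 1))        # True: bracket types are not distinguished

# The pattern length is 32*n + 9 characters, so it grows linearly in n.
print(len(nested_matcher(1)), len(nested_matcher(10)))  # 41 329
```

Input nested more deeply than `n` simply fails to match rather than raising an error, so choose `n` with some headroom for the data you expect.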