What is wrong with the code below? I have narrowed the problem down to the hyphen in the comment, but why does it cause an error?
import re
valid = re.compile(r'''[^
\uFFFE\uFFFF # non-characters
]''', re.VERBOSE)
Traceback (most recent call last):
  File "valid.py", line 5, in <module>
    ]''', re.VERBOSE)
  File "/usr/local/lib/python3.3/re.py", line 214, in compile
    return _compile(pattern, flags)
  File "/usr/local/lib/python3.3/re.py", line 281, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/local/lib/python3.3/sre_compile.py", line 494, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/local/lib/python3.3/sre_parse.py", line 748, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/local/lib/python3.3/sre_parse.py", line 360, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/local/lib/python3.3/sre_parse.py", line 506, in _parse
    raise error("bad character range")
sre_constants.error: bad character range
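For context: in verbose mode, Python's `re` ignores whitespace and `#` comments only outside a character class; inside `[...]` they are ordinary characters. The whole comment therefore becomes part of the class, and the `n-c` in "non-characters" is read as a range from `n` (U+006E) down to `c` (U+0063). Because that range runs backwards, the parser raises "bad character range". The sketch below reproduces this; `compiles` is a hypothetical helper, and because the exact error message differs between Python versions, it only checks whether `re.error` is raised.

```python
import re

# Inside [...] even re.VERBOSE treats spaces and '#' literally, so the
# comment text below becomes members of the character class.
pattern = r'''[^
\uFFFE\uFFFF # non-characters
]'''

def compiles(regex, flags=0):
    """Return True if regex compiles, False if re.error is raised."""
    try:
        re.compile(regex, flags)
    except re.error:
        return False
    return True

# 'n-c' from "non-characters" is a reversed range: ord('n') > ord('c').
print(ord('n'), ord('c'))             # 110 99
print(compiles(r'[n-c]'))             # False: same error in isolation
print(compiles(pattern, re.VERBOSE))  # False: the original pattern
print(compiles(r'[c-n]'))             # True: ascending range is fine
```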
The following snippet, which has no hyphen, raises no error:
import re
valid = re.compile(r'''[^
\uFFFE\uFFFF # non characters !! no errors
]''', re.VERBOSE)
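That the hyphen-free version compiles does not make it correct. The comment is still inside the brackets, so the negated class silently excludes every letter of "non characters", plus the spaces, `!`, `#` and the newline. The sketch below compares it with a version that closes the class before the comment, which lets verbose mode strip the comment. The names `buggy` and `fixed` are illustrative only.

```python
import re

# Hyphen-free version from the question: compiles, but the comment text is
# still inside [...], so those characters are excluded as well.
buggy = re.compile(r'''[^
\uFFFE\uFFFF # non characters !! no errors
]''', re.VERBOSE)

# Closing the class before the comment lets re.VERBOSE ignore the comment.
fixed = re.compile(r'''
    [^\uFFFE\uFFFF]   # non-characters
''', re.VERBOSE)

# Compare which single characters each pattern accepts.
for ch in ['n', ' ', '#', 'A', '\uFFFE']:
    print(repr(ch), bool(buggy.match(ch)), bool(fixed.match(ch)))
```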
Edit:
In addition to @nhahtdh's answer, string concatenation seems to be another reasonable way to comment a character class in verbose style:
valid = re.compile( r'[^'
r'\u0000-\u0008' # C0 block first segment
r'\u000B\u000C' # VT and FF; allow TAB U+0009, LF U+000A, and CR U+000D
r'\u000E-\u001F' # rest of C0
r'\u007F' # disallow DEL U+007F
r'\u0080-\u009F' # All C1 block
r']' # don't forget this!
r'''
| [0-9] # normal verbose style
| [a-z] # another term +++
''', re.VERBOSE)
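This works because adjacent string literals are joined by the Python compiler, so the `#` comments between them are Python comments and never reach the regex engine. The verbose rules then apply only to the trailing `'''...'''` part, which sits outside any class. The sketch below assumes the VT/FF segment should read `\u000B\u000C`; a stray `u` there (as in `\u000Bu\u000C`) would also add the letter `u` to the excluded set. The sample characters are arbitrary.

```python
import re

# Adjacent raw-string literals are concatenated before re ever sees them,
# so the comments on each line stay out of the pattern.
valid = re.compile(
    r'[^'
    r'\u0000-\u0008'   # C0 block, first segment
    r'\u000B\u000C'    # VT and FF (TAB, LF and CR stay allowed)
    r'\u000E-\u001F'   # rest of C0
    r'\u007F'          # DEL
    r'\u0080-\u009F'   # all of C1
    r']'
    r'''
    | [0-9]   # normal verbose style
    | [a-z]   # another term
    ''', re.VERBOSE)

# First line of the joined pattern: the class with no comment text in it.
print(valid.pattern.splitlines()[0])

for ch in ['\t', '\n', '\r', '\x0b', '\x7f', '\x85', 'u', 'A']:
    print(repr(ch), bool(valid.match(ch)))

# A stray 'u' in the class would exclude the letter itself.
print(bool(re.match(r'[^\u000Bu\u000C]', 'u')))  # False
```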