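I am trying to use a verbose regular expression in Python (2.7). In case it matters, my only goal is to make the expression easier to come back to and understand at some point in the future. Because I am new to this, I first wrote a compact expression to make sure I was matching what I wanted.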
test_verbose_item_pattern = re.compile('\n{1}\b?I[tT][eE][mM]\s+\d{1,2}\.?\(?[a-e]?\)?.*[^0-9]\n{1}')
verbose_item_pattern = re.compile("""
\n{1} #begin with a new line allow only one new line character
\b? #allow for a word boundary the ? allows 0 or 1 word boundaries \nITEM or \n ITEM
I # the first word on the line must begin with a capital I
[tT][eE][mM] #then we need one character from each of the three sets this allows for unknown case
\s+ # one or more white spaces this does allow for another \n not sure if I should change it
\d{1,2} # require one or two digits
\.? # there could be 0 or 1 periods after the digits 1. or 1
\(? # there might be 0 or 1 instance of an open paren
[a-e]? # there could be 0 or 1 instance of a letter in the range a-e
\)? # there could be 0 or 1 instance of a closing paren
.* #any number of unknown characters so we can have words and punctuation
[^0-9] # by its placement I am hoping that I am stating that I do not want to allow strings that end with a number and then \n
\n{1} #I want to cut it off at the next newline character
""", re.VERBOSE)
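One thing that may be worth checking about the `\b?` line: the sketch below assumes the pattern is written as an ordinary (non-raw) string, as in the code above. In that case Python itself turns `\b` into a backspace character before `re` ever sees it, and in a raw string a quantifier on the zero-width `\b` assertion is rejected. The names `item_pattern` and `item_match` and the sample text are made up for illustration.

```python
import re

# In a non-raw string, "\b" is the single backspace character (0x08),
# not the two-character regex escape for a word boundary.
backspace_len = len("\b")
boundary_len = len(r"\b")

# In a raw string, \b reaches the regex engine as a zero-width assertion,
# and a quantifier on it is rejected.
try:
    re.compile(r"\b?Item")
    boundary_error = None
except re.error as exc:
    boundary_error = str(exc)

# Without the quantifier, the boundary after the newline compiles and matches.
item_pattern = re.compile(r"\n\bI[tT][eE][mM]\s+\d{1,2}")
item_match = item_pattern.search("intro\nITEM 7. Management")
```

If that assumption holds, the compact pattern compiles only because `\b?` there means "an optional backspace", which is probably not what the comment intends.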
Traceback (most recent call last):
  File "C:/Users/Dropbox/directEDGAR-Code-Examples/NewItemIdentifier.py", line 17, in <module>
  File "C:\Python27\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\Python27\lib\re.py", line 242, in _compile
    raise error, v # invalid expression
error: nothing to repeat
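A minimal sketch that reproduces the same message, assuming the verbose pattern is compiled with `re.VERBOSE` from a non-raw string: Python converts `\n` into a real newline, verbose mode then ignores that newline as whitespace, and the `{1}` that follows has nothing left to repeat. The short patterns and the names `non_raw_error`, `raw_pattern`, and `raw_match` are illustrative only, not the full expression from the question.

```python
import re

# Non-raw string: the regex engine receives an actual newline character.
# Under re.VERBOSE that newline is skipped as whitespace, so {1} follows nothing.
try:
    re.compile("\n{1}I", re.VERBOSE)
    non_raw_error = None
except re.error as exc:
    non_raw_error = str(exc)

# Raw string: the engine receives backslash + n, which verbose mode keeps
# and interprets as "match a newline".
raw_pattern = re.compile(r"""
    \n{1}   # exactly one newline
    I       # a literal capital I
""", re.VERBOSE)

raw_match = raw_pattern.search("x\nItem 1")
```

The same escape processing applies to any `\n` written inside a comment of a non-raw verbose pattern: it becomes a real newline, which ends the comment early and leaves the rest of that line to be read as part of the pattern.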