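The problem with optional groups is that the regex engine doesn't really go looking for them. It only checks whether they are present at whatever position the preceding part of the pattern has led it to.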
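A minimal illustration using the first sample title: a greedy `(.+)` swallows the whole string, so the optional issue-number group after it is only tried at the end of the input. There it matches the empty string and is reported as `None`. The pattern here is a deliberately naive one chosen for the demonstration, not one from the question.

```python
import re

s = 'Green Lantern #21'

# (.+) is greedy and consumes the entire string. The optional group
# (?:#(\d+))? is then only tried at the end of the input, where it
# succeeds by matching nothing, so the engine never backtracks to "#21".
m = re.match(r'(.+)(?:#(\d+))?', s)
print(m.groups())  # ('Green Lantern #21', None)
```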
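Capturing the title with `([^#]+)` puts the engine in the right position to match the issue number, if there is one. If you don't want trailing whitespace at the end of the title, use `([^#]*[^#\s])\s*` instead.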
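Comparing the two title patterns on the same input shows the difference: `[^#]+` stops at the `#`, so the issue-number group gets its chance to match, but the captured title keeps its trailing space. `[^#]*[^#\s]` forces the title to end on a character that is neither `#` nor whitespace, and the space is left for the `\s*` outside the group.

```python
import re

s = 'Green Lantern #21'

# [^#]+ cannot run past '#', so the optional group starts right at "#21",
# but the space before '#' ends up inside the title capture.
m1 = re.match(r'([^#]+)(?:#(\d+))?', s)
print(m1.groups())  # ('Green Lantern ', '21')

# [^#]*[^#\s] must end on a non-'#', non-whitespace character;
# the trailing space is consumed by \s* outside the capture group.
m2 = re.match(r'([^#]*[^#\s])\s*(?:#(\d+))?', s)
print(m2.groups())  # ('Green Lantern', '21')
```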
```python
import re

strings = ['Green Lantern #21',
           'Green Lantern #21 (Variant Cover Edition)',
           'Dejah Thoris & Green Men Of Mars #4 (of 8)',
           'Dejah Thoris & Green Men Of Mars #4 (of 8) (Variant Cover Edition)',
           'Macabre One Shot',
           'Detective Comics #21 Combo Pack']

# Groups: title (no trailing whitespace), optional issue number,
# optional total from "(of N)", and any remaining text.
pattern = r'([^#]*[^#\s])\s*(?:#(\d+)\s*)?(?:\(of (\d+)\)\s*)?(.+)?'

for s in strings:
    print(re.match(pattern, s).groups())
```
Prints:
```
('Green Lantern', '21', None, None)
('Green Lantern', '21', None, '(Variant Cover Edition)')
('Dejah Thoris & Green Men Of Mars', '4', '8', None)
('Dejah Thoris & Green Men Of Mars', '4', '8', '(Variant Cover Edition)')
('Macabre One Shot', None, None, None)
('Detective Comics', '21', None, 'Combo Pack')
```
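If positional tuples are hard to read, the same pattern can be written with named groups and `re.VERBOSE` so each part carries a comment. This is a sketch: the group names and the `parse_title` helper are hypothetical, chosen here for illustration. Note that in verbose mode the literal `#` outside a character class must be escaped as `\#`, and the literal space in `(of N)` as `\ `.

```python
import re

# Same regex as above, with named groups (names are our own choice).
TITLE_RE = re.compile(r'''
    (?P<title>[^#]*[^#\s])\s*       # title: no '#', no trailing whitespace
    (?:\#(?P<issue>\d+)\s*)?        # optional issue number, e.g. "#21"
    (?:\(of\ (?P<total>\d+)\)\s*)?  # optional "(of N)"
    (?P<extra>.+)?                  # anything left over
''', re.VERBOSE)


def parse_title(s):
    """Hypothetical helper: return the named groups as a dict, or None."""
    m = TITLE_RE.match(s)
    return m.groupdict() if m else None


info = parse_title('Dejah Thoris & Green Men Of Mars #4 (of 8)')
print(info['title'], info['issue'], info['total'])
# Dejah Thoris & Green Men Of Mars 4 8
```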