I need to extract people's names from a really long string.

Their names are in this format: LAST, FIRST.

Some of these people have hyphenated names. Some don't.

My attempt with a smaller string:

Input:

import re
text = 'Smith-Jones, Robert&Epson, Robert'
pattern = r'[A-Za-z]+(-[A-Za-z]+)?,\sRobert'
print(re.findall(pattern, text))

Expected output:

['Smith-Jones, Robert', 'Epson, Robert']

Actual output:

['-Jones', '']

What am I doing wrong?

1 Answer

Use

import re
text = 'Smith-Jones, Robert&Epson, Robert'
pattern = r'[A-Za-z]+(?:-[A-Za-z]+)?,\sRobert'
print(re.findall(pattern, text))
# => ['Smith-Jones, Robert', 'Epson, Robert']

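Just make the capturing group non-capturing. The problem is that re.findall returns the captured group values, not the whole match, whenever the regex pattern contains capturing groups. That is why you get '-Jones' for the first match and an empty string for the second, where the optional group did not participate. So the best fix for this pattern is to turn (...)? into (?:...)?.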
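To see the difference side by side, here is a minimal sketch that runs both patterns on the string from the question. The variable names are only illustrative, and the re.finditer line is an alternative I am adding (not part of the fix above) for cases where you want to keep the capturing group but still get the full match.

```python
import re

text = 'Smith-Jones, Robert&Epson, Robert'

# Original pattern: one capturing group, so findall returns only that group.
capturing = r'[A-Za-z]+(-[A-Za-z]+)?,\sRobert'
# Fixed pattern: the group is non-capturing, so findall returns whole matches.
non_capturing = r'[A-Za-z]+(?:-[A-Za-z]+)?,\sRobert'

with_group = re.findall(capturing, text)
without_group = re.findall(non_capturing, text)

# Alternative: keep the capturing group and ask each match object for group(0),
# which is always the entire matched text.
full_matches = [m.group(0) for m in re.finditer(capturing, text)]

print(with_group)      # ['-Jones', '']
print(without_group)   # ['Smith-Jones, Robert', 'Epson, Robert']
print(full_matches)    # ['Smith-Jones, Robert', 'Epson, Robert']
```

The non-capturing group is the simpler choice when you never need the hyphenated part on its own; finditer is handy when you want both the full match and the group.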

IDEONE demo

answered 2015-09-25T14:47:48.823