python - Regex works on regexpal but not in Python

Question

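I am trying to write a regex to capture email IDs. I spent a few hours testing it on regexpal.com, and on that site it captures all the email IDs. When I use the same regex in Python and call re.findall(pattern,line), it fails to capture them.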

The regex:

[a-zA-Z0-9-_]+[(.)?a-zA-Z0-9-_]*\s*(@|at)\s*[a-zA-Z0-9-_]+\s*(.|dot)\s*[a-zA-Z0-9-_]*\s*(.|dot)\s*e(\-)?d(\-)?u(\-)?(.,)?

Example:

Line =    <TR> <TD><B>E-Mail: </B> <TD><A HREF=MailTo:*example.young@stackoverflow.edu*\>*example.young@stackoverflow.edu*</A>

(This is highlighted correctly on regexpal.com.)

Using Python:

for line in f:
    print 'Line = ',line
    matches = re.findall(my_first_pat,line)
    print 'Matches = ',matches

This gives the output:

Line =    <TR> <TD><B>E-Mail: </B> <TD><A HREF=MailTo:example.young@stackoverflow.edu>example.young@stackoverflow.edu</A>

Matches =  [('@', 'd', '.', '', '', '', ''), ('@', 'd', '.', '', '', '', '')]

What is the problem?

score 1 · Accepted Answer

Read the documentation for re.findall:

If one or more groups are present in the pattern, return a list of groups

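Your groups only capture the at sign, the dots, and so on, so that is all re.findall returns. Either use non-capturing groups, wrap the whole thing in a group, or use re.finditer.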
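As a rough illustration of the difference, here is a minimal sketch. The sample line and both patterns are simplified stand-ins made up for this example, not the asker's full regex, and the code assumes Python 3.

```python
import re

# Hypothetical sample line, shortened from the one in the question.
line = "E-Mail: example.young@stackoverflow.edu"

# Simplified patterns (assumptions for illustration only).
capturing = r"[\w.-]+(@|at)[\w-]+(\.|dot)edu"
non_capturing = r"[\w.-]+(?:@|at)[\w-]+(?:\.|dot)edu"

# With capturing groups, findall returns one tuple of group contents per match.
groups_only = re.findall(capturing, line)

# With non-capturing groups, findall returns the whole matched text.
whole_matches = re.findall(non_capturing, line)

# finditer yields match objects, so group(0) is the full match either way.
via_finditer = [m.group(0) for m in re.finditer(capturing, line)]

print(groups_only)    # [('@', '.')]
print(whole_matches)  # ['example.young@stackoverflow.edu']
print(via_finditer)   # ['example.young@stackoverflow.edu']
```

The first result has the same shape as the tuples in the question's output: only the pieces inside the parentheses come back.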

(As @Igor Chubin points out, your regex also incorrectly uses . instead of \., but that is not the main problem.)

score 0 · Accepted Answer

You must use \. instead of . here:

(.|dot)
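To see why the unescaped dot matters, here is a small sketch. The sample strings are made up for illustration and the code assumes Python 3.

```python
import re

# Hypothetical sample strings, chosen only to show the difference.
good = "stackoverflow.edu"
bad = "stackoverflowXedu"

loose = re.compile(r"stackoverflow(.|dot)edu")    # . matches any character
strict = re.compile(r"stackoverflow(\.|dot)edu")  # \. matches a literal dot only

loose_hits = [bool(loose.search(s)) for s in (good, bad)]
strict_hits = [bool(strict.search(s)) for s in (good, bad)]

print(loose_hits)   # [True, True]
print(strict_hits)  # [True, False]
```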

If you only want to say that there may be hyphens between the letters in the edu part, you can do that without backslashes and groups:

e-?d-?u-?[.,]?
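A quick way to check what this fragment accepts is sketched below, with made-up sample endings and Python 3 assumed. Note that [.,]? means an optional single dot or comma, whereas the (.,)? in the question is an optional group matching any one character followed by a comma.

```python
import re

suffix = re.compile(r"e-?d-?u-?[.,]?")

# Hypothetical sample endings for illustration.
samples = ["edu", "e-d-u", "edu.", "ed-u,", "e--du"]

# fullmatch requires the whole string to match the pattern.
results = [bool(suffix.fullmatch(s)) for s in samples]

for s, ok in zip(samples, results):
    print(s, ok)
# Only "e--du" is rejected, because each -? allows at most one hyphen.
```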

If you use () only to group alternatives (and not to capture), you must use (?:) instead:

(?:@|at)
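Putting the suggestions from both answers together, one possible cleaned-up pattern might look like the sketch below. The exact pattern is an assumption, not something either answer spells out: it makes the middle domain labels optional, drops the trailing punctuation, and is written for Python 3. The name EMAIL is hypothetical.

```python
import re

# One possible rewrite (an assumption), using only non-capturing groups.
EMAIL = re.compile(
    r"[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*"   # local part, dots allowed inside
    r"\s*(?:@|at)\s*"                        # "@" or the word "at"
    r"[a-zA-Z0-9_-]+"                        # first domain label
    r"(?:\s*(?:\.|dot)\s*[a-zA-Z0-9_-]+)*?"  # optional further labels
    r"\s*(?:\.|dot)\s*e-?d-?u"               # ".edu", hyphens tolerated
)

# The line from the question's output.
line = ("<TR> <TD><B>E-Mail: </B> <TD><A HREF=MailTo:"
        "example.young@stackoverflow.edu>example.young@stackoverflow.edu</A>")

# No capturing groups, so findall returns the full matched text.
matches = EMAIL.findall(line)
print(matches)
# ['example.young@stackoverflow.edu', 'example.young@stackoverflow.edu']
```

Because the word "at" is accepted as a separator, a pattern like this can also produce false positives on ordinary text containing "at", so it is worth testing against real input.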
