python - Finding email addresses in a web page using regular expressions

Question

I'm a beginner in Python. This is the code I have for finding instances of email addresses in a web page.

    page = urllib.request.urlopen("http://website/category")
    reg_ex = re.compile(r'[-a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+', re.IGNORECASE
    m = reg_ex.search_all(page)
    m.group()

When I run it, Python reports invalid syntax and points at this line:

    m = reg_ex.search_all(page)

Can anyone tell me why it is invalid?

score 6 · Accepted Answer

Consider an alternative:

    import re

    # Suppose we have a text with many email addresses
    text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

    # Here re.findall() returns a list of all the found email strings
    emails = re.findall(r'[\w\.-]+@[\w\.-]+', text)
    # ['alice@google.com', 'bob@abc.com']
    for email in emails:
        # do something with each found email string
        print(email)

Source: https://developers.google.com/edu/python/regular-expressions

score 2 · Accepted Answer

You are missing a closing ) on this line. It should be:

    reg_ex = re.compile(r'[a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+', re.IGNORECASE)

Also, your regex is not well suited to matching email addresses; try this one instead:

    r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"

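As a rough illustration of how this suggested pattern behaves with re.findall, here is a small sketch. The sample string and the helper name find_emails are made up for the example. One thing it shows is that the last character class also accepts dots, so a sentence-ending period is swallowed into the match; the helper trims it.

```python
import re

# The pattern suggested in this answer, as a raw string so "\." reaches re unchanged.
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")


def find_emails(text):
    """Hypothetical helper: return candidate addresses, trimming trailing dots."""
    return [match.rstrip(".") for match in EMAIL_RE.findall(text)]


sample = "purple alice@google.com, blah monkey bob@abc.com."
raw_matches = EMAIL_RE.findall(sample)   # second match keeps the final "."
cleaned = find_emails(sample)            # trailing "." removed
print(raw_matches)
print(cleaned)
```

Trimming with rstrip(".") is only one possible workaround; tightening the last character class would be another.
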
FYI, validating an email address with a regex is not that simple; see these threads:

score 2 · Accepted Answer

Also, reg_ex has no search_all method, and you should pass it page.read() rather than page.
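
Putting these points together, a corrected version of the question's code might look like the sketch below. It assumes Python 3, where page.read() returns bytes that must be decoded before a str pattern can search them. The function names, the UTF-8 fallback, and errors="replace" are choices made for this example, and the URL in the comment is the placeholder from the question.

```python
import re
import urllib.request

# The question's pattern with the missing ")" added and the groups made
# non-capturing, so findall() returns whole addresses rather than group tuples.
EMAIL_RE = re.compile(r"[-a-z0-9._]+@(?:[-a-z0-9]+)(?:\.[-a-z0-9]+)+", re.IGNORECASE)


def extract_emails(html):
    """Hypothetical helper: return every address-like substring in decoded text."""
    return EMAIL_RE.findall(html)


def fetch_emails(url):
    """Hypothetical helper: download a page, decode it, and search it."""
    with urllib.request.urlopen(url) as page:
        # Use the charset from the response headers if present; UTF-8 is an assumption.
        charset = page.headers.get_content_charset() or "utf-8"
        html = page.read().decode(charset, errors="replace")
    return extract_emails(html)


# Offline check with a made-up snippet; a real call would be
# fetch_emails("http://website/category").
snippet = '<a href="mailto:Alice@Example.com">mail</a> or bob@mail.example.org'
found = extract_emails(snippet)
print(found)
```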

Answered 2013-08-08T07:19:18.650

score 1 · Accepted Answer

The re module has no .search_all method.

Maybe the one you are looking for is .findall.

You can try:

    re.findall(r"(\w(?:[-.+]?\w+)+\@(?:[a-zA-Z0-9](?:[-+]?\w+)*\.)+[a-zA-Z]{2,})", text)

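I assume text is the text to search. In your case that should be text = page.read() (in Python 3 this returns bytes, so decode it to str before searching with a str pattern).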

Or, if you prefer to compile the regex first:

    r = re.compile(r"(\w(?:[-.+]?\w+)+\@(?:[a-z0-9](?:[-+]?\w+)*\.)+[a-z]{2,})", re.I)
    results = r.findall(text)

Note: .findall returns a list of matches.

If you need to iterate over match objects instead, you can use .finditer

(continuing the previous example):

    r = re.compile(r"(\w(?:[-.+]?\w+)+\@(?:[a-z0-9](?:[-+]?\w+)*\.)+[a-z]{2,})", re.I)
    for email_match in r.finditer(text):
        email_addr = email_match.group()  # or anything else you need from the match object
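
One detail worth checking when choosing between the two: if a pattern contains several capturing groups, .findall returns the groups rather than the whole match, whereas group() on a match object from .finditer gives the full address. The sketch below illustrates this with the pattern from the question (closing parenthesis added); the sample text is made up.

```python
import re

text = "purple alice@google.com, blah monkey bob@abc.com blah dishwasher"

# The question's pattern: two capturing groups after the "@".
capturing = re.compile(r"[-a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+", re.IGNORECASE)

# With more than one group, findall() yields one tuple of groups per match.
group_tuples = capturing.findall(text)

# finditer() yields match objects; group() with no argument is the whole match.
full_matches = [m.group() for m in capturing.finditer(text)]

print(group_tuples)
print(full_matches)
```

Making the groups non-capturing with (?:...) would be another way to get whole addresses back from .findall.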

Now the question is which regex you should use :)

score 0 · Accepted Answer

Change r'[-a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+' to r'[aA-zZ0-9._]+@([aA-zZ0-9]+)(\.[aA-zZ0-9]+)+'. The - character before a-z is the cause.
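
A quick way to test this suggestion is to try both character classes directly. The sketch below is only an illustrative check with made-up inputs. In Python's re, a - placed first in a character class is a literal hyphen, so the original pattern compiles once the missing ) is added, and a range written as A-z spans the ASCII punctuation between Z and a as well as the letters.

```python
import re

# A leading "-" inside [...] is a literal hyphen, not the start of a range.
leading_hyphen = re.fullmatch(r"[-a-z0-9._]+", "first-last.name")

# "A-z" covers ASCII codes 65..122, which includes [ \ ] ^ _ and ` besides letters.
extra_chars = re.findall(r"[aA-zZ0-9._]", "[\\]^`")

# The question's pattern compiles once the closing parenthesis is present.
original = re.compile(r"[-a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+", re.IGNORECASE)
hyphenated = original.search("write to jo@my-host.example.com today")

# The rewritten pattern no longer allows "-" in the host part.
rewritten = re.compile(r"[aA-zZ0-9._]+@([aA-zZ0-9]+)(\.[aA-zZ0-9]+)+")
rewritten_hit = rewritten.search("write to jo@my-host.example.com today")

print(leading_hyphen is not None)
print(extra_chars)
print(hyphenated.group())
print(rewritten_hit)
```

On this input the original pattern finds the whole address, while the rewritten one finds nothing because my-host contains a hyphen.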
