python - How to use re(gex) to find a pattern like 252.63.71.62 in text in Python?

Question

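I have a web page whose text I fetch using the resource module in Python. However, I don't understand how to find number patterns like 126.23.73.34 in the document and extract them using the re module.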

score 3 · Accepted Answer

You can use this regular expression for an IP address: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

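import re

text = "126.23.73.34"
# Four groups of 1-3 digits separated by literal dots (values are not range-checked)
match = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', text)
if match:
    # group(0) is the whole matched string
    print("match.group(0):", match.group(0))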
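
Note that re.search returns only the first match. To pull every candidate out of a longer document, re.findall can be used with the same pattern. The following is a minimal sketch; the helper name find_ip_like and the sample text are made up for illustration and are not part of the answer.

```python
import re

# Loose pattern from the answer above: four dot-separated groups of 1-3 digits.
IP_LIKE = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def find_ip_like(text):
    """Return every substring shaped like a dotted quad (no 0-255 range check)."""
    return IP_LIKE.findall(text)

# Hypothetical sample text, not taken from the question.
sample = "Server 126.23.73.34 replied; fallback is 252.63.71.62."
print(find_ip_like(sample))
```

Because the pattern has no capturing groups, findall returns the full matched strings.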

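If you are looking for a complete regex for matching IPv4 addresses, you can find the most appropriate one here.

To restrict all 4 numbers in the IP address to 0-255, you can use this one, taken from the source above: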

(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
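
One way to use this in Python is to build the pattern from a single octet sub-pattern. The sketch below makes two assumptions that are not in the answer: the groups are made non-capturing, because with the capturing groups shown above re.findall would return tuples of octets instead of whole addresses, and \b word boundaries are added, because without them a search can match inside a longer number (for example 56.1.1.1 inside 256.1.1.1). The names OCTET, IPV4 and find_ipv4 are illustrative.

```python
import re

# One octet in the range 0-255, same alternation as the pattern above,
# but wrapped in a non-capturing group (an assumption for findall's sake).
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Four octets separated by dots; the \b anchors are an addition to the
# answer's pattern so that matches cannot start or end inside a number.
IPV4 = re.compile(r'\b' + OCTET + r'(?:\.' + OCTET + r'){3}\b')

def find_ipv4(text):
    """Return all dotted quads in text whose octets are each 0-255."""
    return IPV4.findall(text)

print(find_ipv4("ok 126.23.73.34, bad 256.1.1.1"))
```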

score 1 · Accepted Answer

If it is HTML text, you could use an HTML parser (such as BeautifulSoup) to parse it, a regex to select strings that look like an IP, and the socket module to validate the IPs:

import re
import socket
from bs4 import BeautifulSoup # pip install beautifulsoup4

def isvalid(addr):
    try:
        socket.inet_aton(addr)
    except socket.error:
        return False
    else:
        return True

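# webpage is assumed to hold the page's HTML source as a string
soup = BeautifulSoup(webpage, "html.parser")  # name the parser explicitly
ipre = re.compile(r"\b\d+(?:\.\d+){3}\b") # matches some ips and more
# string= selects text nodes matching the regex (older bs4 versions call it text=)
ip_addresses = [ip for ips in map(ipre.findall, soup(string=ipre))
                for ip in ips if isvalid(ip)]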

Note: this extracts IPs only from text nodes; for example, it ignores IPs in HTML attributes.
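
As an alternative to socket.inet_aton, the validation step could also be done with the standard-library ipaddress module. This is a sketch of a swapped-in technique, not what the answer uses: the helper names is_valid_ipv4 and extract_ipv4 are illustrative, and it works on plain text only, with no HTML parsing. ipaddress.IPv4Address expects four decimal octets, whereas inet_aton also accepts shorthand forms such as 127.1; how octets with leading zeros are treated depends on the Python version.

```python
import ipaddress
import re

# Same loose candidate pattern as in the answer: matches some IPs and more.
CANDIDATE = re.compile(r"\b\d+(?:\.\d+){3}\b")

def is_valid_ipv4(addr):
    """Return True if addr parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(addr)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True

def extract_ipv4(text):
    """Find IP-like candidates in text and keep only those that validate."""
    return [ip for ip in CANDIDATE.findall(text) if is_valid_ipv4(ip)]

# Hypothetical sample text for illustration.
print(extract_ipv4("up: 192.168.0.1, bogus: 555.555.555.555"))
```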

score 0 · Accepted Answer

You can use this. It accepts only valid IP addresses:

import re
# Raw string avoids doubling the backslashes; the pattern itself is unchanged
pattern = r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
text = "192.168.0.1 my other IP is 192.168.0.254 but this one isn't a real ip 555.555.555.555"
m = re.findall(pattern, text)
for i in m:
    print(i)

Output:

C:\wamp\www>Example.py
192.168.0.1
192.168.0.254

-- Tested and working
