python - Unable to read lines from an HTML page

Question

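I'm trying to scrape a time in a specific format from a particular site. The regex itself works (I tried it in a regex tester and it matched), but when I run the following code in Python I get an unexpected result: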

import urllib,re

sock = urllib.urlopen("http://www.wolframalpha.com/input/?i=time")
htmlSource = sock.read()
sock.close()
ips = re.findall( r'([01]?[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}',htmlSource)
print ips

Result:

>>>
['7', '4']
>>>

On regextester.com the time is highlighted in red. I want to extract the time in the format xx:xx:xx (24h).

Why does this happen? Thanks!

score 1 · Accepted Answer

Your regex contains some redundant quantifiers (those {1}). You can remove them.
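As a quick sanity check that dropping {1} changes nothing, here is a minimal sketch comparing the two spellings. It assumes Python 3 (the question uses Python 2), and the sample strings are made up for illustration rather than taken from the real page.

```python
import re

# Pattern as written in the question, with the explicit {1} quantifiers.
verbose = re.compile(r'([01]?[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}')
# Same pattern with every {1} removed.
concise = re.compile(r'([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]')

# Hypothetical sample inputs, not taken from the real page.
samples = ["07:45:12", "23:59:59", "24:00:00", "7:05:09", "12:60:00"]

for s in samples:
    matched = bool(concise.fullmatch(s))
    agrees = matched == bool(verbose.fullmatch(s))
    # Print the input, whether it is a valid time, and whether both patterns agree.
    print(s, matched, agrees)
```

Both patterns should accept and reject exactly the same strings, since {1} only says "exactly once", which is already the default.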

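The other issue is that when the pattern contains a capturing group, re.findall returns only what that group captured, which here is the hours. Change the first group to a non-capturing group (?: ... ) and capture the whole expression instead: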

((?:[01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9])

I think that should do it.
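To see the difference side by side, here is a small sketch under a few assumptions: it targets Python 3 rather than the Python 2 used in the question, and html_source is a hard-coded, hypothetical stand-in for the downloaded page so that the example runs without network access.

```python
import re

# Hypothetical stand-in for the downloaded page source.
html_source = "<span>Current time: 07:45:12</span> <span>updated 14:03:59</span>"

# Capturing group around the hours only: findall returns just that group.
hours_only = re.findall(r'([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]', html_source)

# Non-capturing group for the hours, one capturing group around everything.
full_times = re.findall(r'((?:[01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9])', html_source)

print(hours_only)  # ['07', '14']
print(full_times)  # ['07:45:12', '14:03:59']
```

The first call reproduces the symptom from the question (only the hours come back), while the second returns each complete xx:xx:xx string.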
