python - Regular expressions in Python - strange behavior

Question

I want to parse the output of the Unix uptime command. Here are two different samples:

14:25  up 1 day,  1:24, 2 users, load averages: 0,56 0,48 0,47
14:25  up 1:24, 2 users, load averages: 0,56 0,48 0,47

(The language I'm using is Python.)

So, assume the two samples above are saved in the variables s1 and s2.

Here's the code I wrote:

>>> import re
>>> RE = r'''
...     ((\d) \s day)?        # this should match "n day" if it's there
...     .*?                   # this should match everything until the next regex
...     \s(\d{1,2}):(\d{1,2}) # this should match a space followed by "hh:mm"
... '''
>>> print(re.match(RE, s1, re.VERBOSE).groups())
(None, None, '1', '24')
>>> print(re.match(RE, s2, re.VERBOSE).groups())
(None, None, '1', '24')

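The second part of the regex, the part that captures the hours and minutes of the uptime, works perfectly. But why is the first part of the tuple always None? What am I missing? Is this a greedy vs. non-greedy issue?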

score 3 · Accepted Answer

You want to move the .*? into the optional day group and use .search():

RE = r'''
    (?:(\d) \s day.*?)?   # this should match "n day" if it's there
    \s(\d{1,2}):(\d{1,2}) # this should match a space followed by "hh:mm"
'''

Demo:

>>> RE = r'''
...     (?:(\d) \s day.*?)?   # this should match "n day" if it's there
...     \s(\d{1,2}):(\d{1,2}) # this should match a space followed by "hh:mm"
... '''
>>> print(re.search(RE, s1, re.VERBOSE).groups())
('1', '1', '24')
>>> print(re.search(RE, s2, re.VERBOSE).groups())
(None, '1', '24')

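With .match(), your pattern is anchored at the start of the string. The optional day group cannot match there, so it matches nothing, and the lazy .*? then expands over all the text before the time, "1 day" included. That is enough to satisfy the pattern.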

By moving the .*? part into the optional day group (non-capturing in my version), you guarantee that it cannot backtrack past the literal text day.
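To see this for yourself, here is a minimal sketch in Python 3 syntax, assuming s1 holds the first sample from the question. The names ORIGINAL and FIXED are only illustrative labels for the question's pattern and the version above. It prints the text the original pattern consumed as a whole and the span of the optional group, which is (-1, -1) when a group did not take part in the match.

```python
import re

s1 = "14:25  up 1 day,  1:24, 2 users, load averages: 0,56 0,48 0,47"

# The pattern from the question: the day group is optional and comes first.
ORIGINAL = r'''
    ((\d) \s day)?        # optional "n day"
    .*?                   # lazy filler
    \s(\d{1,2}):(\d{1,2}) # " hh:mm"
'''

# The pattern from this answer: the filler lives inside the optional group.
FIXED = r'''
    (?:(\d) \s day.*?)?   # optional "n day" plus the filler that follows it
    \s(\d{1,2}):(\d{1,2}) # " hh:mm"
'''

m = re.match(ORIGINAL, s1, re.VERBOSE)
print(repr(m.group(0)))  # the overall match runs from the start up to "1:24"
print(m.span(1))         # (-1, -1): the day group did not participate

f = re.search(FIXED, s1, re.VERBOSE)
print(f.groups())        # ('1', '1', '24')
```

The overall match of the original pattern already contains "1 day", but that text was consumed by the filler rather than by the capturing group.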

score 0 · Accepted Answer

Instead of reading the command's output, another approach is to read directly from /proc/uptime:

#!/usr/bin/python

from datetime import timedelta

with open('/proc/uptime', 'r') as f:
    uptime_seconds = float(f.readline().split()[0])
    uptime_string = str(timedelta(seconds = uptime_seconds))

print(uptime_string)

Output:

35 days, 23:06:35.530000

This is now easier to parse using tokenization or the built-in split functions.
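As a rough sketch of that last step, the components can also be taken from the timedelta itself instead of re-parsing its string form. The helper name split_uptime and the sample line are made up for illustration, and the second number in the sample is an arbitrary placeholder. The sketch assumes the first whitespace-separated field of /proc/uptime is the uptime in seconds, as in the code above.

```python
from datetime import timedelta


def split_uptime(uptime_seconds):
    """Return (days, hours, minutes) for a duration given in seconds."""
    delta = timedelta(seconds=uptime_seconds)
    # timedelta keeps whole days separately; .seconds is the remainder (< 86400).
    hours, remainder = divmod(delta.seconds, 3600)
    minutes = remainder // 60
    return delta.days, hours, minutes


# Illustrative stand-in for the first line of /proc/uptime (not real data).
sample_line = "3107195.53 12345678.90"
uptime_seconds = float(sample_line.split()[0])
print(split_uptime(uptime_seconds))  # (35, 23, 6)
```

Working on the numeric fields avoids having to handle the "N days," prefix that only appears in the string form once the uptime exceeds one day.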

score 0 · Accepted Answer

Matching starts at the beginning of the string, so add .*? at the start:

In [37]: RE=r'.*?((\d) \s day) .*?  \s(\d{1,2}):(\d{1,2})'

In [38]: print(re.match(RE, s1, re.VERBOSE).groups())
('1 day', '1', '1', '24')
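Note that this pattern requires the day part, so it does not match the second sample at all. A possible variation, shown here only as a sketch in Python 3 syntax, keeps .match() and the leading .*? but makes the day group optional again, with its filler inside the group. The names REQUIRED_DAY and OPTIONAL_DAY are illustrative, and s1 and s2 are assumed to hold the two samples from the question.

```python
import re

s1 = "14:25  up 1 day,  1:24, 2 users, load averages: 0,56 0,48 0,47"
s2 = "14:25  up 1:24, 2 users, load averages: 0,56 0,48 0,47"

# The pattern from this answer: "n day" is mandatory.
REQUIRED_DAY = r'.*?((\d) \s day) .*?  \s(\d{1,2}):(\d{1,2})'

# Variation: leading filler, then an optional day group that owns its filler.
OPTIONAL_DAY = r'''
    .*?                    # skip the leading clock time and "up"
    (?:(\d) \s day .*?)?   # optional "n day" and the filler after it
    \s(\d{1,2}):(\d{1,2})  # " hh:mm"
'''

print(re.match(REQUIRED_DAY, s2, re.VERBOSE))           # None: s2 has no "day"
print(re.match(OPTIONAL_DAY, s1, re.VERBOSE).groups())  # ('1', '1', '24')
print(re.match(OPTIONAL_DAY, s2, re.VERBOSE).groups())  # (None, '1', '24')
```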
