python - How do I use a regular expression to extract specific parts of a string?

Question

I have a Python string containing information that I want to extract with a regular expression.

Example:

"The weather is 75 degrees with a humidity of 13%"

I only want to pull out "75" and "13". Here is what I have tried so far in Python.

import re

str = "The weather is 75 degrees with a humidity of 13%"
m = re.search(r"The weather is \d+ degrees with a humidity of \d+%", str)
matched = m.group()

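However, this obviously matches the entire string rather than just the parts I want. How do I extract only the numbers I want? I looked into backreferences, but they seem to apply only within the regex pattern itself.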

score 2 · Accepted Answer

m = re.search(r"The weather is (\d+) degrees with a humidity of (\d+)%", str)
matched = m.groups()

You need to wrap the parts you want in parentheses:

>>> s1 = "The weather is 75 degrees with a humidity of 13%"
>>> m = re.search(r"The weather is (\d+) degrees with a humidity of (\d+)%", s1)
>>> m.groups()
('75', '13')

Or just use findall to get every number from any string:

>>> re.findall(r"\d+", s1)
['75', '13']
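As a minimal sketch of how the capture-group approach might be wrapped for reuse: `re.search` returns `None` when the pattern does not match, so calling `.groups()` directly would raise an `AttributeError` on unexpected input. The helper name `extract_readings` and the constant `WEATHER_RE` are made up for illustration; only the pattern comes from the answer above.

```python
import re

# Pattern from the answer above, written as a raw string so that \d
# reaches the regex engine untouched.
WEATHER_RE = re.compile(r"The weather is (\d+) degrees with a humidity of (\d+)%")


def extract_readings(text):
    """Return (temperature, humidity) as ints, or None if text does not match.

    Hypothetical helper, not part of the original answer.
    """
    m = WEATHER_RE.search(text)
    if m is None:  # re.search returns None when there is no match
        return None
    temp, humidity = m.groups()  # one string per pair of parentheses
    return int(temp), int(humidity)


print(extract_readings("The weather is 75 degrees with a humidity of 13%"))  # (75, 13)
print(extract_readings("No readings today"))  # None

# findall is simpler but returns every run of digits, wanted or not:
print(re.findall(r"\d+", "Day 3: The weather is 75 degrees with a humidity of 13%"))
# ['3', '75', '13']
```

The anchored pattern is stricter than `findall(r"\d+", ...)`, which is convenient only when the string is known to contain nothing but the numbers of interest.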

score 2 · Accepted Answer

Maybe you want to use named groups?

>>> m = re.search(r"The weather is (?P<temp>\d+) degrees with a humidity of (?P<humidity>\d+)%", s1)
>>> m.group('temp')
'75'
>>> m.group('humidity')
'13'
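With named groups, `m.groupdict()` returns all of the captures at once as a dictionary keyed by group name. A small sketch, assuming every named group holds an integer; the function name `parse_weather` is hypothetical:

```python
import re

# Same named-group pattern as above, compiled once for reuse.
PATTERN = re.compile(
    r"The weather is (?P<temp>\d+) degrees with a humidity of (?P<humidity>\d+)%"
)


def parse_weather(text):
    """Return {'temp': int, 'humidity': int}, or None if text does not match.

    Hypothetical helper, not part of the original answer.
    """
    m = PATTERN.search(text)
    if m is None:
        return None
    # groupdict() maps each group name to the string it captured
    return {name: int(value) for name, value in m.groupdict().items()}


print(parse_weather("The weather is 75 degrees with a humidity of 13%"))
# {'temp': 75, 'humidity': 13}
```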

score 0 · Accepted Answer

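When you want to extract typed data (such as numbers) from text, the third-party parse library is very useful. In many ways it is the inverse of string formatting: it takes a format-style pattern and performs type conversion for you.

In the simplest case, it lets you avoid worrying about regex groups altogether.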

>>> from parse import parse
>>> s = "The weather is 75 degrees with a humidity of 13%"
>>> parse("The weather is {} degrees with a humidity of {}%", s)
<Result ('75', '13') {}>

The Result object is easy to work with:

>>> r = _
>>> r[0]
'75'

We can do better by specifying field names and/or type conversions. This is all it takes to get the results as integers:

>>> parse("The weather is {:d} degrees with a humidity of {:d}%", s)
<Result (75, 13) {}>

If we want to look values up by name rather than by index, add field names:

>>> parse("The weather is {temp:d} degrees with a humidity of {humidity:d}%", s)
<Result () {'temp': 75, 'humidity': 13}>
>>> r = _
>>> r['temp']
75
