python - Python regex matching multiple times

Question

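I'm trying to match a pattern against a string that may contain multiple instances of that pattern. I need each instance separately. re.findall() should do this, but I don't know what I'm doing wrong.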

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)
match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')

I need 'http://url.com/123', 'http://url.com/456' and the two numbers 123 and 456 to be separate elements of the match list.

I also tried the pattern '/review: ((http://url.com/(\d+)\s?)+)/', but with no luck.

score 24 · Accepted Answer

Use this. You need to put "review" outside the capturing group to get the desired result.

pattern = re.compile(r'(?:review: )?(http://url.com/(\d+))\s?', re.IGNORECASE)

This gives the output

>>> match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
>>> match
[('http://url.com/123', '123'), ('http://url.com/456', '456')]
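To see why moving the repetition out of the pattern helps, here is a minimal sketch comparing the two approaches on the message from the question. When a capturing group is repeated with +, Python's re keeps only the text from the group's last repetition, whereas findall on a non-repeating pattern reports every non-overlapping match. The variable names are illustrative, and the dot in url.com is escaped here as a small tightening that is not in the original patterns.

```python
import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# Repetition inside the pattern: a single match, and each group keeps
# only the text from its final repetition.
repeated = re.compile(r'review: (http://url\.com/(\d+)\s?)+', re.IGNORECASE)
last_only = repeated.findall(msg)

# No repetition in the pattern: findall scans the whole string and
# returns one tuple per URL.
single = re.compile(r'(?:review: )?(http://url\.com/(\d+))\s?', re.IGNORECASE)
each_url = single.findall(msg)

print(last_only)  # [('http://url.com/456', '456')]
print(each_url)   # [('http://url.com/123', '123'), ('http://url.com/456', '456')]
```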

score 6 · Accepted Answer

You have extra / characters in your regex. In Python, the pattern should just be a string. For example, instead of this:

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)

it should be:

pattern = re.compile('review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

Usually in Python you would actually use a "raw" string, like this:

pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

The extra r in front of the string saves you from having to do a lot of backslash escaping.
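A small sketch of the difference the r prefix makes, using \b rather than \d because \b happens to be a valid escape in ordinary string literals (backspace) and so changes meaning silently. The sample text and variable names are made up for illustration.

```python
import re

text = 'a cat sat'

# In a normal string literal, '\b' is the backspace character (one char).
plain = '\bcat\b'
# In a raw string, r'\b' stays as a backslash followed by 'b', which the
# re module interprets as a word boundary.
raw = r'\bcat\b'

print(len(plain), len(raw))     # 5 7
print(re.findall(plain, text))  # []
print(re.findall(raw, text))    # ['cat']
```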

score 2 · Accepted Answer

Use a two-step approach: first grab everything from "review:" to the end of the line, then tokenize it.

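import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# Step 1: capture everything after "review: " up to the end of the line.
review_pattern = re.compile(r'.*review: (.*)$')
urls = review_pattern.findall(msg)[0]

# Step 2: pull each URL and its number out of the captured text.
url_pattern = re.compile(r"(http://url.com/(\d+))")
url_pattern.findall(urls)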
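The two steps can also be wrapped in a helper that copes with messages containing no "review:" part, since indexing findall(...)[0] raises IndexError in that case. This is only a sketch: the function name extract_review_urls and the constant names are hypothetical, the dot in url.com is escaped as a tightening, and it assumes a single-line message as in the question.

```python
import re

# Hypothetical names, for illustration only.
REVIEW_RE = re.compile(r'review: (.*)$', re.IGNORECASE)
URL_RE = re.compile(r'(http://url\.com/(\d+))')

def extract_review_urls(msg):
    """Return (url, number) tuples found after 'review: ', or [] if absent."""
    found = REVIEW_RE.search(msg)
    if found is None:
        # No "review: " section, so there is nothing to tokenize.
        return []
    # Tokenize only the text captured after "review: ".
    return URL_RE.findall(found.group(1))

print(extract_review_urls(
    'this is the message. review: http://url.com/123 http://url.com/456'))
# [('http://url.com/123', '123'), ('http://url.com/456', '456')]
print(extract_review_urls('no links here'))  # []
```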
