python - REGEX - Match a list item followed by another list item 'n' times

Question

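Suppose we have a list:

search_list = ['one', 'two', 'three', 'four', 'five', 'six']

and we want to match any item from this list that is immediately followed by another item from the list, n times, in the following string:

example_string = 'This string has one two three and also five six in it'

How would we build a regular expression that finds every run of items that sit next to one another?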

In this case, searching with re.findall, the output should be:

[('one', 'two', 'three'), ('five', 'six')]

Here is what I have tried so far.

Convert the list into a searchable pattern:

chain_regex = [re.escape(i) for i in search_list]
chain_regex = '|'.join(chain_regex)
re.findall(rf'({chain_regex})\s*({chain_regex})', example_string)

This works fine and produces the following output:

[('one', 'two'), ('five', 'six')]

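Suppose I want to do this n times. How would you restructure this query so that it can repeat, without having to chain it indefinitely like this:

re.findall(rf'({chain_regex})\s*({chain_regex})\s*({chain_regex})*\s*({chain_regex})*', example_string)  # etc.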

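Edit

re.findall(rf'({chain_regex})(\s*({chain_regex}))+', example_string)

produces the following output, which is not quite right:

[('one', ' three', 'three'), ('five', ' six', 'six')]

Chaining more and more items together does work, but I can't always be sure how many times I would need to chain it, and that is where I'm stuck.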
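
For context, the output of the edited attempt above is what Python's `re` module does by design: when a capturing group is repeated, it keeps only the text from its last iteration, so `findall` cannot return a variable number of items per match. A minimal sketch of that behaviour, using the list and string from the question (the names `alternation` and `pattern` are illustrative and not from the original post):

```python
import re

search_list = ["one", "two", "three", "four", "five", "six"]
example_string = "This string has one two three and also five six in it"

alternation = "|".join(re.escape(item) for item in search_list)

# The repeated group (\s*(...))+ matches " two" and then " three", but each
# capturing group only remembers the text from its final iteration.
pattern = re.compile(rf"({alternation})(\s*({alternation}))+")

for m in pattern.finditer(example_string):
    # m.group(0) is the whole run; the numbered groups hold only the first
    # item and the last repetition.
    print(repr(m.group(0)), m.groups())
```

The full run is still available as `m.group(0)`, which suggests matching the run as a whole and splitting it afterwards.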

score 1 · Accepted Answer

You can do this with a simple regular expression, but you have to filter the results:

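import re

test1 = "This string has one two three and also five six in it"
# Match runs of list items, each optionally followed by one arbitrary character.
reg = re.compile(r"(((one|two|three|four|five|six).?)*)")
# findall returns one tuple per match (most of them empty); m[0] is the whole run.
match = re.findall(reg, test1)
# Keep only runs with more than one item; split() ignores the trailing space.
filtered = [m[0] for m in match if len(m[0].split()) > 1]
filtered = [f.split() for f in filtered]
filtered  # [['one', 'two', 'three'], ['five', 'six']]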

Example (updated): https://regex101.com/r/YhlhRQ/4
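
One possible way to avoid hard-coding the words and filtering afterwards is to match each run as a single string and then split it. The following is only a sketch and is not part of the original answer: the helper name `find_runs` and its `min_items` parameter are hypothetical, and it assumes the items are plain words without internal whitespace, separated by whitespace in the text, and meant to match only as whole words (hence the `\b` anchors, which the pattern above does not use).

```python
import re


def find_runs(items, text, min_items=2):
    """Return tuples of consecutive items from `items` found in `text`.

    Hypothetical helper: assumes word-like items separated by whitespace.
    """
    # Try longer alternatives first so a short item cannot shadow a longer
    # one that shares its prefix.
    ordered = sorted(items, key=len, reverse=True)
    alternation = "|".join(re.escape(i) for i in ordered)
    # One item, then at least (min_items - 1) more, each preceded by whitespace.
    pattern = re.compile(
        rf"\b(?:{alternation})(?:\s+(?:{alternation})){{{min_items - 1},}}\b"
    )
    # Non-capturing groups keep the whole run in group(0); split it afterwards.
    return [tuple(m.group(0).split()) for m in pattern.finditer(text)]


search_list = ["one", "two", "three", "four", "five", "six"]
example_string = "This string has one two three and also five six in it"
print(find_runs(search_list, example_string))
# [('one', 'two', 'three'), ('five', 'six')]
```

Because the groups are non-capturing, the number of items in each run no longer depends on how many groups the pattern contains, so nothing has to be chained by hand.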
