python - Regex to match any ABC that occurs after XYZ anywhere in a string

Question

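I'm trying to write a regular expression that matches every ABC that occurs after XYZ, anywhere in a string:

E.g. text - "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"

That is, the regex should match the three ABCs that come after XYZ.

Any clues?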

score 1 · Accepted Answer

Just match a literal XYZ followed by a repeated ABC group:

r'XYZ((?:ABC)+)'

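The (?:ABC)+ pattern matches one or more consecutive copies of the literal ABC, and the whole run must be immediately preceded by the literal XYZ.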
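
A quick check of how this pattern behaves, as a minimal sketch. The first test string is made up for illustration; the second is the sample text from the question. Because the ABC run has to start immediately after XYZ, the pattern finds nothing when other words sit between XYZ and each ABC.

```python
import re

pattern = re.compile(r'XYZ((?:ABC)+)')

# ABC repeats directly after XYZ: the group captures the whole run.
m = pattern.search("ABC XYZABCABCABC tail")
adjacent = m.group(1) if m else None

# In the question's sample text, other words separate XYZ from each ABC,
# so this pattern has nothing to match.
sample = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"
spaced = pattern.search(sample)

print(adjacent)  # ABCABCABC
print(spaced)    # None
```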

This is pretty basic regex 101; reading a good regular expression tutorial would be a sensible place to start.

score 1 · Accepted Answer

Something like this? r"(?<=XYZ)((?:ABC)+)". This will only match occurrences of ABC when they follow XYZ, but the match will not include XYZ itself.
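
To make the lookbehind behavior concrete, here is a small sketch on a made-up input string. It shows that the match begins right after XYZ and that XYZ is not part of the matched text.

```python
import re

pattern = re.compile(r'(?<=XYZ)((?:ABC)+)')

m = pattern.search("xxXYZABCABCyy")
# The match starts right after XYZ; XYZ itself is not part of it.
matched = m.group(0)
start = m.start()

print(matched)  # ABCABC
print(start)    # 5
```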

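EDIT

It looks like I misunderstood the OP's original question. The simplest approach is to find the string XYZ first and save its starting position. Then pass that position as the second argument: p.finditer(string, startpos). Note that this only works with a compiled regular expression, so you need to compile your pattern first.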

The pattern you need is simply r"(ABC)".
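
The steps above could be sketched roughly as follows. The variable names (`marker`, `positions`) are illustrative, and using `str.find` to locate XYZ is one choice among several; the sample string is the one from the question.

```python
import re

s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"

p = re.compile(r"(ABC)")
marker = s.find("XYZ")  # -1 if XYZ is absent

# Start scanning at XYZ; with no XYZ there is nothing to report.
matches = list(p.finditer(s, marker)) if marker != -1 else []
positions = [m.start() for m in matches]

print(len(positions))  # 3
```

The ABC that appears before XYZ is skipped because the scan never looks at that part of the string.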

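Alternatively, you can use p.sub(), which also performs the replacement. To make it act on only part of the string, though, you need to create a substring first, because p.sub() has no startpos parameter.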
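
One way the substring approach might look, as a sketch only. Splitting with `str.partition` and the replacement text "REPLACED" are assumptions made for illustration; any way of cutting the string at XYZ would do.

```python
import re

s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"

p = re.compile(r"ABC")
head, sep, tail = s.partition("XYZ")

# Only the part after the first XYZ goes through sub(); if XYZ is missing,
# sep and tail are empty and the string comes back unchanged.
result = head + sep + p.sub("REPLACED", tail)

print(result)
# Some ABC text followed by XYZ followed by multiple REPLACED, more REPLACED, more REPLACED
```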

score 1 · Accepted Answer

You can take an iterative approach:

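import re

s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"

# The lookbehind anchors each match right after XYZ; (.*?) lazily spans
# the text up to the next ABC and is put back via \1.
pattern = re.compile(r'(?<=XYZ)(.*?)ABC')
# Each pass replaces the next ABC following XYZ; loop until none remain.
while pattern.search(s):
    s = pattern.sub(r'\1REPLACED', s)

print(s)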

Output:

Some ABC text followed by XYZ followed by multiple REPLACED, more REPLACED, more REPLACED

score 0 · Accepted Answer

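There's a nifty Counter object in collections that may help. A Counter is a dictionary whose keys are the individual items and whose values are their counts. Example: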

Counter('hello there hello'.split())  # Counter({'hello': 2, 'there': 1})

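Since we're counting words, we have to split the phrase wherever there is whitespace, which is the default behavior of the split method. Here is an example script that uses Counter. The bottom half can be adapted into a function if needed.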

from collections import Counter

def count_frequency(phrase):
    """ Return a dictionary with {word: num_of_occurrences} """
    counts = Counter(phrase.split())
    return counts

def replace_word(target_word, replacement, phrase):
    """ Replaces *word* with *replacement* in string *phrase* """
    phrase = phrase.split()

    for count, word in enumerate(phrase):
        if word == target_word:
            phrase[count] = replacement

    # Rejoin with spaces so the words stay separated
    return ' '.join(phrase)

phrase = "hello there hello hello"
word_counts = count_frequency(phrase)
replacement = 'replaced'

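# Replace every word that occurs more than twice
for word in word_counts:
    if word_counts[word] > 2:
        # Note: str.replace works on substrings, not whole words
        phrase = phrase.replace(word, replacement)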

print(phrase)
