python - How to check whether two consecutive words have the same regex pattern

Question

I have been trying for hours, but I can't get it to work.

This is the string: 'Hello world,By The Way stackoverflow is cool place'. What I'm looking for is a way to match two consecutive words that have the same regex pattern.

For example, I want to replace consecutive words that start with an uppercase letter with the string "xx".

So when I apply it to my string, the result should be:

Hello world,xx xx xx stackoverflow is cool place

Here is my snippet:

import re
myString = 'Hello world,By The Way stackoverflow is cool place'
re.sub(r"[A-Z]\w+", "xx", myString)

But what I get is: 'xx world,xx xx xx stackoverflow is cool place'

score 1 · Accepted Answer

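Use the third-party regex module (pip install regex). Unlike the built-in re module, it supports the variable-width lookbehind used below: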

>>> import regex
>>> text = 'Hello world,By The Way stackoverflow is cool place'
>>> regex.sub(r'\b[A-Z]\w+(?=\s+[A-Z]\w+)|(?<=\b[A-Z]\w+\s+)[A-Z]\w+', 'xx', text)
'Hello world,xx xx xx stackoverflow is cool place'
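The standard-library re module rejects the pattern above, because its lookbehind (?<=\b[A-Z]\w+\s+) has variable width. A minimal sketch of a re-only alternative, which swaps the lookarounds for a different technique: match each run of two or more capitalized words, then replace the words inside that run from a callback. mask_capitalized_runs is a hypothetical helper name, and "word" keeps the question's definition of [A-Z]\w+:

```python
import re

# A run of two or more capitalized words separated by whitespace.
RUN = re.compile(r'\b[A-Z]\w+(?:\s+[A-Z]\w+)+')
# A single capitalized word, used inside each run.
WORD = re.compile(r'[A-Z]\w+')


def mask_capitalized_runs(text, repl='xx'):
    # For every run, replace each capitalized word with repl while
    # leaving the whitespace between the words untouched.
    return RUN.sub(lambda m: WORD.sub(repl, m.group(0)), text)


print(mask_capitalized_runs('Hello world,By The Way stackoverflow is cool place'))
# Hello world,xx xx xx stackoverflow is cool place
```

A capitalized word standing alone, like Hello, never forms a run, so it is left as-is.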

score 0 · Accepted Answer

You can do it like this, starting with the following imports and assignments:

import re,string

lowercase = string.ascii_lowercase
uppercase = string.ascii_uppercase
punctuation = string.punctuation
digits = string.digits
# regex metacharacters that must be escaped in a pattern
specials = r'.^+*?$\[]{}|()'

Then write a function that builds the pattern describing a word (or any fragment) of the sentence:

def getPat(text):
    # Build a regex describing the "shape" of text, one character at a time
    pattern = r""
    for c in text:
        if c in uppercase:
            pattern += '[A-Z]'
        elif c in lowercase:
            pattern += '[a-z]'
        elif c in digits:
            pattern += r'\d'
        elif c in specials:
            # escape regex metacharacters
            pattern += '\\' + c
        else:
            pattern += c
    return pattern
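Because getPat returns an ordinary regex string, the result can also be compiled and matched directly. A minimal sketch of a compact variant; get_pat is a hypothetical name, and it replaces the hand-maintained specials list with re.escape, which escapes every regex metacharacter for you:

```python
import re
import string


def get_pat(text):
    # Hypothetical compact variant of getPat: ASCII letters and digits become
    # character classes; every other character is escaped with re.escape.
    parts = []
    for c in text:
        if c in string.ascii_uppercase:
            parts.append('[A-Z]')
        elif c in string.ascii_lowercase:
            parts.append('[a-z]')
        elif c in string.digits:
            parts.append(r'\d')
        else:
            parts.append(re.escape(c))
    return ''.join(parts)


print(get_pat('By'))                               # [A-Z][a-z]
print(bool(re.fullmatch(get_pat('The'), 'Way')))   # True
print(bool(re.fullmatch(get_pat('By'), 'The')))    # False
```

The pattern encodes word length, so By ([A-Z][a-z]) and The ([A-Z][a-z][a-z]) count as different shapes. That is why this approach pairs By with Hi, not By with The.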

Then you can walk through the words and check whether each word's pattern matches that of the next one:

sentance = 'Hello world, By Hi nI The Way stackoverflow is cool place'.split()
for word,wordNext in zip(sentance,sentance[1:]):
    if getPat(word) == getPat(wordNext):
        print("{0} = {1}".format(word,wordNext))

which produces:

By = Hi
The = Way

You can perform the replacement by adjusting the loop as follows:

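# Flag both words of every matching pair, then rebuild the sentence once,
# so no word is duplicated and the last word is kept
mask = [False] * len(sentance)
for i, (word, wordNext) in enumerate(zip(sentance, sentance[1:])):
    if getPat(word) == getPat(wordNext):
        print("{0} = {1}".format(word, wordNext))
        mask[i] = mask[i + 1] = True
res = " ".join("xx" if m else w for w, m in zip(sentance, mask))
print(res)

which gives:

By = Hi
The = Way
Hello world, xx xx nI xx xx stackoverflow is cool place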
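The loop above compares the shapes of neighbouring words with each other. If, as in the question, the goal is to mask consecutive words that both match one fixed regex, the same neighbour-pair loop can instead test each word with re.fullmatch. A minimal sketch, assuming words are split on whitespace (so punctuation stays attached, as in "world,") and that collapsing runs of whitespace to single spaces is acceptable; mask_adjacent is a hypothetical helper name:

```python
import re


def mask_adjacent(sentence, pattern, repl='xx'):
    # Hypothetical helper: replace every word that fully matches `pattern`
    # and has at least one neighbour that also fully matches it.
    rx = re.compile(pattern)
    words = sentence.split()
    hits = [rx.fullmatch(w) is not None for w in words]
    mask = [False] * len(words)
    for i in range(len(words) - 1):
        # Both neighbours must match for either of them to be replaced.
        if hits[i] and hits[i + 1]:
            mask[i] = mask[i + 1] = True
    return ' '.join(repl if m else w for w, m in zip(words, mask))


print(mask_adjacent('Hello world, By The Way stackoverflow is cool place', r'[A-Z]\w+'))
# Hello world, xx xx xx stackoverflow is cool place
```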
