python - Search and replace with a "whole word only" option

Question

I have a script that runs over my text and searches for and replaces every sentence I have written in a database.

The script:

with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')
        editor.replace(s[0],s[1])

And an example of the database:

Event*Evento*
result*resultado*

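And so on...

What happens now is that I need a "whole word only" option in that script, because I keep running into problems.

For example with Result and Event: once they have been replaced with Resultado and Evento, running the script on the text a second time replaces Resultado and Evento again.

After that second run the text ends up as Resultadoado and Eventoo.

Just so you know, this is not only about Event and Result. I have more than 1000 sentences set up for this search and replace.

I don't need a simple search and replace for two words, because I will be editing the database over and over for different sentences.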

score 25 · Accepted Answer

You want a regular expression. You can use the token \b to match a word boundary: that is, \bresult\b matches only the exact word "result".

import re

with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')
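        # editor is assumed here to be a plain string holding the text;
        # re.escape keeps characters in s[0] from being read as regex syntax
        editor = re.sub(r"\b%s\b" % re.escape(s[0]), s[1], editor)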
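
To check that a second run leaves the text alone, here is a minimal sketch of the same loop applied to an in-memory string. The names database_lines, text and translate are hypothetical stand-ins for the file and the editor content, not part of the original script.

```python
import re

# Hypothetical stand-ins for the database file and the document text.
database_lines = ["Event*Evento*\n", "result*resultado*\n"]
text = "The Event has a result."


def translate(text, lines):
    for line in lines:
        parts = line.split('*')
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        source, target = parts[0], parts[1]
        pattern = r"\b%s\b" % re.escape(source)
        # A function replacement inserts target literally,
        # so backslashes in target are not treated as escapes.
        text = re.sub(pattern, lambda m: target, text)
    return text


once = translate(text, database_lines)
twice = translate(once, database_lines)
print(once)   # The Evento has a resultado.
print(twice)  # The Evento has a resultado.
```

The second pass changes nothing because "Event" inside "Evento" is followed by a letter, so there is no word boundary after it.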

score 15 · Accepted Answer

Use re.sub:

import re

replacements = {'the': 'a',
                'this': 'that'}

def replace(match):
    # look up the replacement for whichever word was matched
    return replacements[match.group(0)]

# notice that the 'this' in 'thistle' is not matched
print(re.sub('|'.join(r'\b%s\b' % re.escape(s) for s in replacements),
             replace, 'the cat has this thistle.'))

prints

a cat has that thistle.

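Notes:

All the strings to be replaced are joined into a single pattern, so the text only needs to be scanned once.
The source strings are passed through re.escape so that they are not interpreted as regular expressions.
Each word is wrapped in r'\b' so that matches apply only to whole words.
A replacement function is used so that any match can be replaced.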
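
One thing to watch for when joining many entries into one pattern: alternatives are tried left to right, so if one source string is a prefix of another, the shorter one can win. A minimal sketch, assuming hypothetical database entries, that sorts the sources longest first:

```python
import re

# Hypothetical entries; "result" is a prefix of "result set".
replacements = {
    "result": "resultado",
    "result set": "conjunto de resultados",
}

# Longest sources first, so "result set" is tried before "result".
sources = sorted(replacements, key=len, reverse=True)
pattern = re.compile("|".join(r"\b%s\b" % re.escape(s) for s in sources))

text = "The result set has one result."
result = pattern.sub(lambda m: replacements[m.group(0)], text)
print(result)  # The conjunto de resultados has one resultado.
```

Compiling the pattern once with re.compile also avoids rebuilding it for every piece of text.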

score 12 · Accepted Answer

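Use re.sub instead of plain string replacement to replace whole words only. That way, even if it runs again, your script will not replace words that have already been replaced.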

>>> import re
>>> editor = "This is result of the match"
>>> new_editor = re.sub(r"\bresult\b","resultado",editor)
>>> new_editor
'This is resultado of the match'
>>> newest_editor = re.sub(r"\bresult\b","resultado",new_editor)
>>> newest_editor
'This is resultado of the match'

score 6 · Accepted Answer

It's simple. Use re.sub, not replace.

import re
replacements = {r'\bthe\b': 'a',
                r'\bthis\b': 'that'}

def replace_all(text, dic):
    # apply each pattern in turn; every pass scans the whole text
    for i, j in dic.items():
        text = re.sub(i, j, text)
    return text

print(replace_all("the cat has this thistle.", replacements))

It prints

a cat has that thistle.
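
Applying the patterns one after another means a later pattern can match the output of an earlier one. A minimal sketch of the difference, using a hypothetical swaps dictionary in which one replacement is also a source word (this relies on dictionaries keeping insertion order, as they do in Python 3.7+):

```python
import re

# Hypothetical entries: the replacement "this" is itself a source word.
swaps = {"the": "this", "this": "that"}
text = "the cat has this thistle."

# Sequential passes: the second pass rewrites the output of the first.
sequential = text
for source, target in swaps.items():
    sequential = re.sub(r"\b%s\b" % re.escape(source), target, sequential)

# Single pass: each word of the original text is replaced at most once.
pattern = re.compile("|".join(r"\b%s\b" % re.escape(s) for s in swaps))
single_pass = pattern.sub(lambda m: swaps[m.group(0)], text)

print(sequential)   # that cat has that thistle.
print(single_pass)  # this cat has that thistle.
```

If no replacement in the database is also a source entry, the two approaches give the same result.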

score 1 · Accepted Answer

import re

match = {}  # dictionary mapping each word to replace to its replacement

with open("filename", "r") as f:
    data = f.read()  # the whole file content as one string


def replace_all(text, dic):
    for i, j in dic.items():
        # \b on both sides restricts the match to whole words;
        # re.escape treats i as literal text rather than as a regex
        text = re.sub(r"\b%s\b" % re.escape(i), j, text)
    return text


data = replace_all(data, match)
print(data)  # you can copy and paste the result to whatever file you like
