22

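I need help with a program I'm writing in Python.

Suppose I want to replace every instance of the word "steak" with "ghost" (just go with it...), but at the same time I also want to replace every instance of the word "ghost" with "steak". The following code does not work: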

 s="The scary ghost ordered an expensive steak"
 print s
 s=s.replace("steak","ghost")
 s=s.replace("ghost","steak")
 print s

It prints: The scary steak ordered an expensive steak

What I want is: The scary steak ordered an expensive ghost
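
To make the failure concrete, here is a small sketch (Python 3 syntax; `step1` and `step2` are hypothetical names used only for illustration) that keeps the intermediate string. After the first call both words have become "ghost", so the second call can no longer tell which one was originally "steak".

```python
s = "The scary ghost ordered an expensive steak"

# First pass: every "steak" becomes "ghost", so the text now has two ghosts.
step1 = s.replace("steak", "ghost")
print(step1)  # The scary ghost ordered an expensive ghost

# Second pass: both ghosts (the original one and the converted steak) become "steak".
step2 = step1.replace("ghost", "steak")
print(step2)  # The scary steak ordered an expensive steak
```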

4

6 Answers

24

I would probably use a regular expression here:

>>> import re
>>> s = "The scary ghost ordered an expensive steak"
>>> sub_dict = {'ghost':'steak','steak':'ghost'}
>>> regex = '|'.join(sub_dict)
>>> re.sub(regex, lambda m: sub_dict[m.group()], s)
'The scary steak ordered an expensive ghost'

Or, as a function that you can copy/paste:

import re
def word_replace(replace_dict,s):
    regex = '|'.join(replace_dict)
    return re.sub(regex, lambda m: replace_dict[m.group()], s)

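Basically, I create a mapping of the words that I want to replace with other words (sub_dict). I can build a regular expression from that mapping. In this case the regular expression is "steak|ghost" (or "ghost|steak"; the order doesn't matter here), and the regex engine does the rest of the work of finding non-overlapping matches and replacing them accordingly.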


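Some possibly useful modifications

  • regex = '|'.join(map(re.escape,replace_dict)): allows the words to contain special regular expression syntax (like parentheses). This escapes the special characters so that the regular expression matches the literal text.
  • regex = '|'.join(r'\b{0}\b'.format(x) for x in replace_dict): makes sure we don't match when one of our words is a substring of another word. In other words, change "he" to "she" but don't change "the" to "tshe".
answered 2013-03-10T16:08:35.817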
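
The two modifications above can be combined. The following is a minimal sketch in Python 3, not part of the original answer. `swap_words` is a hypothetical name, and the sketch assumes every key begins and ends with a word character, because `\b` only matches next to one.

```python
import re

def swap_words(replace_dict, s):
    # Escape each key so it is matched literally, and wrap it in \b
    # so it only matches as a whole word.
    pattern = '|'.join(r'\b{0}\b'.format(re.escape(word)) for word in replace_dict)
    # A single pass over the text: each match is looked up in the mapping.
    return re.sub(pattern, lambda m: replace_dict[m.group()], s)

print(swap_words({'ghost': 'steak', 'steak': 'ghost'},
                 "The scary ghost ordered an expensive steak"))
print(swap_words({'he': 'she'}, "he saw the ghost"))  # "the" is left alone
```

Because the text is scanned once, a word that has already been replaced is never examined again, which is what makes the simultaneous swap work.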
12

Split the string on one of the targets, do the replacement in each piece, and put the whole thing back together.

pieces = s.split('steak')
s = 'ghost'.join(piece.replace('ghost', 'steak') for piece in pieces)

This works exactly like .replace(), including ignoring word boundaries. So "steak ghosts" becomes "ghost steaks".

answered 2013-03-10T16:11:36.300
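
As a sketch of how this could be wrapped up for reuse (Python 3; `swap_substrings` is a hypothetical name, and the code assumes the two targets do not overlap or contain each other):

```python
def swap_substrings(s, a, b):
    # Splitting on `a` removes every occurrence of `a`; inside each piece
    # only `b` can remain, so it is safe to turn `b` into `a` there.
    pieces = s.split(a)
    # Joining with `b` puts `b` back exactly where `a` used to be.
    return b.join(piece.replace(b, a) for piece in pieces)

print(swap_substrings("The scary ghost ordered an expensive steak", "steak", "ghost"))
print(swap_substrings("steak ghosts", "steak", "ghost"))  # ghost steaks
```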
4

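Rename one of the words to a temporary value that doesn't occur in the text. Note that this is not the most efficient approach for very large text. For that, a re.sub might be more appropriate.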

 s="The scary ghost ordered an expensive steak"
 print s
 s=s.replace("steak","temp")
 s=s.replace("ghost","steak")
 s=s.replace("temp","ghost")
 print s
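
The approach depends on the temporary value really being absent from the text. A minimal sketch that checks this first (Python 3; `swap_with_placeholder` and the NUL-character default are assumptions for illustration, not part of the original answer):

```python
def swap_with_placeholder(s, a, b, placeholder="\x00"):
    # Refuse to continue if the placeholder could be confused with real text.
    if placeholder in s:
        raise ValueError("placeholder already occurs in the text")
    # a -> placeholder, then b -> a, then placeholder -> b.
    return s.replace(a, placeholder).replace(b, a).replace(placeholder, b)

print(swap_with_placeholder("The scary ghost ordered an expensive steak", "steak", "ghost"))
```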
answered 2013-03-10T15:59:40.587
2

Use the count argument of the string.replace() method. So, using your code, you would have:

s="The scary ghost ordered an expensive steak"
print s
s=s.replace("steak","ghost", 1)
s=s.replace("ghost","steak", 1)
print s

http://docs.python.org/2/library/stdtypes.html
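
One thing worth checking before relying on this: with a count of 1, the result depends on the order in which the words appear. The sketch below (Python 3; `swap_once` is a hypothetical name) runs the same two calls on the original sentence and on one where "steak" comes first.

```python
def swap_once(s):
    # Same two calls as above, each limited to the first occurrence.
    s = s.replace("steak", "ghost", 1)
    return s.replace("ghost", "steak", 1)

# "ghost" appears before "steak": the words end up swapped.
print(swap_once("The scary ghost ordered an expensive steak"))

# "steak" appears before "ghost": the second call undoes the first,
# and the sentence comes back unchanged.
print(swap_once("The expensive steak scared the ghost"))
```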

answered 2013-03-10T16:01:47.047
1

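Note: given the number of views this question has received, I undeleted this answer and rewrote it for different types of test cases.

I considered four competing implementations from the answers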

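>>> import re
>>> from timeit import Timer
>>> from uuid import uuid4
>>> sub_dict = {'ghost': 'steak', 'steak': 'ghost'}
>>> regex = '|'.join(sub_dict)
>>> def sub_noregex(hay):
    """
    The join and replace routine which outperforms the regex implementation. This
    version uses a generator expression
    """
    return 'steak'.join(e.replace('steak','ghost') for e in hay.split('ghost'))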

>>> def sub_regex(hay):
    """
    This is a straightforward regex implementation as suggested by @mgilson
    Note, so that the overhead doesn't add to the cumulative time, I have placed
    the regex creation routine outside the function
    """
    return re.sub(regex,lambda m:sub_dict[m.group()],hay)

>>> def sub_temp(hay, _uuid = str(uuid4())):
    """
    Similar to Mark Tolonen's implementation, but uses a uuid for the temporary string
    value to reduce the chance of a collision
    """
    hay = hay.replace("steak",_uuid).replace("ghost","steak").replace(_uuid,"ghost")
    return hay

>>> def sub_noregex_LC(hay):
    """
    The join and replace routine which outperforms the regex implementation. This
    version uses a list comprehension
    """
    return 'steak'.join([e.replace('steak','ghost') for e in hay.split('ghost')])

A generalized timeit function

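>>> def compare(n, hay):
    # Map each function name to the extra module its setup string must import.
    foo = {"sub_regex": "re",
           "sub_noregex":"",
           "sub_noregex_LC":"",
           "sub_temp":"",
           }
    stmt = "{}(hay)"
    # NOTE: the setup string imports `hay` from __main__, so a module-level
    # variable named `hay` must exist there; that global is what gets timed.
    setup = "from __main__ import hay,"
    for k, v in foo.items():
        t = Timer(stmt = stmt.format(k), setup = setup+ ','.join([k, v] if v else [k]))
        yield t.timeit(n)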

and a generic test routine

>>> def test(*args, **kwargs):
    n = kwargs['repeat']
    print "{:50}{:^15}{:^15}{:^15}{:^15}".format("Test Case", "sub_temp",
                             "sub_noregex ", "sub_regex",
                             "sub_noregex_LC ")
    for hay in args:
        hay, hay_str = hay
        print "{:50}{:15.10}{:15.10}{:15.10}{:15.10}".format(hay_str, *compare(n, hay))

The test results are as follows

>>> test((' '.join(['steak', 'ghost']*1000), "Multiple repetitions of search key"),
         ('garbage '*998 + 'steak ghost', "Single repetition of search key at the end"),
         ('steak ' + 'garbage '*998 + 'ghost', "Single repetition at either end"),
         ("The scary ghost ordered an expensive steak", "Single repetition for smaller string"),
         repeat = 100000)
Test Case                                            sub_temp     sub_noregex      sub_regex   sub_noregex_LC
Multiple repetitions of search key                   0.2022748797   0.3517142003   0.4518992298   0.1812594258
Single repetition of search key at the end           0.2026047957   0.3508259952   0.4399926194   0.1915298898
Single repetition at either end                      0.1877455356   0.3561734007   0.4228843986   0.2164233388
Single repetition for smaller string                 0.2061019057   0.3145984487   0.4252060592   0.1989413449
>>> 

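Based on the test results

  1. The non-regex LC version and the temp-variable substitution perform better, although the performance of the temp-variable approach is not consistent

  2. The LC version performs better than the generator version (confirmed)

  3. The regex version is more than twice as slow (so if this piece of code is a bottleneck, the implementation can be reconsidered)

  4. The regex and non-regex versions are equally robust and can scale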
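
For anyone rerunning the comparison on Python 3, here is a minimal sketch of a harness that hands the test string to each function through a closure instead of through `from __main__ import`. The names `SUB_DICT` and `PATTERN` and the default `n` are assumptions for illustration. No timings are claimed here, since the numbers depend on the machine and the interpreter.

```python
import re
from timeit import timeit

SUB_DICT = {"ghost": "steak", "steak": "ghost"}
PATTERN = re.compile("|".join(SUB_DICT))  # built once, outside the timed code

def sub_regex(hay):
    return PATTERN.sub(lambda m: SUB_DICT[m.group()], hay)

def sub_noregex(hay):
    return "steak".join(e.replace("steak", "ghost") for e in hay.split("ghost"))

def compare(hay, n=1000):
    # Each lambda closes over `hay`, so the string passed in is the one timed.
    return {f.__name__: timeit(lambda: f(hay), number=n)
            for f in (sub_regex, sub_noregex)}

hay = "The scary ghost ordered an expensive steak"
assert sub_regex(hay) == sub_noregex(hay)  # sanity check before timing
for name, seconds in compare(hay).items():
    print("{:15}{:.6f}".format(name, seconds))
```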

answered 2013-03-10T17:07:26.820
1

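How about something like this? Store the original in a split list, then have a translation dictionary. This keeps the core code short; when the translation needs adjusting, just adjust the dictionary. Plus, it is easy to port to a function: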

 def translate_line(s, translation_dict):
    line = []
    for i in s.split():
       # To take account for punctuation, strip all non-alnum from the
       # word before looking up the translation.
       i = ''.join(ch for ch in i if ch.isalnum())
       line.append(translation_dict.get(i, i))
    return ' '.join(line)


 >>> translate_line("The scary ghost ordered an expensive steak", {'steak': 'ghost', 'ghost': 'steak'})
 'The scary steak ordered an expensive ghost'
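
Note that stripping non-alphanumeric characters also removes punctuation from the output. A possible variant, sketched here with `re.sub` instead of `split()` (so it is a different technique from the code above), looks up each run of word characters in the dictionary and leaves everything else untouched. `translate_line_keep_punct` is a hypothetical name, and the code is Python 3.

```python
import re

def translate_line_keep_punct(s, translation_dict):
    # Every run of word characters is looked up; unknown words map to themselves.
    # Spaces and punctuation are never matched, so they stay where they are.
    return re.sub(r"\w+",
                  lambda m: translation_dict.get(m.group(), m.group()),
                  s)

print(translate_line_keep_punct("The scary ghost, it seems, ordered steak.",
                                {'steak': 'ghost', 'ghost': 'steak'}))
```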
answered 2013-03-10T16:07:02.337