注意考虑到这个问题的收视率,我为不同类型的测试用例取消删除并重写了它
我从答案中考虑了四种相互竞争的实现
>>> def sub_noregex(hay):
"""
The Join and replace routine which outpeforms the regex implementation. This
version uses generator expression
"""
return 'steak'.join(e.replace('steak','ghost') for e in hay.split('ghost'))
>>> def sub_regex(hay):
"""
This is a straight forward regex implementation as suggested by @mgilson
Note, so that the overheads doesn't add to the cummulative sum, I have placed
the regex creation routine outside the function
"""
return re.sub(regex,lambda m:sub_dict[m.group()],hay)
>>> def sub_temp(hay, _uuid = str(uuid4())):
"""
Similar to Mark Tolonen's implementation but rather used uuid for the temporary string
value to reduce collission
"""
hay = hay.replace("steak",_uuid).replace("ghost","steak").replace(_uuid,"steak")
return hay
>>> def sub_noregex_LC(hay):
"""
The Join and replace routine which outpeforms the regex implementation. This
version uses List Comprehension
"""
return 'steak'.join([e.replace('steak','ghost') for e in hay.split('ghost')])
广义的 timeit 函数
>>> def compare(n, hay):
foo = {"sub_regex": "re",
"sub_noregex":"",
"sub_noregex_LC":"",
"sub_temp":"",
}
stmt = "{}(hay)"
setup = "from __main__ import hay,"
for k, v in foo.items():
t = Timer(stmt = stmt.format(k), setup = setup+ ','.join([k, v] if v else [k]))
yield t.timeit(n)
和通用测试程序
>>> def test(*args, **kwargs):
n = kwargs['repeat']
print "{:50}{:^15}{:^15}{:^15}{:^15}".format("Test Case", "sub_temp",
"sub_noregex ", "sub_regex",
"sub_noregex_LC ")
for hay in args:
hay, hay_str = hay
print "{:50}{:15.10}{:15.10}{:15.10}{:15.10}".format(hay_str, *compare(n, hay))
测试结果如下
>>> test((' '.join(['steak', 'ghost']*1000), "Multiple repeatation of search key"),
('garbage '*998 + 'steak ghost', "Single repeatation of search key at the end"),
('steak ' + 'garbage '*998 + 'ghost', "Single repeatation of at either end"),
("The scary ghost ordered an expensive steak", "Single repeatation for smaller string"),
repeat = 100000)
Test Case sub_temp sub_noregex sub_regex sub_noregex_LC
Multiple repeatation of search key 0.2022748797 0.3517142003 0.4518992298 0.1812594258
Single repeatation of search key at the end 0.2026047957 0.3508259952 0.4399926194 0.1915298898
Single repeatation of at either end 0.1877455356 0.3561734007 0.4228843986 0.2164233388
Single repeatation for smaller string 0.2061019057 0.3145984487 0.4252060592 0.1989413449
>>>
根据测试结果
Non Regex LC 和 temp 变量替换具有更好的性能,但 temp 变量的使用性能并不一致
LC 版本比发电机有更好的性能(已确认)
正则表达式的速度要慢两倍以上(因此,如果这段代码是瓶颈,则可以重新考虑实现更改)
正则表达式和非正则表达式版本同样健壮并且可以扩展