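Since you are checking mystring word by word, mystring can surely be treated as a set of words. Then just take the intersection between the set of words in mystring and the set of target words: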
In [370]: mystring=set(['foobar','barfoo','foo'])
In [371]: mystring.intersection(set(['foo', 'bar', 'hello']))
Out[371]: set(['foo'])
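The logical "or" you want is satisfied whenever that intersection is non-empty; its members are the target words that were found.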
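As a minimal sketch of the same idea for a plain string, assuming the words are separated by whitespace: the helper name contains_any_word is hypothetical and not part of the original answer. Note that this matches whole words only, so 'bar' is not reported even though it is a substring of 'foobar'.

```python
def contains_any_word(text, targets):
    """Return the target words that occur as whole words in text."""
    words = set(text.split())      # split on whitespace, one set entry per word
    return words & set(targets)    # set intersection; an empty set is falsy


mystring = "foobar barfoo foo"
matches = contains_any_word(mystring, ["foo", "bar", "hello"])
print(matches)        # {'foo'}
print(bool(matches))  # True
```

If substring matches are wanted instead, the generator form with the in operator is the one to use, since set intersection compares complete words.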
Using a set is also faster. Here are the timings relative to a generator and a regular expression:
f1: generator to test against large string
f2: re to test against large string
f3: set intersection of two sets of words
      rate/sec       f2       f1       f3
f2     101,333       --   -95.0%   -95.5%
f1   2,026,329  1899.7%       --   -10.1%
f3   2,253,539  2123.9%    11.2%       --
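So the generator with the in operator is about 19 times faster than the regular expression, and the set intersection is about 21 times faster than the regular expression and 11% faster than the generator.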
Here is the code that generated the timings:
import re

# Load the system word list into a set, then join it into one large string
with open('/usr/share/dict/words','r') as fin:
    set_words={word.strip() for word in fin}

s_words=' '.join(set_words)

target=set(['bar','foo','hello'])
target_re = re.compile("(%s)" % ("|".join(re.escape(word) for word in target), ))

# Caution: a generator can be consumed only once, so after the first few
# calls f1 sees an exhausted gen_target and any() returns False at once
gen_target=(word for word in ('bar','foo','hello'))

def f1():
    """ generator to test against large string """
    if any(s in s_words for s in gen_target):
        return True

def f2():
    """ re to test against large string """
    if re.search(target_re, s_words):
        return True

def f3():
    """ set intersection of two sets of words """
    if target.intersection(set_words):
        return True

funcs=[f1,f2,f3]
# legend and cmpthese are benchmarking helpers that are not defined in this snippet
legend(funcs)
cmpthese(funcs)
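Because legend and cmpthese are not defined above, a comparable measurement could be sketched with the standard library's timeit module. This is a rough stand-in under stated assumptions, not the harness that produced the table: the function names, the rates helper, and the synthetic word list (used in place of /usr/share/dict/words so the sketch runs anywhere) are all hypothetical, and the numbers it prints will differ from those shown.

```python
import re
import timeit

# Synthetic word list (assumption): stands in for /usr/share/dict/words
set_words = {"word%d" % i for i in range(20000)} | {"foo"}
s_words = " ".join(set_words)

target = {"bar", "foo", "hello"}
target_re = re.compile("|".join(re.escape(w) for w in target))


def by_generator():
    # Iterates a set, not a one-shot generator, so every call does real work
    return any(s in s_words for s in target)


def by_regex():
    return target_re.search(s_words) is not None


def by_set():
    return bool(target & set_words)


def rates(funcs, number=50):
    """Return a dict mapping each function name to its calls per second."""
    result = {}
    for func in funcs:
        elapsed = timeit.timeit(func, number=number)
        result[func.__name__] = number / max(elapsed, 1e-9)
    return result


for name, rate in sorted(rates([by_generator, by_regex, by_set]).items(),
                         key=lambda item: item[1]):
    print("%-14s %12.0f calls/sec" % (name, rate))
```

Iterating the target set directly inside by_generator avoids the exhausted-generator effect, so each timed call performs the full substring search.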