让我坦率地说,我编码是为了好玩,这是我过去几天在业余时间一直在努力的一个编码挑战。挑战在于我得到一堆由空格(文档)分隔的单词,然后是列表中的几个搜索词。我必须在文档中找到最接近这些 searchTerms 的位置。基本上,找到包含所有 searchTerms 的文档的最小子集并输出该子集。到目前为止,我的功能似乎在我的系统上运行。但是,当我上传时,我被告知我的算法执行时间太长。我的想法是在文档中找到 searchTerm 的每个实例,然后针对它运行 itertools.product()。然后我测试每一个,根据索引值确定哪一个是最短的。这是我到目前为止所拥有的:
def answer(document, searchTerms):
from itertools import product
#build a list of the input document
document = document.split()
index = []
#find all indexes for the searchTerms and build a list of lists
for w in searchTerms:
index.append([i for i,x in enumerate(document) if x == w])
#build iterator of all possible combinations of indexes for each of the searchTerms
combinations = product(*index)
#recover memory
del index
#build tuple of minimum distance between all search terms
shortest = min(((max(x) - min(x),x) for x in combinations),key=lambda x: x[0])
return (' '.join(document[min(shortest[1]):max(shortest[1])+1]))
我尝试使用多处理来加速我的代码部分,但还没有完全掌握语法。例如:
from multiprocessing import Pool
p = Pool(processes=2)
shortest = p.map(min_max,combinations)
def min_max(combinations):
return min(((max(x) - min(x),x) for x in combinations))
结果是:
Traceback (most recent call last):
File "./searchTerms2.py", line 65, in <module>
print (answer(document,searchTerms))
File "./searchTerms2.py", line 45, in answer
shortest = p.map(min_max,combinations)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
TypeError: 'int' object is not iterable
任何指针将不胜感激。有没有更好的方法来解决这个问题?有没有可以提高效率的地方?
--EDIT-- 问题的进一步解释:
document = 'this is a song that never ends it goes on and on my friend some people started singing it not knowing what it was and they will continue singing it forever just because this is the song'
searchTerms = ['this', 'goes','on']
应该导致:
'this is a song that never ends it goes on'
这适用于我当前的算法,但如果给定更大的文档和 searchTerms,则速度不够快。我希望这更清楚...
我一直在计时我的代码,看起来我最大的性能打击来自:
shortest = min(((max(x) - min(x),x) for x in combinations),key=lambda x: x[0])
当我增加“文档”中的单词数量并在“搜索词”中添加额外的搜索词时,我看到该行的性能受到很大影响。其他一切都与我能说的相差无几。