我正在构建一个 python 模块来从大量文本中提取标签,虽然它的结果质量很高,但它的执行速度非常慢。我试图通过使用多处理来加速这个过程,这也很有效,直到我尝试引入一个锁,以便一次只有一个进程连接到我们的数据库。我一生都无法弄清楚如何完成这项工作 - 尽管进行了很多搜索和调整,但我仍然得到一个PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed
. 这是有问题的代码 - 在我尝试将锁定对象作为f
.
def make_network(initial_tag, max_tags = 2, max_iter = 3):
manager = Manager()
lock = manager.Lock()
pool = manager.Pool(8)
# this is a very expensive function that I would like to parallelize
# over a list of tags. It involves a (relatively cheap) call to an external
# database, which needs a lock to avoid simultaneous queries. It takes a list
# of strings (tags) as its sole argument, and returns a list of sets with entries
# corresponding to the input list.
f = partial(get_more_tags, max_tags = max_tags, lock = lock)
def _recursively_find_more_tags(tags, level):
if level >= max_iter:
raise StopIteration
new_tags = pool.map(f, tags)
to_search = []
for i, s in zip(tags, new_tags):
for t in s:
joined = ' '.join(t)
print i + "|" + joined
to_search.append(joined)
try:
return _recursively_find_more_tags(to_search, level+1)
except StopIteration:
return None
_recursively_find_more_tags([initial_tag], 0)