python - How to utilize all cores with Python multiprocessing

Question

I have been fiddling with Python's multiprocessing functionality for over an hour now, trying to parallelize a rather complex graph traversal function using multiprocessing.Process and multiprocessing.Manager:

import networkx as nx
import csv
import time 
from operator import itemgetter
import os
import multiprocessing as mp

cutoff = 1

exclusionlist = ["cpd:C00024"]

DG = nx.read_gml("KeggComplete.gml", relabel=True)

for exclusion in exclusionlist:
    DG.remove_node(exclusion)

# check whether the 'memorizedPaths' directory exists and, if not, create it
fn = os.path.join(os.path.dirname(__file__),
                  'memorizedPaths' + str(cutoff+1))
if not os.path.exists(fn):
    os.makedirs(fn)

manager = mp.Manager()
memorizedPaths = manager.dict()
filepaths = manager.dict()
degreelist = sorted(DG.degree_iter(),
                    key=itemgetter(1),
                    reverse=True)

def _all_simple_paths_graph(item, DG, cutoff, memorizedPaths, filepaths):
    source = item[0]
    uniqueTreePaths = []

    if cutoff < 1:
        return

    visited = [source]
    stack = [iter(DG[source])]

    while stack:
        children = stack[-1]
        child = next(children, None)

        if child is None:
            stack.pop()
            visited.pop()
        elif child in memorizedPaths:
            for path in memorizedPaths[child]:
                newPath = (tuple(visited) + tuple(path))
                if ((len(newPath) <= cutoff) and
                        (len(set(visited) & set(path)) == 0)):
                    uniqueTreePaths.append(newPath)
            continue
        elif len(visited) < cutoff:
            if child not in visited:
                visited.append(child)
                stack.append(iter(DG[child]))

                if visited not in uniqueTreePaths:
                    uniqueTreePaths.append(tuple(visited))
        else: # len(visited) == cutoff:
            if ((visited not in uniqueTreePaths) and
                    (child not in visited)):
                uniqueTreePaths.append(tuple(visited + [child]))
            stack.pop()
            visited.pop()
    # writes the absolute path of the node path file into the hash table
    filepaths[source] = str(fn) + "/" + str(source) + "path.txt"
    with open (filepaths[source], "wb") as csvfile2:
        writer = csv.writer(csvfile2, delimiter=" ", quotechar="|")
        for path in uniqueTreePaths:
            writer.writerow(path)

    memorizedPaths[source] = uniqueTreePaths

############################################################################

if __name__ == '__main__':
    start = time.clock()

    for item in degreelist:
        # arguments are passed in the order the function signature expects
        test = mp.Process(target=_all_simple_paths_graph,
                          args=(item, DG, cutoff, memorizedPaths, filepaths))
        test.start()
        test.join()

    end = time.clock()
    print (end-start)

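Currently - through luck and magic - it works (sort of). My problem is that I'm only using 12 of my 24 cores.

Can someone explain why this might be the case? Perhaps my code isn't the best multiprocessing solution, or is it a feature of my architecture, an Intel Xeon CPU E5-2640 @ 2.50GHz x18 running on Ubuntu 13.04 x64?

EDIT:

I managed to get: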

p = mp.Pool()
for item in degreelist:
    p.apply_async(_all_simple_paths_graph,
                  args=(item, DG, cutoff, memorizedPaths, filepaths))
p.close()
p.join()

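However, it works VERY slowly! So I assume I am using the wrong function for the job. Hopefully it helps clarify exactly what I'm trying to accomplish!

EDIT2: .map attempt: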

from functools import partial

partialfunc = partial(_all_simple_paths_graph,
                      DG=DG,
                      cutoff=cutoff,
                      memorizedPaths=memorizedPaths,
                      filepaths=filepaths)
p = mp.Pool()
for item in processList:
    processVar = p.map(partialfunc, xrange(len(processList)))   
p.close()
p.join()

It works, but slower than a single core. Time to optimize!

score 58 · Accepted Answer

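Too much is piling up here to address in comments, so, where mp is multiprocessing:

mp.cpu_count() should return the number of processors. But test it. Some platforms are funky, and this information isn't always easy to get. Python does the best it can.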
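
One way to run that test, as a minimal sketch written for Python 3 (the question's code targets Python 2): compare what mp.cpu_count() reports with the set of CPUs the current process is actually allowed to run on. os.sched_getaffinity exists only on some platforms (Linux among them), so the sketch falls back to cpu_count() elsewhere. The helper name usable_cpus is made up for illustration.

```python
import multiprocessing as mp
import os


def usable_cpus():
    """Return (reported, usable) CPU counts for the current process."""
    reported = mp.cpu_count()
    # sched_getaffinity is not available on every platform (e.g. macOS, Windows).
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = reported
    return reported, usable


if __name__ == "__main__":
    reported, usable = usable_cpus()
    print("cpu_count() reports:", reported)
    print("CPUs this process may use:", usable)
```

If the two numbers differ, something outside Python (a container limit, taskset, a job scheduler) may be restricting the process, which would be worth ruling out before tuning the code.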

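If you start 24 processes, they'll do exactly what you tell them to do ;-) It looks like mp.Pool() would be the most convenient for you. You pass the number of processes you want to create to its constructor. mp.Pool(processes=None) will use mp.cpu_count() for the number of processes.

Then you can use, for example, .imap_unordered(...) on your Pool instance to spread your degreelist across processes. Or maybe some other Pool method would work better for you - experiment.

If you can't bash the problem into Pool's view of the world, you could instead create an mp.Queue to serve as a work queue, .put()'ing nodes (or slices of nodes, to reduce overhead) to work on in the main program, and write the workers to .get() work items off that queue. Ask if you need examples. Note that you need to put sentinel values (one per process) on the queue, after all the "real" work items, so that worker processes can test for the sentinel to know when they're done.

FYI, I like queues because they're more explicit. Many others like Pools better because they're more magical ;-)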
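
The queue-and-sentinel approach described above might look roughly like this. It is a sketch for Python 3, not the answerer's code: the names worker, run and SENTINEL, the squaring stand-in for the real traversal, and the optional ctx parameter (which only lets the caller pick a start method) are all assumptions made for illustration.

```python
import multiprocessing as mp

SENTINEL = None  # assumed marker; real work items must never be None


def worker(tasks, results):
    """Pull items off the task queue until the sentinel shows up."""
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        results.put((item, item * item))  # stand-in for the real traversal


def run(items, nprocs=None, ctx=None):
    """Process items in nprocs workers and return {item: result}."""
    ctx = ctx or mp.get_context()  # default start method unless one is given
    items = list(items)
    nprocs = nprocs or mp.cpu_count()
    tasks, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(nprocs)]
    for p in procs:
        p.start()
    for item in items:
        tasks.put(item)
    for _ in procs:
        tasks.put(SENTINEL)  # one sentinel per worker, after the real work
    # Drain the results before joining so no worker blocks on a full pipe.
    out = dict(results.get() for _ in items)
    for p in procs:
        p.join()
    return out


if __name__ == "__main__":
    squares = run(range(10), nprocs=4)
    print(sorted(squares.items()))
```

Putting slices of nodes on the queue instead of single nodes, as suggested above, would cut the number of queue operations at the cost of coarser load balancing.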

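Pool Example

Here's an executable prototype. It shows one way to use imap_unordered with Pool and chunksize that doesn't require changing any function signatures. Of course you'll have to plug in your real code ;-) Note that the init_worker approach allows passing "most of" the arguments only once per processor, not once for every item in your degreelist. Cutting the amount of inter-process communication can be crucial for speed.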

import multiprocessing as mp

def init_worker(mps, fps, cut):
    global memorizedPaths, filepaths, cutoff
    global DG

    print "process initializing", mp.current_process()
    memorizedPaths, filepaths, cutoff = mps, fps, cut
    DG = 1##nx.read_gml("KeggComplete.gml", relabel = True)

def work(item):
    _all_simple_paths_graph(DG, cutoff, item, memorizedPaths, filepaths)

def _all_simple_paths_graph(DG, cutoff, item, memorizedPaths, filepaths):
    pass # print "doing " + str(item)

if __name__ == "__main__":
    m = mp.Manager()
    memorizedPaths = m.dict()
    filepaths = m.dict()
    cutoff = 1 ##
    # use all available CPUs
    p = mp.Pool(initializer=init_worker, initargs=(memorizedPaths,
                                                   filepaths,
                                                   cutoff))
    degreelist = range(100000) ##
    for _ in p.imap_unordered(work, degreelist, chunksize=500):
        pass
    p.close()
    p.join()

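I strongly advise running this exactly as-is, so you can see that it's blazing fast. Then add things to it a bit at a time, to see how that affects the time. For example, just adding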

   memorizedPaths[item] = item

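to _all_simple_paths_graph() slows it down enormously. Why? Because the dict gets bigger and bigger with each addition, and this process-safe dict has to be synchronized (under the covers) across all the processes. The unit of synchronization is "the entire dict" - there's no internal structure the mp machinery can exploit to do incremental updates to the shared dict.

If you can't afford this expense, then you can't use a Manager.dict() for this. Opportunities for cleverness abound ;-)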
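
One possible direction, sketched here for Python 3 under stated assumptions rather than as the answerer's recommendation: let each worker return its result and have the parent collect everything into an ordinary dict, so nothing is shared while the workers run. The names work and collect, the placeholder result, and the optional ctx parameter are hypothetical. The trade-off is real: workers can no longer see paths memorized by other workers, which the question's algorithm relies on, so this only fits if that lookup can be dropped or done in a later pass.

```python
import multiprocessing as mp


def work(item):
    """Stand-in for the real traversal: return the result, don't share it."""
    paths = [(item, item + 1)]  # placeholder result, not real path data
    return item, paths


def collect(items, processes=None, chunksize=500, ctx=None):
    """Run work() over items in a pool and merge results in the parent."""
    ctx = ctx or mp.get_context()  # default start method unless one is given
    memorized = {}  # ordinary dict, lives only in the parent process
    with ctx.Pool(processes=processes) as pool:
        for item, paths in pool.imap_unordered(work, items,
                                               chunksize=chunksize):
            memorized[item] = paths
    return memorized


if __name__ == "__main__":
    result = collect(range(100000))
    print(len(result))
```

Timing this against the Manager.dict() variant on the same input would show how much of the slowdown comes from the shared dict rather than from the traversal itself.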
