python - Python 中的并行性

Question

在 Python 中实现并行性有哪些选择？我想对一些非常大的栅格执行一堆 CPU 绑定计算，并希望将它们并行化。来自 C 背景，我熟悉并行性的三种方法：

消息传递过程，可能分布在集群中，例如MPI。
显式共享内存并行性，使用pthreads或fork()、pipe()等。人
使用OpenMP的隐式共享内存并行性。

决定使用的方法是一种权衡。

在 Python 中，有哪些方法可用，它们的特点是什么？是否有可集群的MPI克隆？实现共享内存并行的首选方法是什么？我听说过有关GIL的问题以及对tasklet的引用。

简而言之，在选择 Python 中的不同并行化策略之前，我需要了解哪些信息？

score 16 · Accepted Answer

通常，您描述的是 CPU 绑定计算。这不是 Python 的强项。从历史上看，多处理也不是。

主流 Python 解释器中的线程被可怕的全局锁所统治。新的多处理API 解决了这个问题，并提供了一个带有管道和队列等的工作池抽象。

您可以使用C或Cython编写性能关键代码，并使用 Python 作为粘合剂。

score 6 · Accepted Answer

新的（2.6）多处理模块是要走的路。它使用子进程来解决GIL问题。它还抽象出一些本地/远程问题，因此可以稍后选择在本地运行代码还是分散在集群上。我在上面链接的文档有点值得细读，但应该为入门提供良好的基础。

score 5 · Accepted Answer

Ray是一个优雅（且快速）的库。

并行化 Python 函数的最基本策略是使用@ray.remote装饰器声明一个函数。然后可以异步调用它。

import ray
import time

# Start the Ray processes (e.g., a scheduler and shared-memory object store).
ray.init(num_cpus=8)

@ray.remote
def f():
    time.sleep(1)

# This should take one second assuming you have at least 4 cores.
ray.get([f.remote() for _ in range(4)])

您还可以使用actor并行化有状态计算，同样使用@ray.remote装饰器。

# This assumes you already ran 'import ray' and 'ray.init()'.

import time

@ray.remote
class Counter(object):
    def __init__(self):
        self.x = 0

    def inc(self):
        self.x += 1

    def get_counter(self):
        return self.x

# Create two actors which will operate in parallel.
counter1 = Counter.remote()
counter2 = Counter.remote()

@ray.remote
def update_counters(counter1, counter2):
    for _ in range(1000):
        time.sleep(0.25)
        counter1.inc.remote()
        counter2.inc.remote()

# Start three tasks that update the counters in the background also in parallel.
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)

# Check the counter values.
for _ in range(5):
    counter1_val = ray.get(counter1.get_counter.remote())
    counter2_val = ray.get(counter2.get_counter.remote())
    print("Counter1: {}, Counter2: {}".format(counter1_val, counter2_val))
    time.sleep(1)

与多处理模块相比，它具有许多优点：

相同的代码在单个多核机器以及大型集群上运行。
使用共享内存和高效的序列化在同一台机器上的进程之间有效地共享数据。
您可以并行化 Python 函数（使用任务）和Python 类（使用演员）。
错误消息很好地传播。

Ray是我一直在帮助开发的一个框架。

score 2 · Accepted Answer

根据您需要处理多少数据以及您打算使用多少 CPU/机器，在某些情况下，最好用 C（或者如果您想使用 jython/IronPython，则使用 Java/C#）编写其中的一部分

与在 8 个 CPU 上并行运行相比，您可以从中获得的加速对性能的影响更大。

score 1 · Accepted Answer

有很多包可以做到这一点，正如其他人所说，最合适的是多处理，特别是“Pool”类。

并行 python可以获得类似的结果，此外它还设计为与集群一起使用。

无论如何，我会说多处理。

python - Python 中的并行性

5 回答 5

Related

Reference