python - Parallel processing of objects with Python's concurrent.futures

Question

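I have just started using the concurrent.futures library in Python 3 to apply a few functions to a list of images, in order to process and reshape them. The functions are resize(height, width) and opacity(number).

I also have an images() function that yields file-like objects, so I tried this code to process my images in parallel: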

import concurrent.futures
from mainfile import images
from mainfile import shape


def parallel_image_processing():
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future = executor.submit(images)
        for fileobject in future.result():
            future1 = executor.submit(shape.resize, fileobject, "65", "85")
            future2 = executor.submit(shape.opacity, fileobject, "0.5")

Can someone tell me whether I am on the right track to achieve this?

score 3 · Accepted Answer

I would suggest having images yield just a path, rather than an open file object:

def images():
    ...
    yield os.path.join(image_dir[0], filename)

Then use this:

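import concurrent.futures
from functools import partial

from mainfile import images, shape

def open_and_call(func, filename, args=(), kwargs=None):
    # Open the file inside the worker process, then call func on it.
    with open(filename, 'rb') as f:
        return func(f, *args, **(kwargs or {}))

def parallel_image_processing():
    resize_func = partial(open_and_call, shape.resize, args=("65", "85"))
    # Note the trailing comma: ("0.5") is a plain string, ("0.5",) is a tuple.
    opacity_func = partial(open_and_call, shape.opacity, args=("0.5",))
    img_list = list(images())
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        results1 = executor.map(resize_func, img_list)
        results2 = executor.map(opacity_func, img_list)

        # executor.map returns lazy result iterators, not Future objects, so
        # concurrent.futures.wait() does not apply to them. Consuming the
        # iterators waits for the work and re-raises any worker exception.
        list(results1)
        list(results2)


if __name__ == "__main__":
    # Make sure the entry point to the function that creates the executor
    # is inside an `if __name__ == "__main__"` guard if you're on Windows.
    parallel_image_processing()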

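If you're using CPython (as opposed to an alternative implementation without a GIL, such as Jython), you don't want ThreadPoolExecutor here, because image processing is CPU-bound. Because of the GIL, only one thread can execute Python bytecode at a time in CPython, so with threads you would not actually be doing anything in parallel for your use case. Use ProcessPoolExecutor instead, which uses processes rather than threads and avoids the GIL entirely. Note that this is why I suggested not returning file-like objects from images: you can't pass an open file handle to a worker process. You have to open the file in the worker instead.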
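To illustrate the point about file handles: ProcessPoolExecutor sends arguments to its workers by pickling them, and an open file object cannot be pickled, while a path string can. The following is a minimal sketch of that difference; the helper name can_pickle and the temporary file are illustrative assumptions, not part of the original code.

```python
import os
import pickle
import tempfile


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# Create a throwaway file to stand in for an image (hypothetical content).
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"not a real image")
    path = tmp.name

try:
    with open(path, "rb") as f:
        file_ok = can_pickle(f)   # open file handle: cannot be pickled
    path_ok = can_pickle(path)    # plain string path: pickles fine
finally:
    os.remove(path)

print(file_ok, path_ok)
```

This is why the worker receives the path and opens the file itself.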

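To do this, we have the executor call a small shim function (open_and_call), which opens the file in the worker process and then calls the resize/opacity function with the correct arguments.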

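I'm also using executor.map instead of executor.submit, so that we can call resize/opacity for every item returned by images() without an explicit for loop. I use functools.partial to make it easier to call functions that take multiple arguments with executor.map, which passes each item of the iterable to the function as a single argument.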
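As a small sketch of how functools.partial and executor.map fit together, the example below freezes the extra arguments of a stand-in function and maps it over a list. The function scale is a hypothetical placeholder for shape.resize/shape.opacity, and ThreadPoolExecutor is used only to keep the sketch self-contained; the map/partial mechanics are the same for ProcessPoolExecutor.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import repeat


def scale(value, factor, offset=0):
    # Hypothetical stand-in for an image operation with extra parameters.
    return value * factor + offset


# Freeze the extra arguments so the callable takes a single argument.
double_plus_one = partial(scale, factor=2, offset=1)

with ThreadPoolExecutor(max_workers=2) as executor:
    # map() yields results in input order; it does not return Future objects.
    results = list(executor.map(double_plus_one, [1, 2, 3]))

    # Alternative without partial: map() also accepts several iterables and
    # passes one item from each, so a constant can be supplied with repeat().
    results_alt = list(executor.map(scale, [1, 2, 3], repeat(2)))

print(results, results_alt)
```

partial tends to read better when several arguments are fixed, while the multiple-iterables form avoids creating a wrapper.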

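There's also no need to call images() in the executor, since you're going to wait for its result before continuing anyway. Just call it like a normal function. I also convert the generator object returned by images() to a list before calling map. If you're concerned about memory usage, you can call images() directly in each map call, but if not, it's probably faster to call images() once and store the result as a list.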
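One practical reason for the list conversion is that a generator can only be iterated once, so a single generator object cannot feed two map calls. The sketch below shows this with a hypothetical images() that yields two made-up file names.

```python
def images():
    # Hypothetical generator standing in for the real images() function.
    for name in ("a.png", "b.png"):
        yield name


gen = images()
first_pass = [p.upper() for p in gen]
second_pass = [p.upper() for p in gen]  # the generator is already exhausted

# Materializing once gives a sequence that can be reused for every map call.
img_list = list(images())
reuse_one = [p.upper() for p in img_list]
reuse_two = [p.upper() for p in img_list]

print(first_pass, second_pass, reuse_one == reuse_two)
```

Calling images() afresh for each map call, as mentioned above, avoids holding the whole list in memory at the cost of producing the paths twice.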
