python - Combination of parallel processing and dask arrays to process multiple image stacks

Question

I have a directory containing n h5 files, each of which has m image stacks to filter. For each image, I will run the filtering (Gaussian and Laplacian) using dask parallel arrays to speed up the processing (Ref to Dask). I will use the dask arrays through the apply_parallel() function in scikit-image.
I will run the processing on a small server with 20 CPUs.

I would like some advice on which parallel strategy makes the most sense:

1) Process the h5 files sequentially and use all the CPUs for dask processing.
2) Process the h5 files in parallel on x cores and use the remaining 20-x cores for dask processing.
3) Distribute the resources: process the h5 files in parallel, process the images within each h5 file in parallel, and use the remaining resources for dask.

Thanks for the help!

score 0 · Accepted Answer

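It is best to parallelize in the simplest way possible. If you have several files and just want to run the same computation on each of them, then parallelizing over files is almost certainly the simplest approach. If that saturates your computational resources, you can stop there without diving into more sophisticated methods.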

If this is indeed your situation, you can use dask, make, concurrent.futures, or any of a variety of other libraries to parallelize the work.
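
As a rough illustration of the per-file approach with concurrent.futures, here is a minimal sketch. The names filter_file, list_h5_files and process_directory are hypothetical, and the body of filter_file is only a placeholder: real code would open the file and apply the Gaussian and Laplacian filters to each stack. It assumes the files end in .h5 and that one worker process per file is acceptable.

```python
import sys
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def filter_file(path):
    """Hypothetical per-file worker (placeholder body).

    Real code would open the h5 file, filter every image stack in it
    and write the result. Here it only returns the file name so that
    the sketch runs without h5py or scikit-image installed.
    """
    return Path(path).name


def list_h5_files(directory):
    """Return the .h5 files in `directory` as sorted path strings."""
    return sorted(str(p) for p in Path(directory).glob("*.h5"))


def process_directory(directory, max_workers=20):
    """Run filter_file on every .h5 file, one file per worker process."""
    paths = list_h5_files(directory)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the same order as `paths`
        return list(pool.map(filter_file, paths))


if __name__ == "__main__":
    # Directory comes from the command line, defaulting to the current one
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    print(process_directory(directory))
```

If the files alone keep all 20 CPUs busy, nothing further is needed. Note that running dask inside each worker as well would oversubscribe the cores unless the worker count is reduced accordingly.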

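If there are other concerns, such as parallelizing the operation itself or making sure you do not run out of memory, then you are forced into more sophisticated systems like dask, but that may not be the case here.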

score 0 · Accepted Answer

Use make for parallelization.

With make -j20 you can tell make to run 20 processes in parallel.

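By using multiple processes, you avoid the cost of the "global interpreter lock". For independent tasks, using several independent processes is more efficient (benchmark it if in doubt). Make is great for processing whole folders where you need to apply the same command to each file - it is traditionally used for compiling source code, but it can run arbitrary commands.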
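
Because make runs ordinary commands, each file needs a standalone command it can call. Below is a minimal sketch of such a per-file script. The function names and the .filtered.h5 output naming are assumptions for illustration, and the filtering step is a placeholder that just copies the file.

```python
import shutil
import sys
from pathlib import Path


def output_path(src):
    """Map stack.h5 to stack.filtered.h5 (naming scheme is an assumption)."""
    src = Path(src)
    return src.with_name(src.stem + ".filtered.h5")


def filter_one(src):
    """Hypothetical per-file job (placeholder body).

    Real code would read `src`, filter its image stacks and write the
    result. Here the file is copied unchanged so the sketch runs
    without h5py or scikit-image installed.
    """
    dst = output_path(src)
    shutil.copyfile(src, dst)
    return dst


if __name__ == "__main__":
    # make passes the input file name(s) on the command line
    for name in sys.argv[1:]:
        print(filter_one(name))
```

A make rule that maps each .h5 file to its .filtered.h5 output would call this script once per file, and make -j20 would then run up to 20 of those calls at the same time, each in its own Python process.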
