The code below is extracted from a longer script. The sequential version (without multiprocessing) works fine. However, when I use Pool, the script gets stuck at a specific line.
I want to apply the same function crop_image in parallel to a set of medical imaging volumes from a group of subjects, retrieved from the lists all_subdirs and all_files. The function loads a subject's volume from its path with nib, then extracts two 3D patches from it: the first patch has shape 40x40x40, the second 80x80x80. The two patches share the same center.
In this simplified example I only load two subjects. Both processes do start, because the print inside the function does return:
>>> sub-001_ses-20101210_brain.nii.gz
>>> sub-002_ses-20110815_brain.nii.gz
However, when tf.image.per_image_standardization has to be executed on the 80x80x80 patch, the program hangs indefinitely. I suspect it is a memory/space problem, because if I also set the large patch to 40x40x40 (or smaller), the script runs without problems.
What can I try? Am I doing something wrong?
The version below actually works on its own, but it is very simplified with respect to the real one that does not:
import os
import multiprocessing as mp

import nibabel as nib
import numpy as np
import tensorflow as tf
def crop_image(subdir_path, file_path):
    print(file_path)
    small_scale = []
    big_scale = []
    nii_volume = nib.load(os.path.join(subdir_path, file_path)).get_fdata()  # load volume with nibabel and extract np array
    rows_range, columns_range, slices_range = nii_volume.shape  # save volume dimensions
    for y in range(20, rows_range, 40):  # loop over rows
        for x in range(20, columns_range, 40):  # loop over columns
            for z in range(20, slices_range, 40):  # loop over slices
                small_patch = nii_volume[y - 20:y + 20, x - 20:x + 20, z - 20:z + 20]  # extract small patch
                big_patch = nii_volume[y - 40:y + 40, x - 40:x + 40, z - 40:z + 40]  # extract big patch
                small_patch = tf.image.per_image_standardization(small_patch)  # standardize small patch
                small_scale.append(small_patch)  # append small patch to external list
                # HERE THE CODE GETS STUCK AND EVERYTHING BELOW IS NOT EXECUTED
                big_patch = tf.image.per_image_standardization(big_patch)  # standardize big patch
                big_scale.append(big_patch)  # append big patch to external list

    # create tf.Dataset with lists (small_scale and big_scale)
    # etc..
    # etc..
    final_results = 1  # invented number for the example
    return final_results
if __name__ == '__main__':
    all_subdirs = ['/home/newuser/Desktop/sub-001/ses-20101210/anat', '/home/newuser/Desktop/sub-002/ses-20110815/anat']
    all_files = ['sub-001_ses-20101210_brain.nii.gz', 'sub-002_ses-20110815_brain.nii.gz']

    # DEFINE pool of processes
    num_workers = mp.cpu_count()  # save number of available CPUs (threads)
    pool = mp.Pool(processes=num_workers)  # create pool object and set as many processes as there are CPUs
    outputs = [pool.apply_async(crop_image, args=(path_pair[0], path_pair[1])) for path_pair in zip(all_subdirs, all_files)]
    results = [out.get() for out in outputs]  # collect the results; without this the script would just exit
    pool.close()
    pool.join()
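One workaround I am considering is to avoid calling TensorFlow inside the worker processes altogether and standardize the patches with plain NumPy instead. This is only a sketch: standardize_patch is a helper name I made up, and it mimics what the tf.image.per_image_standardization docs describe (subtract the mean, divide by the standard deviation clamped below by 1/sqrt(N)):

```python
import numpy as np

def standardize_patch(patch):
    # NumPy analogue of tf.image.per_image_standardization:
    # (x - mean) / adjusted_stddev, where adjusted_stddev is the
    # standard deviation clamped below by 1/sqrt(num_elements)
    # to avoid division by zero on constant patches.
    patch = patch.astype(np.float32)
    mean = patch.mean()
    adjusted_stddev = max(float(patch.std()), 1.0 / np.sqrt(patch.size))
    return (patch - mean) / adjusted_stddev
```

If the hang really comes from using TensorFlow inside forked worker processes, dropping TF from crop_image this way should sidestep it while producing equivalent patches.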
Thanks in advance!