I have a data-analysis script that takes an argument specifying which segment of the analysis to run. I want to run up to "n" instances of the script at a time, where "n" is the number of cores on the machine. The complication is that there are more segments of the analysis than there are cores, so I want to run at most "n" processes at once, and when one finishes, start another. Has anyone done something like this before using the subprocess module?
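(For context, the pattern being asked about — at most n concurrent subprocesses, launching a new one whenever a slot frees up — can be sketched with `subprocess` and a simple polling loop. The `python -c` command below is a hypothetical stand-in for the actual analysis script and its segment argument:)

```python
import os
import subprocess
import sys
import time

# Hypothetical stand-in for the real analysis script: each "segment"
# is passed to a trivial python -c command so the sketch is runnable.
def make_command(segment):
    return [sys.executable, '-c', 'import sys; sys.exit(0)', str(segment)]

segments = list(range(10))           # more segments than cores
max_workers = os.cpu_count() or 1    # the "n" in the question

running, exit_codes = [], []
while segments or running:
    # Start new processes while there is a free slot.
    while segments and len(running) < max_workers:
        running.append(subprocess.Popen(make_command(segments.pop(0))))
    # Reap finished processes; poll() returns None while one is still running.
    still_running = []
    for proc in running:
        if proc.poll() is None:
            still_running.append(proc)
        else:
            exit_codes.append(proc.returncode)
    running = still_running
    time.sleep(0.05)                 # avoid a busy loop

print('exit codes:', exit_codes)
```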

2 Answers

I do think the multiprocessing module will help you achieve what you need. Take a look at this example technique.

import multiprocessing

def do_calculation(data):
    """
    @note: replace this with your calculation code
    """
    return data * 2

def start_process():
    print('Starting', multiprocessing.current_process().name)

if __name__ == '__main__':
    analysis_jobs = list(range(10))  # could be your analysis work
    print('analysis_jobs:', analysis_jobs)

    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(processes=pool_size,
                                initializer=start_process,
                                maxtasksperchild=2)
    # maxtasksperchild tells the pool to restart a worker process after
    # it has finished a few tasks. This can be used to avoid having
    # long-running workers consume ever more system resources.

    pool_outputs = pool.map(do_calculation, analysis_jobs)
    # The result of the map() method is functionally equivalent to the
    # built-in map(), except that individual tasks run in parallel.
    # Since the pool processes its inputs in parallel, close() and
    # join() can be used to synchronize the main process with the task
    # processes and ensure proper cleanup.

    pool.close()  # no more tasks
    pool.join()   # wrap up current tasks

    print('Pool:', pool_outputs)

You can find good multiprocessing techniques here.

answered 2012-10-07T05:07:54.213
Use the multiprocessing module, specifically the Pool class. Pool creates a pool of processes (by default, as many processes as there are CPUs) and lets you submit jobs to the pool, each of which is executed on the next free process. It takes care of all the details of subprocess management and of passing data between tasks, so you can write your code in a very straightforward way. See the documentation for some examples of its use.

answered 2012-10-07T04:36:46.653