python - Using line_profiler with multiprocessing

Question

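How can I profile a Python module that uses multiprocessing (multiprocessing.Pool.map) so that each spawned process is also profiled line by line?

Currently I use line_profiler for profiling, but it does not support multiprocessing. Is there a way to do this manually? Or perhaps with some other tool?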

score 0 · Accepted Answer

You can use memory_profiler like this (note that it reports per-line memory usage rather than timing):

from memory_profiler import profile
import multiprocessing as mp
import gc


@profile(precision=4)
def array_ops(num):
    gc.collect()
    size1 = 10 ** num
    size2 = 20 ** (num+1)
    x = [1] * size1
    y = [2] * size2
    y *= 2
    del y
    gc.collect()
    z = x * 2
    gc.collect()
    return x

if __name__ == '__main__':
    num_workers = 3
    pool = mp.Pool(num_workers)
    pool.map(array_ops, [4,5,6])
    pool.close()
    pool.join()

Here is a sample output, one report per worker process:

Line #    Mem usage    Increment   Line Contents
================================================
     6  34.4258 MiB  34.4258 MiB   @profile(precision=4)
     7                             def array_ops(num):
     8  34.4258 MiB   0.0000 MiB       gc.collect()
     9  34.4258 MiB   0.0000 MiB       size1 = 10 ** num
    10  34.4258 MiB   0.0000 MiB       size2 = 20 ** (num+1)
    11  34.5586 MiB   0.1328 MiB       x = [1] * size1
    12  58.7852 MiB  24.2266 MiB       y = [2] * size2
    13  83.2539 MiB  24.4688 MiB       y *= 2
    14  34.6055 MiB   0.0000 MiB       del y
    15  34.6055 MiB   0.0000 MiB       gc.collect()
    16  34.6055 MiB   0.0000 MiB       z = x * 2
    17  34.6055 MiB   0.0000 MiB       gc.collect()
    18  34.6055 MiB   0.0000 MiB       return x


Filename: array_ops.py

Line #    Mem usage    Increment   Line Contents
================================================
     6  34.4258 MiB  34.4258 MiB   @profile(precision=4)
     7                             def array_ops(num):
     8  34.4258 MiB   0.0000 MiB       gc.collect()
     9  34.4258 MiB   0.0000 MiB       size1 = 10 ** num
    10  34.4258 MiB   0.0000 MiB       size2 = 20 ** (num+1)
    11  35.0820 MiB   0.6562 MiB       x = [1] * size1
    12 523.3711 MiB 488.2891 MiB       y = [2] * size2
    13 1011.6172 MiB 488.2461 MiB       y *= 2
    14  35.2969 MiB   0.0000 MiB       del y
    15  35.2969 MiB   0.0000 MiB       gc.collect()
    16  36.5703 MiB   1.2734 MiB       z = x * 2
    17  36.5703 MiB   0.0000 MiB       gc.collect()
    18  36.8242 MiB   0.2539 MiB       return x


Filename: array_ops.py

Line #    Mem usage    Increment   Line Contents
================================================
     6  34.4258 MiB  34.4258 MiB   @profile(precision=4)
     7                             def array_ops(num):
     8  34.4258 MiB   0.0000 MiB       gc.collect()
     9  34.4258 MiB   0.0000 MiB       size1 = 10 ** num
    10  34.4258 MiB   0.0000 MiB       size2 = 20 ** (num+1)
    11  42.0391 MiB   7.6133 MiB       x = [1] * size1
    12 9807.7109 MiB 9765.6719 MiB       y = [2] * size2
    13 19573.2109 MiB 9765.5000 MiB       y *= 2
    14  42.1641 MiB   0.0000 MiB       del y
    15  42.1641 MiB   0.0000 MiB       gc.collect()
    16  57.3594 MiB  15.1953 MiB       z = x * 2
    17  57.3594 MiB   0.0000 MiB       gc.collect()
    18  57.3594 MiB   0.0000 MiB       return x
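
The reports above do not say which worker produced which table. As a rough complement, here is a minimal sketch that uses the standard-library tracemalloc module (a different tool from memory_profiler, swapped in only because it needs no extra install) to return one peak figure per task, tagged with the worker PID. The function names `measure_peak` and the reduced workload sizes are assumptions made for illustration. Note that tracemalloc counts memory allocated by Python, not the process RSS that memory_profiler reports, so the numbers are not directly comparable.

```python
import multiprocessing as mp
import os
import tracemalloc


def array_ops(num):
    """Reduced variant of the workload above (sizes shrunk for illustration)."""
    x = [1] * (10 ** num)
    y = [2] * (10 ** num)
    y *= 2
    del y
    return len(x)


def measure_peak(num):
    """Run array_ops(num) in the current process and report its peak allocation."""
    tracemalloc.start()
    try:
        result = array_ops(num)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return os.getpid(), num, result, peak


if __name__ == '__main__':
    with mp.Pool(3) as pool:
        for pid, num, result, peak in pool.map(measure_peak, [3, 4, 5]):
            print(f"pid={pid} num={num} peak={peak / 2**20:.4f} MiB")
```

Because the measurement is returned through pool.map, the parent process collects all results in input order and nothing is interleaved on stdout.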

score 0 · Accepted Answer

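The recommended way to use line_profiler is to add @profile to the function being profiled and then run kernprof -v -l script.py. However, when it is combined with the multiprocessing module, this leads to an error like the following: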

Can't pickle <class '__main__.Worker'>: attribute lookup Worker on __main__ failed.

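To work around this, we have to set up line_profiler ourselves inside the subprocess we want to profile, rather than enabling it globally through kernprof.

For example, suppose we want to profile the run method of one of our worker processes. Here is the setup: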

import multiprocessing as mp
import line_profiler

class Worker(mp.Process):

    def run(self):
        prof = line_profiler.LineProfiler()
        # Wrap all functions that you want to be profiled in this process
        # These can be global functions or any class methods
        # Make sure to replace instance methods on a class level, not the bound methods self.run2
        Worker.run2 = prof(Worker.run2)
        ...
        # run the main
        self.run2()
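        # store stats in a file; with several workers, give each process its own
        # file name (e.g. include os.getpid()) so they do not overwrite each other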
        prof.dump_stats('worker.lprof')

    def run2(self):
        # real run method renamed
        ...

Now run the script with plain python (not kernprof). It generates a profile file, which we can then view with:

python -m line_profiler worker.lprof
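
The question asked about multiprocessing.Pool.map rather than mp.Process subclasses. The same idea might be adapted as in the sketch below: a top-level wrapper creates the LineProfiler inside the worker, wraps the target function, and dumps the stats to a per-task file. The names `work` and `profiled_work`, the workload itself, and the file naming scheme are all hypothetical; only `LineProfiler()`, calling the profiler on a function to wrap it, and `dump_stats()` come from the answer above. The import guard is an added assumption so the script still runs unprofiled where line_profiler is not installed.

```python
import multiprocessing as mp
import os

try:
    import line_profiler  # third-party package
except ImportError:       # assumption: fall back to running unprofiled
    line_profiler = None


def work(num):
    """Hypothetical workload standing in for the function to be profiled."""
    total = 0
    for i in range(10 ** num):
        total += i * i
    return total


def profiled_work(num):
    """Run work(num) under a LineProfiler created inside the worker process."""
    if line_profiler is None:
        return work(num)
    prof = line_profiler.LineProfiler()
    wrapped = prof(work)  # wrap the plain function, as in the answer above
    try:
        return wrapped(num)
    finally:
        # hypothetical naming scheme: one file per worker PID and argument
        prof.dump_stats(f'work_{os.getpid()}_{num}.lprof')


if __name__ == '__main__':
    with mp.Pool(3) as pool:
        print(pool.map(profiled_work, [3, 4, 5]))
```

Each resulting file can be inspected with python -m line_profiler followed by the file name, as shown above. Stats are dumped per call rather than once per worker because pool workers have no obvious hook that runs right before they exit; if the same PID and argument can occur twice, the file name would need another distinguishing component.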
