python - python multiprocessing vs threading for cpu bound work on windows and linux

Question

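So I knocked together some test code to see how the multiprocessing module scales on cpu bound work compared to threading. On linux I get the performance increase that I'd expect:

linux (dual quad core xeon):
serialrun took 1192.319 ms
parallelrun took 346.727 ms
threadedrun took 2108.172 ms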

My dual core macbook pro shows the same behavior:

osx (dual core macbook pro)
serialrun took 2026.995 ms
parallelrun took 1288.723 ms
threadedrun took 5314.822 ms

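I then went and tried it on a Windows machine and got some very different results.

windows (i7 920):
serialrun took 1043.000 ms
parallelrun took 3237.000 ms
threadedrun took 2343.000 ms

Why oh why is the multiprocessing approach so much slower on Windows?

Here's the test code: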

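#!/usr/bin/env python

import multiprocessing
import threading
import time

def print_timing(func):
    # Decorator: print how long the wrapped call took, in milliseconds.
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print '%s took %0.3f ms' % (func.func_name, (t2-t1)*1000.0)
        return res
    return wrapper


def counter():
    # Pure cpu bound busy loop; no I/O and no shared state.
    for i in xrange(1000000):
        pass

@print_timing
def serialrun(x):
    for i in xrange(x):
        counter()

@print_timing
def parallelrun(x):
    # One process per counter() call; wait for all of them to finish.
    proclist = []
    for i in xrange(x):
        p = multiprocessing.Process(target=counter)
        proclist.append(p)
        p.start()

    for i in proclist:
        i.join()

@print_timing
def threadedrun(x):
    # One thread per counter() call; wait for all of them to finish.
    threadlist = []
    for i in xrange(x):
        t = threading.Thread(target=counter)
        threadlist.append(t)
        t.start()

    for i in threadlist:
        i.join()

def main():
    serialrun(50)
    parallelrun(50)
    threadedrun(50)

if __name__ == '__main__':
    main()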

score 26 · Accepted Answer

The python documentation for multiprocessing blames the lack of os.fork() for the problems on Windows. It may be applicable here.
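Without os.fork(), a child process on Windows cannot inherit the parent's memory; it starts a fresh interpreter and re-imports the main module, which costs more per process. The question's code targets Python 2.6, which has no API for inspecting this. The sketch below assumes Python 3.4 or later, where multiprocessing reports which start methods a platform supports. describe_start_methods is a hypothetical helper name, not part of the original code.

```python
import multiprocessing
import sys


def describe_start_methods():
    """Return the platform, the start methods it supports, and the default one."""
    return {
        "platform": sys.platform,
        # On Windows this list contains only 'spawn'; Unix variants also offer 'fork'.
        "available": multiprocessing.get_all_start_methods(),
        # Note: asking for the default also fixes it for the rest of the program.
        "default": multiprocessing.get_start_method(),
    }


if __name__ == "__main__":
    print(describe_start_methods())
```

If 'fork' is missing from the available list, every Process pays the cost of a full interpreter startup.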

See what happens when you import psyco. First, easy_install it:

C:\Users\hughdbrown>\Python26\scripts\easy_install.exe psyco
Searching for psyco
Best match: psyco 1.6
Adding psyco 1.6 to easy-install.pth file

Using c:\python26\lib\site-packages
Processing dependencies for psyco
Finished processing dependencies for psyco

Add this to the top of your python script:

import psyco
psyco.full()

Without psyco, I get these results:

serialrun took 1191.000 ms
parallelrun took 3738.000 ms
threadedrun took 2728.000 ms

With psyco, I get these results:

serialrun took 43.000 ms
parallelrun took 3650.000 ms
threadedrun took 265.000 ms

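Parallel is still slow, but the others burn rubber.

Edit: also, try it with the multiprocessing pool. (This is my first time trying this and it is so fast, I figure I must be missing something.)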

@print_timing
def parallelpoolrun(reps):
    pool = multiprocessing.Pool(processes=4)
    result = pool.apply_async(counter, (reps,))

Results:

C:\Users\hughdbrown\Documents\python\StackOverflow>python  1289813.py
serialrun took 57.000 ms
parallelrun took 3716.000 ms
parallelpoolrun took 128.000 ms
threadedrun took 58.000 ms
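A likely reason the pool version looks so fast: apply_async() returns immediately, and parallelpoolrun never calls get() on the result, so the timing does not include the work. It also submits a single task rather than reps of them, and passes reps to counter(), which takes no arguments, so the task would fail in the worker without the error ever being seen. A minimal sketch of a version that waits for every task, written for Python 3 (an assumption; the original is Python 2.6):

```python
import multiprocessing
import time


def counter():
    # Same cpu bound busy loop as in the question.
    for _ in range(1000000):
        pass


def parallelpoolrun(reps, processes=4):
    """Submit `reps` counter() calls to a pool and wait for all of them."""
    with multiprocessing.Pool(processes=processes) as pool:
        pending = [pool.apply_async(counter) for _ in range(reps)]
        for result in pending:
            result.get()  # blocks until done; re-raises any worker exception


if __name__ == "__main__":
    t1 = time.time()
    parallelpoolrun(50)
    print("parallelpoolrun took %0.3f ms" % ((time.time() - t1) * 1000.0))
```

With the get() calls in place, the measured time covers the pool startup plus all 50 counter() runs, so it is directly comparable to serialrun and parallelrun.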

score 23 · Accepted Answer

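Processes are much more lightweight under UNIX variants. Windows processes are heavy and take much more time to start up. Threads are the recommended way of doing multiprocessing on Windows.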
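One way to check the startup claim on a given machine is to time workers that do nothing, so the measurement is almost entirely creation and teardown. This is a sketch for Python 3; noop and mean_startup_ms are hypothetical names introduced here.

```python
import multiprocessing
import threading
import time


def noop():
    """Do nothing, so the measured time is almost all startup and teardown."""
    pass


def mean_startup_ms(worker_cls, n):
    """Start and join n no-op workers one at a time; return mean ms per worker.

    worker_cls is threading.Thread or multiprocessing.Process; both accept target=.
    """
    t1 = time.time()
    for _ in range(n):
        w = worker_cls(target=noop)
        w.start()
        w.join()
    return (time.time() - t1) * 1000.0 / n


if __name__ == "__main__":
    print("thread:  %0.3f ms each" % mean_startup_ms(threading.Thread, 20))
    print("process: %0.3f ms each" % mean_startup_ms(multiprocessing.Process, 20))
```

Running it on both Windows and linux shows how much of parallelrun's time is startup rather than counting.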

score 5 · Accepted Answer

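It's been said that creating processes on Windows is more expensive than on linux. If you search around the site you will find some information. Here's one I found easily.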

score 2 · Accepted Answer

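Just starting the pool takes a long time. I have found in 'real world' programs that if I can keep a pool open and reuse it for many different jobs, passing the reference down through method calls (usually using map_async), then on Linux I can save a few percent, but on Windows I can often halve the time taken. Linux is always quicker for my particular problems, but even on Windows I get a net benefit from multiprocessing.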
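The reuse pattern described above might look roughly like this. It is a sketch for Python 3, not the answerer's actual program; square, run_batch and run_all are hypothetical names.

```python
import multiprocessing


def square(n):
    # Stand-in for real cpu bound work.
    return n * n


def run_batch(pool, values):
    """Run one batch on an already-open pool handed in by the caller."""
    async_result = pool.map_async(square, values)
    return async_result.get()  # wait for the whole batch to finish


def run_all(batches, processes=4):
    """Open the pool once and reuse it for every batch."""
    with multiprocessing.Pool(processes=processes) as pool:
        return [run_batch(pool, batch) for batch in batches]


if __name__ == "__main__":
    print(run_all([[1, 2, 3], [4, 5], [6]]))
```

The pool startup cost is paid once in run_all instead of once per batch, which is where the saving on Windows would come from.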

score 1 · Accepted Answer

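Currently, your counter() function is not modifying much state. Try changing counter() so that it modifies many pages of memory. Then run a cpu bound loop. See if there is still a large disparity between linux and windows.

I'm not running python 2.6 right now, so I can't try it myself.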
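A counter() variant along those lines could look like the sketch below, written for Python 3. The function name, its parameters and the 4096-byte page size are all assumptions made for illustration; the real page size depends on the platform.

```python
PAGE_SIZE = 4096  # assumed page size; the real value is platform-dependent


def counter_touching_memory(pages=1024, passes=100):
    """cpu bound loop that writes one byte into each of `pages` pages, `passes` times."""
    buf = bytearray(pages * PAGE_SIZE)
    for _ in range(passes):
        # Step one page at a time so every page is written on every pass.
        for offset in range(0, len(buf), PAGE_SIZE):
            buf[offset] = (buf[offset] + 1) % 256
    return buf
```

Passing it as the target in parallelrun and threadedrun in place of counter gives the comparison the answer asks for.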
