Hello!

I am trying to learn multithreading in Python and wrote the following code:

import time, argparse, threading, sys, subprocess, os

def item_fun(items, indices, lock):
    for index in indices:
        items[index] = items[index]*items[index]*items[index]

def map(items, cores):  

    count = len(items)
    cpi = count // cores  # items per thread; integer division keeps the range() arguments integral
    threads = []
    lock = threading.Lock()
    for core in range(cores):
        thread = threading.Thread(target=item_fun, args=(items, range(core*cpi, core*cpi + cpi), lock))
        threads.append(thread)
        thread.start()
    item_fun(items, range((core+1)*cpi, count), lock)
    for thread in threads:
        thread.join()


parser = argparse.ArgumentParser(description='cube', usage='%(prog)s [options] -n')
parser.add_argument('-n', action='store', help='number', dest='n', default='1000000', metavar = '')
parser.add_argument('-mp', action='store_true', help='multi thread', dest='mp', default=False)
args = parser.parse_args()

# Parse the item count before building the list that depends on it
NUMBER_OF_ITEMS = int(args.n)
items = list(range(NUMBER_OF_ITEMS))  # a real list, so items can be assigned in place
# print('items before:')
# print(items)
mp = args.mp
if mp is True:
    # NUMBER_OF_PROCESSORS is an environment variable set by Windows
    NUMBER_OF_PROCESSORS = int(os.getenv("NUMBER_OF_PROCESSORS"))
    start = time.time()
    map(items, NUMBER_OF_PROCESSORS)
    end = time.time()
else:
    start = time.time()
    item_fun(items, range(NUMBER_OF_ITEMS), None)
    end = time.time()
# print('items after:')
# print(items)
print('time elapsed: %s' % (end - start))

When I pass the -mp flag it runs slower: on my machine with 4 CPUs the computation takes about 0.5 seconds, whereas the single-threaded version takes about 0.3 seconds.

Am I doing something wrong?

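I know there is Pool.map() and so on, but it spawns child processes rather than threads. As far as I know it works faster, but I want to write my own thread pool.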

2 Answers

CPython has no truly parallel multithreading because of an implementation detail called the GIL (global interpreter lock). Only one thread executes Python bytecode at a time, and the interpreter switches between the threads. (Other implementations of Python, such as Jython, can actually run threads in parallel.)

Exactly why the multithreaded version of your program is slower depends on the details, but when writing Python you need to be aware of the GIL, so that you do not expect CPU-bound loads to be processed more efficiently just by adding threads to the program.
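To make the point about CPU-bound loads concrete, here is a minimal sketch that times the same cubing work in one thread and split across several threads. It assumes Python 3 and a standard GIL-enabled CPython build; the names `cube_range`, `run_single`, and `run_threaded` are made up for this illustration and are not part of the question's code. The actual timings depend on the machine and interpreter version, so the sketch only prints them.

```python
import threading
import time


def cube_range(items, start, stop):
    """Cube items[start:stop] in place (pure-Python, CPU-bound work)."""
    for i in range(start, stop):
        items[i] = items[i] ** 3


def run_single(n):
    """Cube n items in the calling thread; return (items, elapsed seconds)."""
    items = list(range(n))
    t0 = time.perf_counter()
    cube_range(items, 0, n)
    return items, time.perf_counter() - t0


def run_threaded(n, workers):
    """Cube n items split across `workers` threads; return (items, elapsed seconds)."""
    items = list(range(n))
    chunk = n // workers
    threads = []
    t0 = time.perf_counter()
    for w in range(workers):
        start = w * chunk
        # The last worker also takes whatever remainder is left over.
        stop = n if w == workers - 1 else start + chunk
        t = threading.Thread(target=cube_range, args=(items, start, stop))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return items, time.perf_counter() - t0


if __name__ == "__main__":
    single_items, single_time = run_single(1000000)
    threaded_items, threaded_time = run_threaded(1000000, 4)
    assert single_items == threaded_items  # same result either way
    print("single thread: %.3f s" % single_time)
    print("4 threads:     %.3f s" % threaded_time)
```

Each thread writes to a disjoint slice of the list, so no lock is needed for correctness here; the threads still take turns holding the GIL while they compute.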

Other things to be aware of are, for instance, multiprocessing and numpy for CPU-bound loads, and PyEv (minimal) and Tornado (huge kitchen sink) for I/O-bound loads.
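As a rough illustration of the multiprocessing route for a CPU-bound load, the sketch below cubes a sequence with a pool of worker processes. It assumes Python 3; `cube` and `cube_all` are hypothetical helper names, and the `chunksize` default is an arbitrary choice, not a tuned value.

```python
from multiprocessing import Pool


def cube(x):
    """CPU-bound work for one item (the same cubing as in the question)."""
    return x * x * x


def cube_all(items, processes=4, chunksize=10000):
    """Cube every item using a pool of worker processes.

    Each worker is a separate interpreter with its own GIL, so the
    chunks can run in parallel on different cores.
    """
    with Pool(processes=processes) as pool:
        # chunksize batches the items so each inter-process message
        # carries many values instead of one.
        return pool.map(cube, items, chunksize)


if __name__ == "__main__":
    # The guard matters: on platforms that start workers by re-importing
    # the main module, unguarded pool creation would recurse.
    result = cube_all(range(1000000))
    print(result[:5])
```

Sending data to and from the workers has its own cost, so for work as cheap as cubing an integer the pool may not beat a plain loop; the benefit grows with the amount of computation per item.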

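answered 2013-07-29T09:33:31.537

You will only see a throughput increase from threads in Python if the threads are I/O bound. If what you are doing is CPU bound, you will not see any throughput increase.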
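The I/O-bound case can be sketched as follows, assuming Python 3 and using `time.sleep` as a stand-in for a blocking I/O call (sleeping releases the GIL, as real blocking I/O generally does). The names `fake_io`, `io_sequential`, and `io_threaded` are invented for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a blocking I/O call; the GIL is released while sleeping."""
    time.sleep(delay)
    return delay


def io_sequential(delays):
    """Run the waits one after another; return (results, elapsed seconds)."""
    t0 = time.perf_counter()
    results = [fake_io(d) for d in delays]
    return results, time.perf_counter() - t0


def io_threaded(delays, workers=4):
    """Run the waits in a thread pool; return (results, elapsed seconds)."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_io, delays))
    return results, time.perf_counter() - t0


if __name__ == "__main__":
    delays = [0.2] * 4
    _, seq_time = io_sequential(delays)
    _, thr_time = io_threaded(delays)
    print("sequential: %.2f s" % seq_time)
    print("threaded:   %.2f s" % thr_time)
```

Because the threads spend their time waiting rather than executing bytecode, the waits overlap and the threaded run finishes in roughly the time of the longest single wait.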

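Turning on thread support in Python (by starting another thread) also seems to make some things slower, so you may find that overall performance still suffers.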

Of course, this all applies to CPython; other Python implementations behave differently.

answered 2013-07-29T09:40:14.360