While you won't get anything faster than PIL's crop in a single thread, you can use multiple cores to speed everything up! :)
I ran the code below on my 8-core i7 machine, as well as on my 7-year-old, two-core, barely-2 GHz laptop. Both saw significant improvements in run time. As you would expect, the improvement depended on the number of cores available.
The core of your code is the same; I just separated the loop from the actual computation so that the function could be applied to a list of values in parallel.
So this:
for i in range(0, num_images):
    t = time.time()
    im = Image.open('%03i.png' % i)
    w, h = im.size
    imc = im.crop((w-50, h-50, w+50, h+50))
    print('Time to open: %.4f seconds' % (time.time()-t))
    # convert them to numpy arrays
    data = np.array(imc)
Becomes:
def convert(filename):
    im = Image.open(filename)
    w, h = im.size
    imc = im.crop((w-50, h-50, w+50, h+50))
    return numpy.array(imc)
The key to the speedup is the Pool feature of the multiprocessing library. It makes it trivial to run things across multiple processors.
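If you haven't used it before: pool.map is a parallel drop-in for the built-in map. A minimal sketch (square here is just a stand-in for whatever work you need done):

from multiprocessing import Pool

def square(x):
    # any picklable, module-level function works here
    return x * x

if __name__ == '__main__':
    with Pool() as pool:  # defaults to one worker per CPU core
        print(pool.map(square, range(10)))  # [0, 1, 4, 9, 16, ...]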
Full code:
import os
import time
import numpy
from PIL import Image
from multiprocessing import Pool

# Path to where my test images are stored
img_folder = os.path.join(os.getcwd(), 'test_images')

# Collects all of the filenames for the images
# I want to process
images = [os.path.join(img_folder, f)
          for f in os.listdir(img_folder)
          if '.jpeg' in f]

# Your code, but wrapped up in a function
def convert(filename):
    im = Image.open(filename)
    w, h = im.size
    imc = im.crop((w-50, h-50, w+50, h+50))
    return numpy.array(imc)

def main():
    # This is the hero of the code. It creates a pool of
    # worker processes across which you can "map" a function
    pool = Pool()

    t = time.time()
    # We run it normally (single core) first; list() forces the
    # lazy map iterator so the timing is meaningful on Python 3
    np_arrays = list(map(convert, images))
    print('Time to open %i images in single thread: %.4f seconds' % (len(images), time.time()-t))

    t = time.time()
    # Now we run the same thing, but this time leveraging the worker pool
    np_arrays = pool.map(convert, images)
    print('Time to open %i images with multiple threads: %.4f seconds' % (len(images), time.time()-t))

if __name__ == '__main__':
    main()
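As an aside, if you're on Python 3 you can get the same behavior from the standard library's concurrent.futures module, which also shuts the pool down cleanly for you. A sketch of main() rewritten that way, reusing convert and images from the listing above:

from concurrent.futures import ProcessPoolExecutor

def main():
    # executor.map is the analogue of pool.map; the with-block
    # tears down the worker processes when it exits
    with ProcessPoolExecutor() as executor:
        np_arrays = list(executor.map(convert, images))

if __name__ == '__main__':
    main()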
Pretty basic. Only a few extra lines of code, and a little refactoring to move the conversion bit into its own function. The results speak for themselves:
Results:
8-core i7
Time to open 858 images in single thread: 6.0040 seconds
Time to open 858 images with multiple threads: 1.4800 seconds
2-core Intel Duo
Time to open 858 images in single thread: 8.7640 seconds
Time to open 858 images with multiple threads: 4.6440 seconds
So there you go! Even if you have a super-old two-core machine, you can halve the time it takes to open and process your images.
Caveats
Memory. If you're processing thousands of images, you'll probably hit Python's memory limit at some point. To get around that, you just need to process the data in chunks. You can still take advantage of all the multiprocessing goodness, just in smaller bites. Something like:
for i in range(0, len(images), chunk_size):
    results = pool.map(convert, images[i : i+chunk_size])
    # rest of code.
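Alternatively, pool.imap gives you chunked behavior with less bookkeeping: it hands results back one at a time as the workers finish, so you never hold every array in memory at once. A sketch (handle_array is a hypothetical stand-in for whatever you do with each result):

# chunksize controls how many filenames are sent to each worker at a time
for arr in pool.imap(convert, images, chunksize=50):
    handle_array(arr)  # hypothetical: save to disk, accumulate stats, etc.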