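I want a daemon that finds images I need to convert into web and thumbnail versions. I think Python could be useful here, but I'm not sure I'm going about it the right way. I want to convert 8 photos at the same time, and the queue of images to convert can be very long. We have several cores on the server, and spawning each conversion in a new process should let the OS make use of the available cores so things go faster, right? That is the key point here: spawning a process from Python that in turn calls ImageMagick's convert script, and hoping that this is a bit faster than running the conversions one by one from the main Python thread.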

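So far I have only started testing, so here is my test code. It creates 20 tasks (each one sleeps for a random 1 to 4 seconds, since randrange(1, 5) excludes 5) and hands them to a pool with a total of 5 threads.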

from multiprocessing import Process
from subprocess import call
from random import randrange
from threading import Thread
from Queue import Queue

class Worker(Thread):
    def __init__(self, tid, queue):
        Thread.__init__(self)
        self.tid = tid
        self.queue = queue
        self.daemon = True
        self.start()

    def run(self):
        while True:
            sec = self.queue.get()
            print "Thread %d sleeping for %d seconds\n\n" % (self.tid, sec)
            p = Process(target=work, args=(sec,))
            p.start()
            p.join()
            self.queue.task_done()

class WorkerPool:
    def __init__(self, num_workers):
        self.queue = Queue()
        for tid in range(num_workers):
            Worker(tid, self.queue)

    def add_task(self, sec):
        self.queue.put(sec)

    def complete_work(self):
        self.queue.join()

def work(sec):
    call(["sleep", str(sec)])

def main():
    seconds = [randrange(1, 5) for i in range(20)]
    pool = WorkerPool(5)
    for sec in seconds:
        pool.add_task(sec)
    pool.complete_work()

if __name__ == '__main__':
    main()

So I run this script on the server:

johanhar@mamadev:~$ python pythonprocesstest.py

Then I check the processes on the server:

johanhar@mamadev:~$ ps -fux

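The ps output looks wrong to me. It looks as if everything is happening under Python but inside a single process, so even though the server has multiple cores, it would only get slower the more conversions (or, in this test case, sleeps) I have...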

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
johanhar 24246  0.0  0.0  81688  1608 ?        S    13:44   0:00 sshd: johanhar@pts/28
johanhar 24247  0.0  0.0 108336  1832 pts/28   Ss   13:44   0:00  \_ -bash
johanhar 49753  0.6  0.0 530620  7512 pts/28   Sl+  15:14   0:00      \_ python pythonprocesstest.py
johanhar 49822  0.0  0.0 530620  6252 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49824  0.0  0.0 100904   564 pts/28   S+   15:14   0:00          |   \_ sleep 4
johanhar 49823  0.0  0.0 530620  6256 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49826  0.0  0.0 100904   564 pts/28   S+   15:14   0:00          |   \_ sleep 3
johanhar 49837  0.0  0.0 530620  6264 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49838  0.0  0.0 100904   564 pts/28   S+   15:14   0:00          |   \_ sleep 3
johanhar 49846  0.0  0.0 530620  6264 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49847  0.0  0.0 100904   564 pts/28   S+   15:14   0:00              \_ sleep 3

So, in case the question or what I'm asking is still unclear: can this approach be called "multicore programming"?

3 Answers

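I think you have misread the ps output. I count 4 distinct child Python instances (PIDs 49822, 49823, 49837 and 49846), each of which could in principle be assigned to its own core. Whether they actually get a core of their own is one of the harder parts of multiprocessing.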

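Yes, there is a top-level Python process (PID 49753) that is the parent of those child processes, but bash is a parent of that process in exactly the same way.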
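
One way to check this without reading ps is to let each child report its own PID. The following is a minimal sketch, not the question's code: it assumes child interpreters started with subprocess.Popen instead of multiprocessing.Process, and the helper name spawn_reporters is hypothetical.

```python
import os
import subprocess
import sys

# Each child is a separate Python interpreter that prints its own PID
# followed by the PID of its parent.
CHILD_CODE = "import os; print('%d %d' % (os.getpid(), os.getppid()))"


def spawn_reporters(count):
    """Start `count` child interpreters at once and collect (pid, ppid) pairs."""
    children = [
        subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                         stdout=subprocess.PIPE)
        for _ in range(count)
    ]
    reports = []
    for child in children:
        out, _ = child.communicate()  # wait for the child and read its output
        pid, ppid = out.decode().split()
        reports.append((int(pid), int(ppid)))
    return reports


if __name__ == "__main__":
    for pid, ppid in spawn_reporters(4):
        print("child %d, parent %d (this script is %d)"
              % (pid, ppid, os.getpid()))
```

Every child reports a different PID, which is what makes it a separate process that the OS scheduler is free to place on any core.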

Answered 2013-04-26T13:44:21.643

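You can simplify your code. If the work is done in subprocesses, you don't need multiple Python processes. You can use multiprocessing.Pool (here its thread-based variant from multiprocessing.dummy) to limit the number of concurrent subprocesses: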

#!/usr/bin/env python
import multiprocessing.dummy as mp # use threads
from random import randrange
from subprocess import check_call
from timeit import default_timer as timer

def info(msg, _print_lock=mp.Lock()): # a poor man's logging.info()
    with _print_lock: # avoid garbled output
        print("%s\t%s" % (mp.current_process().name, msg))

def work(sec):
    start = timer() # set before try so the except branch can always use it
    try: # wrap in try/except to avoid premature exit
        info("Sleeping for %d seconds" % (sec,))
        check_call(["sleep", str(sec)])
    except Exception as e: # error
        return sec, timer() - start, e
    else: # success
        return sec, timer() - start, None

def main():
    work_items = (randrange(1, 5) for i in range(20)) # you can use generator
    pool = mp.Pool(5) # pool of worker threads
    for result in pool.imap_unordered(work, work_items):
        info("expected %s, got %s, error %s" % result)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()

Output

Thread-2    Sleeping for 3 seconds
Thread-4    Sleeping for 4 seconds
Thread-3    Sleeping for 3 seconds
Thread-5    Sleeping for 2 seconds
Thread-1    Sleeping for 1 seconds
Thread-1    Sleeping for 2 seconds
MainThread  expected 1, got 1.00222706795, error None
Thread-5    Sleeping for 2 seconds
MainThread  expected 2, got 2.00276088715, error None
Thread-2    Sleeping for 1 seconds
MainThread  expected 3, got 3.00330615044, error None
Thread-1    Sleeping for 3 seconds
MainThread  expected 2, got 2.00289702415, error None
Thread-4    Sleeping for 1 seconds
Thread-3    Sleeping for 2 seconds
MainThread  expected 4, got 4.00349998474, error None
MainThread  expected 3, got 4.00295114517, error None
Thread-2    Sleeping for 2 seconds
MainThread  expected 1, got 1.00295495987, error None
Thread-5    Sleeping for 2 seconds
MainThread  expected 2, got 2.0029540062, error None
Thread-4    Sleeping for 2 seconds
MainThread  expected 1, got 1.00314211845, error None
Thread-3    Sleeping for 4 seconds
MainThread  expected 2, got 2.00298595428, error None
Thread-2    Sleeping for 2 seconds
MainThread  expected 2, got 2.00294113159, error None
Thread-5    Sleeping for 1 seconds
MainThread  expected 2, got 2.00287604332, error None
Thread-1    Sleeping for 4 seconds
MainThread  expected 3, got 3.00323104858, error None
Thread-4    Sleeping for 3 seconds
MainThread  expected 2, got 2.00339794159, error None
Thread-5    Sleeping for 1 seconds
MainThread  expected 1, got 1.00312304497, error None
MainThread  expected 2, got 2.0027179718, error None
MainThread  expected 1, got 1.00284385681, error None
MainThread  expected 4, got 4.00334811211, error None
MainThread  expected 3, got 3.00306892395, error None
MainThread  expected 4, got 4.00330901146, error None
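
The same pattern could be pointed at the actual image conversion. The sketch below is only an illustration under stated assumptions: the function names, the output file naming (_web, _thumb) and the default sizes are hypothetical, and it assumes ImageMagick's convert is on the PATH. The -resize and -thumbnail options are standard ImageMagick options.

```python
import multiprocessing.dummy as mp  # thread pool; the work runs in child processes
import os
import subprocess


def conversion_commands(src, web_size="1024x1024", thumb_size="150x150"):
    """Build the two convert command lines for one source image.

    The output names and default sizes are hypothetical placeholders.
    """
    base, ext = os.path.splitext(src)
    return [
        ["convert", src, "-resize", web_size, base + "_web" + ext],
        ["convert", src, "-thumbnail", thumb_size, base + "_thumb" + ext],
    ]


def run_command(cmd):
    """Run one command; return (cmd, exit code or None, error or None)."""
    try:
        return cmd, subprocess.call(cmd), None
    except OSError as e:  # e.g. the executable is not installed
        return cmd, None, e


def run_all(commands, max_parallel=8):
    """Run commands with at most `max_parallel` child processes at a time."""
    pool = mp.Pool(max_parallel)  # each worker thread blocks on one child
    try:
        return list(pool.imap_unordered(run_command, commands))
    finally:
        pool.close()
        pool.join()
```

A caller would build the full list first, for example with a list comprehension over the photo paths that collects conversion_commands(photo) for each one, and then pass that list to run_all. The default of 8 workers matches the 8 simultaneous conversions mentioned in the question.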
Answered 2013-04-27T05:42:30.260

Short and direct: yes, you are running multiple convert processes on multiple cores.

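Longer and slightly indirect: I wouldn't call it "multicore programming", even though it actually is, because that phrase usually means running multiple threads of one program on multiple cores, and you are not doing that (at least in CPython, where Python threads are bound by the GIL and cannot actually run on multiple cores at the same time). Besides, you don't need to parallelize your Python code, because it is not your bottleneck (you spend your time in convert, not in the Python code).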

If you only want to parallelize convert, you don't even need threads or anything else fancy in your Python code.

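The Python script can be serial and simply loop over the photos, spawning new convert processes until you reach the number you want. Then it just sits and waits for one of them to finish and spawns a new one, repeating as needed until all photos are done.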

(But I do agree that threads make for more natural and elegant code than that kind of wait-for-event loop.)
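
The serial loop described above might look roughly like this. It is a sketch under assumptions, not a definitive implementation: the function name run_limited and its parameters are hypothetical, it waits by polling each child with Popen.poll() at a short interval instead of blocking on an OS-level wait, and the demo uses sleeping child interpreters as a stand-in for convert.

```python
import subprocess
import sys
import time


def run_limited(commands, max_parallel=8, poll_interval=0.05):
    """Run commands as child processes, keeping at most `max_parallel` alive.

    Returns the exit codes in the same order as `commands`.
    """
    commands = list(commands)
    exit_codes = [None] * len(commands)
    running = {}      # index into `commands` -> Popen object
    next_index = 0

    while next_index < len(commands) or running:
        # Start new children until the limit is reached.
        while next_index < len(commands) and len(running) < max_parallel:
            running[next_index] = subprocess.Popen(commands[next_index])
            next_index += 1

        # Collect every child that has finished since the last check.
        finished = [i for i, proc in running.items() if proc.poll() is not None]
        for i in finished:
            exit_codes[i] = running.pop(i).returncode

        # Nothing finished yet: wait a little before polling again.
        if not finished and running:
            time.sleep(poll_interval)

    return exit_codes


if __name__ == "__main__":
    # Stand-in workload: child interpreters that sleep, in place of convert.
    demo = [[sys.executable, "-c", "import time; time.sleep(0.2)"]
            for _ in range(6)]
    print(run_limited(demo, max_parallel=3))
```

No threads are involved here. The single Python loop only starts and reaps children, and the children themselves are what the OS spreads across the cores.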

Answered 2013-04-26T15:12:02.567