python - How to do non-blocking URL fetches in Python

Question

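I am writing a GUI application in Pyglet that has to display tens to hundreds of thumbnails from the Internet. Right now I am using urllib.urlretrieve to grab them, but each call blocks until the download finishes, and only one image is fetched at a time.

I would prefer to download them in parallel and display each one as soon as it finishes, without blocking the GUI at any point. What is the best way to do this?

I don't know much about threads, but it looks like the threading module might help? Or perhaps there is some simple way I have overlooked.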

score 3 · Accepted Answer

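You will probably benefit from the threading or multiprocessing modules. You don't actually need to create all those Thread-based classes yourself; there is a simpler way using Pool.map: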

from multiprocessing import Pool

def fetch_url(url):
    # Fetch the URL contents and save it anywhere you need and
    # return something meaningful (like filename or error code),
    # if you wish.
    ...

pool = Pool(processes=4)
# image_url_list is your own list of thumbnail URLs
result = pool.map(fetch_url, image_url_list)
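Note that Pool.map only returns once every download has finished. If you want each thumbnail as soon as it is ready, one possible variation is imap_unordered on a thread-backed pool, which is enough for IO-bound work. This is only a sketch for Python 3 (where urlretrieve lives in urllib.request); the body of fetch_url, the fetch_all helper and the download parameter are my own assumptions, not part of the answer above.

```python
import os
import tempfile
from multiprocessing.pool import ThreadPool
from urllib.request import urlretrieve  # Python 3 location of urllib.urlretrieve


def fetch_url(url, download=urlretrieve):
    """Download one URL to a temporary file and return (url, path, error)."""
    fd, path = tempfile.mkstemp(suffix=".img")
    os.close(fd)
    try:
        download(url, path)
        return url, path, None
    except Exception as exc:
        # Report the failure to the caller instead of raising inside the pool.
        os.remove(path)
        return url, None, exc


def fetch_all(urls, workers=4, fetch=fetch_url):
    """Yield (url, path, error) tuples in the order the downloads complete."""
    pool = ThreadPool(processes=workers)
    try:
        for result in pool.imap_unordered(fetch, urls):
            yield result
    finally:
        pool.close()
        pool.join()
```

Iterating over fetch_all(image_url_list) still blocks the calling thread between results, so in a GUI you would run that loop outside the main thread or hand the results over through a queue.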

score 2 · Accepted Answer

As you suspected, this is a perfect case for threading. Here is a short guide that I found very helpful when I did my own first bit of threading in Python.

score 2 · Accepted Answer

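As you rightly pointed out, you can create several threads, each responsible for performing one urlretrieve operation. This lets the main thread carry on uninterrupted.

Here is a tutorial on threading in Python: http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf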
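A rough sketch of that idea for Python 3: each thread performs one urlretrieve call and puts its outcome on a queue, which the main thread drains without ever waiting. The names start_downloads and poll_results and the download parameter are hypothetical helpers of mine, not anything from the tutorial.

```python
import queue
import threading
from urllib.request import urlretrieve  # Python 3 location of urllib.urlretrieve


def start_downloads(jobs, results, download=urlretrieve):
    """Start one daemon thread per (url, filename) job and return immediately."""
    def worker(url, filename):
        try:
            download(url, filename)
            results.put((url, filename, None))
        except Exception as exc:
            results.put((url, None, exc))

    threads = []
    for url, filename in jobs:
        t = threading.Thread(target=worker, args=(url, filename))
        t.daemon = True  # do not keep the process alive at exit
        t.start()
        threads.append(t)
    return threads


def poll_results(results):
    """Return every result that is ready right now, without blocking."""
    finished = []
    while True:
        try:
            finished.append(results.get_nowait())
        except queue.Empty:
            return finished
```

In a Pyglet app you could call poll_results from a function registered with pyglet.clock.schedule_interval, so the thumbnails are turned into sprites on the main thread. With hundreds of URLs, one thread per download is probably too many, and a fixed number of worker threads would be the safer choice.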

score 2 · Accepted Answer

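Below is an example of how to use threading.Thread. Just replace the class name with your own and the run function with your own. Note that threading is great for IO-bound applications like yours and can really speed them up. Using Python threads purely for computation doesn't help in standard Python (CPython), because the global interpreter lock lets only one thread compute at a time.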

import threading, time

class Ping(threading.Thread):
    def __init__(self, multiple):
        threading.Thread.__init__(self)
        self.multiple = multiple

    def run(self):
        # sleeps 3 seconds, then prints 'pong' `multiple` times
        time.sleep(3)
        print('pong' * self.multiple)

pingInstance = Ping(3)
pingInstance.start()  # start() calls your run function in a new thread
print("pingInstance is alive? : %d" % pingInstance.is_alive())  # True, printed as 1
print("Number of threads alive: %d" % threading.active_count())
# main thread + class instance
time.sleep(3.5)
print("Number of threads alive: %d" % threading.active_count())
print("pingInstance is alive?: %d" % pingInstance.is_alive())
# is_alive() returns False once your thread reaches the end of its run function
# only the main thread now
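Adapted to the thumbnail problem, the same pattern might look roughly like this in Python 3. The class name Fetcher, the finished helper and the download parameter are placeholders I made up; treat it as a sketch rather than a drop-in solution.

```python
import threading
from urllib.request import urlretrieve  # Python 3 location of urllib.urlretrieve


class Fetcher(threading.Thread):
    """Thread whose run() downloads a single URL to a file."""

    def __init__(self, url, filename, download=urlretrieve):
        threading.Thread.__init__(self)
        self.daemon = True  # do not keep the process alive at exit
        self.url = url
        self.filename = filename
        self.download = download
        self.error = None  # set to the exception if the download fails

    def run(self):
        try:
            self.download(self.url, self.filename)
        except Exception as exc:
            self.error = exc


def finished(fetchers):
    """Return the fetchers that are no longer running (assumes all were started)."""
    return [f for f in fetchers if not f.is_alive()]
```

The main thread can start a batch of Fetcher objects, keep drawing, and check finished(...) now and then to see which files are ready to load.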

score 1 · Accepted Answer

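You have these choices:

Threads: easiest, but doesn't scale well
Twisted: medium difficulty, scales well but shares the CPU because of the GIL and being single-threaded.
Multiprocessing: hardest. Scales well if you know how to write your own event loop.

I recommend just using threads unless you need an industrial-scale fetcher.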

score 0 · Accepted Answer

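You'll either need to use threads or an asynchronous networking library such as Twisted. I suspect that using threads might be simpler in your particular use case.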
