python - Is there a way to run cpython on a different thread without causing a crash?

Question

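I have a program that runs lots of urllib requests in an infinite loop, which makes it very slow, so I tried running them in threads. Urllib uses cpython deep down in the socket module, so the threads being created just pile up and do nothing, because python's GIL prevents two cpython commands from executing in different threads at the same time. I'm running Windows XP with Python 2.5, so I can't use the multiprocessing module. I looked at the subprocess module to see if there was some way to execute python code in a subprocess, but there isn't. If anyone has a way to create a python subprocess from a function, the way multiprocessing does, that would be great.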

Also, I'd rather not download external modules, but I'm willing to.

Edit: Here is an example of some of the code in my current program.

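    import urllib

    # raw_image_data holds the image bytes to upload (defined elsewhere in the program)
    url = "http://example.com/upload_image.php?username=Test&password=test"
    response = urllib.urlopen(url, data=urllib.urlencode({"Image": raw_image_data})).read()
    # print the server's reply only if it contains something besides whitespace/newlines
    if response.strip().replace("\n", "") != "":
        print response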

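I ran a test, and it turns out that urllib2's urlopen with a Request object is just as slow or slower. I wrote my own custom timeit-like module, and the code above takes about 0.5-2 seconds, which is terrible for what my program does.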

score 1 · Accepted Answer

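Urllib uses cpython deep down in the socket module, so the threads being created just pile up and do nothing, because python's GIL prevents two cpython commands from executing in different threads at the same time.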

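Wrong, though it's a common misconception. CPython can and does release the GIL for IO operations (see all the Py_BEGIN_ALLOW_THREADS in socketmodule.c). While one thread waits for IO to complete, other threads can do some work. If urllib calls are the bottleneck in your script, then threads may be one of the acceptable solutions.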
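A minimal way to check this without a network: the sketch below uses time.sleep as a stand-in for a blocking socket call (an assumption; like socket I/O, time.sleep releases the GIL while it waits, but real requests also spend some time in Python code that does hold the GIL). The helper names blocking_call and run_concurrently are hypothetical. If the GIL were held during the wait, ten threads blocking for 0.3 s each would take about 3 s; in practice they finish in roughly 0.3 s.

```python
import threading
import time

def blocking_call(seconds):
    # Stand-in for waiting on a socket: CPython releases the GIL here,
    # so other threads keep running while this one sleeps.
    time.sleep(seconds)

def run_concurrently(nthreads, seconds):
    """Start nthreads blocking calls at once and return the wall-clock time taken."""
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(nthreads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

if __name__ == "__main__":
    # Sequentially this would be ~3.0 s; with threads it is close to 0.3 s.
    print(run_concurrently(10, 0.3))
```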

I'm running Windows XP with Python 2.5, so I can't use the multiprocessing module.

You could install Python 2.6 or newer, or, if you must stay on Python 2.5, you could install multiprocessing separately.

I wrote my own custom timeit-like module, and the code above takes about 0.5-2 seconds, which is terrible for what my program does.

The performance of urllib2.urlopen('http://example.com...).read() depends mostly on external factors such as DNS, network latency/bandwidth, and the performance of the example.com server itself.
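To see how much of the 0.5-2 s goes to the first two of those factors, one can time name resolution and the TCP handshake on their own. A minimal sketch using only the socket module; the function name time_dns_and_connect is hypothetical, and it does not measure server processing or transfer time, which make up the rest of a request.

```python
import socket
import time

def time_dns_and_connect(host, port, timeout=5.0):
    """Return (dns_seconds, connect_seconds) for one TCP connection to host:port."""
    start = time.time()
    # Name resolution (DNS for a real host name).
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    dns_done = time.time()
    family, socktype, proto, _, addr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # TCP handshake: roughly one network round trip to the server.
        sock.connect(addr)
    finally:
        sock.close()
    connect_done = time.time()
    return dns_done - start, connect_done - dns_done
```

For example, time_dns_and_connect("example.com", 80) run a few times shows whether DNS or latency alone already accounts for a large share of each request.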

Here's an example script that uses both threading and urllib2:

    import urllib2
    from Queue import Queue
    from threading import Thread

    def check(queue):
        """Fetch /n for every n taken from the queue."""
        # Build a per-thread opener instead of relying on the global one,
        # in case install_opener() is called from other threads.
        opener = urllib2.build_opener()
        for n in iter(queue.get, None):  # stop at the None sentinel
            try:
                data = opener.open('http://localhost:8888/%d' % (n,)).read()
            except IOError, e:
                print("error /%d reason %s" % (n, e))
            else:
                pass  # check data here

    def main():
        nurls, nthreads = 10000, 10

        # spawn threads
        queue = Queue()
        threads = [Thread(target=check, args=(queue,)) for _ in xrange(nthreads)]
        for t in threads:
            t.daemon = True  # die if program exits (.daemon needs 2.6+; on 2.5 use t.setDaemon(True))
            t.start()

        # provide some work
        for n in xrange(nurls): queue.put_nowait(n)
        # signal the end: one None sentinel per thread
        for _ in threads: queue.put(None)
        # wait for completion
        for t in threads: t.join()

    if __name__ == "__main__":
        main()

To convert it to a multiprocessing script, just use different imports and your program will use multiple processes:

    from multiprocessing import Queue
    from multiprocessing import Process as Thread

    # the rest of the script is the same
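The threaded script above throws away what it downloads (its else branch is only a placeholder), whereas the question's code prints each non-empty response. One hedged way to hand results back to the main thread is a second queue. The sketch keeps the same None-sentinel pattern; fetch is a hypothetical stand-in for something like opener.open(url).read(), the names worker and run_all are illustrative, and the code targets Python 2.6+ and 3 (it uses except ... as, which 2.5 lacks).

```python
import threading
import time

try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2

def worker(tasks, results, fetch):
    # Same pattern as check(): keep taking items until the None sentinel.
    for item in iter(tasks.get, None):
        try:
            results.put((item, fetch(item), None))
        except Exception as e:  # e.g. IOError from urllib2
            results.put((item, None, e))

def run_all(items, fetch, nthreads=10):
    """Return a list of (item, result, error) tuples, in completion order."""
    tasks, results = Queue(), Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results, fetch))
               for _ in range(nthreads)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Every worker has exited, so results now holds exactly one entry per item.
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

Results arrive in completion order, not input order, which is why each tuple carries its item.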

score 0 · Accepted Answer

If you want multithreading, Jython could be an option, since it has no GIL.

I agree with @Jan-Philip and @Piotr. What are you using urllib for?
