python - Exceptional calls to urllib.request.urlopen leave open file descriptors when called from ThreadPoolExecutor

Question

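I am trying to download a large amount of data from Yahoo Finance using multiple threads. I am using concurrent.futures.ThreadPoolExecutor to speed things up. Everything goes well until I use up all the available file descriptors (1024 by default).

When urllib.request.urlopen() raises an exception, the file descriptor is left open (regardless of the socket timeout I use). Normally, if I run everything from a single (main) thread, the file descriptor is reused, so the problem does not occur. But when these exceptional urlopen() calls are made from ThreadPoolExecutor threads, the file descriptors stay open. The only solution I have come up with so far is to use processes (ProcessPoolExecutor), which is very cumbersome and inefficient. There has to be a smarter way to deal with this.

I also wonder whether this is a bug in the Python libraries or whether I am just doing something wrong...

I am running Python 3.4.1 on Debian (testing, kernel 3.10-3-amd64).

Here is sample code that demonstrates this behaviour: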

import concurrent.futures
import urllib.request
import os
import psutil
from time import sleep


def fetchfun(url):
    urllib.request.urlopen(url)


def main():

    print(os.getpid())
    p = psutil.Process(os.getpid())
    # psutil 2.0 and later spell this method num_fds()
    print(p.get_num_fds())


    # this url doesn't exist
    test_url = 'http://ichart.finance.yahoo.com/table.csv?s=YHOOxyz' + \
            '&a=00&b=01&c=1900&d=11&e=31&f=2019&g=d'

    with concurrent.futures.ThreadPoolExecutor(1) as executor:
        futures = []
        for i in range(100):
            futures.append(executor.submit(fetchfun, test_url))
        count = 0
        for future in concurrent.futures.as_completed(futures):
            count += 1
            print("{}: {} (ex: {})".format(count, p.get_num_fds(), future.exception()))

    print(os.getpid())
    sleep(60)


if __name__ == "__main__":
    main()

score 3 · Accepted Answer

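When the HTTPError is raised, it saves a reference to the HTTPResponse object for the request as the fp attribute of the HTTPError. That reference gets saved in your futures list, which is not destroyed until your program ends. That means a reference to the HTTPResponse is kept alive for the entire program. As long as that reference exists, the socket used by the HTTPResponse stays open. One way to work around this is to explicitly close the HTTPResponse when you handle the exception: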

with concurrent.futures.ThreadPoolExecutor(1) as executor:
    futures = []
    for i in range(100):
        futures.append(executor.submit(fetchfun, test_url))
    count = 0
    for future in concurrent.futures.as_completed(futures):
        count += 1
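        exc = future.exception()
        print("{}: {} (ex: {})".format(count, p.get_num_fds(), exc))
        # fp is missing when the call succeeded or the error carries no response
        fp = getattr(exc, "fp", None)
        if fp is not None:
            fp.close()  # Close the HTTPResponse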
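
A variation on the same idea, offered only as a sketch and not part of the original answer: close the error response inside the worker, so that no Future ever ends up holding an HTTPError with an open socket. The names fetch and fetch_all and the opener parameter are hypothetical; opener exists only so the functions can be exercised without network access and defaults to urllib.request.urlopen.

```python
import concurrent.futures
import io
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen):
    """Return (status, body); close the response even on an HTTP error.

    `opener` is a hypothetical hook for testing without a network.
    """
    try:
        with opener(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The HTTPError wraps the error response. Closing it here releases
        # the socket before the exception could be stored in a Future.
        err.close()
        return err.code, None


def fetch_all(urls, max_workers=4, opener=urllib.request.urlopen):
    """Fetch every URL on a thread pool and return the (status, body) pairs."""
    with concurrent.futures.ThreadPoolExecutor(max_workers) as executor:
        futures = [executor.submit(fetch, url, opener) for url in urls]
        # Results arrive in completion order, not submission order.
        return [f.result() for f in concurrent.futures.as_completed(futures)]
```

Because the worker returns a plain tuple instead of letting the exception propagate, the futures list only keeps small values alive. Errors other than HTTPError (for example URLError on a DNS failure) still propagate through future.result() and would need their own handling.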
