
I'm trying to consume the event stream provided by the Kubernetes API using the requests module. I've run into what looks like a buffering problem: the requests module seems to lag by one event.

My code looks something like this:

import requests

r = requests.get('http://localhost:8080/api/v1beta1/watch/services',
                 stream=True)

for line in r.iter_lines():
    print 'LINE:', line

As Kubernetes emits event notifications, this code only displays the last emitted event once the next event arrives, which makes it nearly useless for code that needs to respond to service add/delete notifications.

I worked around this by spawning curl in a subprocess instead of using the requests library:

import subprocess

p = subprocess.Popen(['curl', '-sfN',
                      'http://localhost:8080/api/watch/services'],
                     stdout=subprocess.PIPE,
                     bufsize=1)

for line in iter(p.stdout.readline, b''):
    print 'LINE:', line

This works, but at the expense of some flexibility. Is there a way to avoid this buffering problem while still using the requests library?


1 Answer


This behavior is due to a buggy implementation of the iter_lines method in the requests library.

iter_lines iterates over the response content in chunk_size blocks of data using the iter_content iterator. If fewer than chunk_size bytes of data are available for reading from the remote server (which will typically be the case while waiting for the last line of output), the read operation blocks until chunk_size bytes of data are available.
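To make the lag concrete, here is a simplified sketch (my own illustration, not the actual requests source) of a line iterator built on fixed-size blocking reads; a line that has already arrived in full is not yielded until enough further bytes arrive to fill the block:

def iter_lines_buffered(fp, chunk_size=512):
    '''Illustrative only: reads fixed-size blocks with a blocking read.'''
    pending = ''
    while True:
        # read() blocks until chunk_size bytes are available or the
        # stream closes, even if a complete line is already waiting.
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.split('\n')
        pending = lines.pop()   # last element is an incomplete line
        for line in lines:
            yield line
    if pending:
        yield pending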

I have written my own iter_lines routine that operates correctly:

import os


def iter_lines(fd, chunk_size=1024):
    '''Iterates over the content of a file-like object line-by-line,
    yielding each line as soon as it is available.'''

    pending = None

    while True:
        # os.read returns as soon as any data is available (up to
        # chunk_size bytes) instead of blocking until a full buffer
        # can be filled.
        chunk = os.read(fd.fileno(), chunk_size)
        if not chunk:
            break

        if pending is not None:
            chunk = pending + chunk
            pending = None

        lines = chunk.splitlines()

        # Hold back the final piece only if the chunk did not end with
        # a line terminator; otherwise a complete line would be delayed
        # until the next read.
        if lines and lines[-1] and not chunk.endswith(('\r', '\n')):
            pending = lines.pop()

        for line in lines:
            yield line

    if pending:
        yield pending

This works because os.read returns as soon as any data is available, delivering fewer than chunk_size bytes rather than waiting for a buffer to fill.
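As a usage sketch (reusing the curl subprocess and endpoint from the question, which are assumed to be available), the generator can be driven directly from the pipe:

import subprocess

p = subprocess.Popen(['curl', '-sfN',
                      'http://localhost:8080/api/watch/services'],
                     stdout=subprocess.PIPE,
                     bufsize=1)

# Each event line is printed as soon as curl writes it to the pipe,
# because os.read returns whatever bytes are currently available.
for line in iter_lines(p.stdout):
    print 'LINE:', line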

Answered 2015-01-26T18:09:37.367