python - Boto S3 occasionally throws httplib.IncompleteRead

Question

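I have several daemons that use boto to read many files from Amazon S3. Every few days, I hit a situation where an httplib.IncompleteRead is thrown from deep inside boto. If I retry the request, it immediately fails with another IncompleteRead. Even if I call bucket.connection.close(), all further requests still fail.

I feel like I may have stumbled onto a bug in boto here, but nobody else seems to have run into it. Am I doing something wrong? All of the daemons are single-threaded, and I have tried setting is_secure both ways.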

Traceback (most recent call last):
  ...
  File "<file_wrapper.py>", line 22, in next
    line = self.readline()
  File "<file_wrapper.py>", line 37, in readline
    data = self.fh.read(self.buffer_size)
  File "<virtualenv/lib/python2.6/site-packages/boto/s3/key.py>", line 378, in read
    self.close()
  File "<virtualenv/lib/python2.6/site-packages/boto/s3/key.py>", line 349, in close
    self.resp.read()
  File "<virtualenv/lib/python2.6/site-packages/boto/connection.py>", line 411, in read
    self._cached_response = httplib.HTTPResponse.read(self)
  File "/usr/lib/python2.6/httplib.py", line 529, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python2.6/httplib.py", line 621, in _safe_read
    raise IncompleteRead(''.join(s), amt)

Environment:

Amazon EC2
Ubuntu 11.10
Python 2.6.7
Boto 2.12.0

score 4 · Accepted Answer

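This may well be a bug in boto, but the symptoms you describe are not unique to it. See:

IncompleteRead using httplib

https://dev.twitter.com/discussions/9554

Since httplib appears in your traceback, one workaround is proposed here: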

http://bobrochel.blogspot.in/2010/11/bad-servers-chunked-encoding-and.html?showComment=1358777800048

Disclaimer: I have no experience with boto. This is based on research only, and I am posting it because there were no other responses.

score 4 · Accepted Answer

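I have been struggling with this problem while running long-running processes that read large amounts of data from S3. I decided to post my solution here for posterity.

First of all, I'm sure the hack @Glenn points to works, but I chose not to use it because I consider it intrusive (it patches httplib) and unsafe (it blindly returns whatever it got, i.e. return e.partial, even though that may be a genuine error case).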
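To make the objection concrete, here is a minimal sketch of what such a patch looks like, reconstructed only from the description above (it returns e.partial when IncompleteRead is raised). The names swallow_incomplete_read and truncated_read are made up for illustration, and the global patch line is left commented out.

```python
try:
    import httplib  # Python 2, as used in the question
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def swallow_incomplete_read(func):
    """Wrap a read function so IncompleteRead yields the partial data instead of raising."""
    def inner(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except httplib.IncompleteRead as e:
            # The unsafe part: truncated data is returned as if the read had succeeded.
            return e.partial
    return inner


# Applying the hack globally would look like this (commented out on purpose):
# httplib.HTTPResponse.read = swallow_incomplete_read(httplib.HTTPResponse.read)


def truncated_read():
    # Hypothetical read that received 5 bytes although 10 more were expected.
    raise httplib.IncompleteRead(b"hello", 10)


result = swallow_incomplete_read(truncated_read)()
```

The caller gets the 5 bytes back and has no way to tell that the body was cut short, which is why this counts as unsafe.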

Here is the solution I finally came up with, which seems to work.

I am using this general-purpose retry function:

import time, logging, httplib, socket

def run_with_retries(func, num_retries, sleep = None, exception_types = Exception, on_retry = None):
    for i in range(num_retries):
        try:
            return func()  # call the function
        except exception_types as e:
            # failed on the known exception
            if i == num_retries - 1:
                raise  # this was the last attempt. reraise
            logging.warning('operation %s failed with error %s. will retry %d more times', func, e, num_retries - i - 1)
            if on_retry is not None:
                on_retry()
            if sleep is not None:
                time.sleep(sleep)
    assert 0  # should not reach this point

Now, when reading a file from S3, I use this function, which internally retries in case of IncompleteRead errors. On error, before retrying, I call key.close().

def read_s3_file(key):
    """
    Reads the entire contents of a file on S3.
    @param key: a boto.s3.key.Key instance
    """
    return run_with_retries(
        key.read, num_retries = 3, sleep = 0.5,
        exception_types = (httplib.IncompleteRead, socket.error),
        # close the connection before retrying
        on_retry = lambda: key.close()
    )
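The retry behaviour can be checked without touching S3. The sketch below is self-contained, so it carries a condensed copy of the retry helper (logging omitted) and a hypothetical FakeKey class that stands in for boto.s3.key.Key; FakeKey and read_with_retries are illustration-only names, not part of boto.

```python
import socket
import time

try:
    import httplib  # Python 2, as used in the question
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def run_with_retries(func, num_retries, sleep=None, exception_types=Exception, on_retry=None):
    # Condensed copy of the retry helper, without logging.
    for i in range(num_retries):
        try:
            return func()
        except exception_types:
            if i == num_retries - 1:
                raise  # last attempt, give up
            if on_retry is not None:
                on_retry()
            if sleep is not None:
                time.sleep(sleep)


class FakeKey(object):
    """Hypothetical stand-in for a boto Key: read() fails a fixed number of times."""

    def __init__(self, payload, failures):
        self.payload = payload
        self.failures = failures
        self.close_calls = 0

    def read(self):
        if self.failures > 0:
            self.failures -= 1
            raise httplib.IncompleteRead(self.payload[:2], len(self.payload))
        return self.payload

    def close(self):
        self.close_calls += 1


def read_with_retries(key):
    return run_with_retries(
        key.read, num_retries=3,
        exception_types=(httplib.IncompleteRead, socket.error),
        on_retry=key.close,  # close before each retry
    )


key = FakeKey(b"hello world", failures=2)
data = read_with_retries(key)
```

With two simulated failures and three attempts allowed, the third attempt succeeds and close() has been called once before each retry. With three or more failures, the last IncompleteRead propagates to the caller.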

score 0 · Accepted Answer

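If you are reading large amounts of data from S3, you may need to chunk your reads/writes or use multipart transfers.

There is a good example of doing multipart here (http://www.bogotobogo.com/DevOps/AWS/aws_S3_uploading_large_file.php).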
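On the read side, one way to chunk is to request byte ranges one at a time, so that a failed piece can be retried on its own. This is only a sketch: byte_ranges and read_in_chunks are hypothetical helpers, and fetch_range is an assumed callable that takes an HTTP Range header value and returns the bytes for that range. With boto 2 it could plausibly be built on key.get_contents_as_string(headers={"Range": ...}), but that wiring is an assumption and is not tested here.

```python
def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte offsets that together cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def read_in_chunks(fetch_range, total_size, chunk_size):
    """Fetch an object piece by piece and return the concatenated bytes.

    fetch_range is an assumed callable: it receives a Range header value
    such as "bytes=0-1023" and returns the bytes for that range.
    """
    parts = []
    for start, end in byte_ranges(total_size, chunk_size):
        parts.append(fetch_range("bytes=%d-%d" % (start, end)))
    return b"".join(parts)


# Possible wiring with boto 2 (assumption, not exercised here):
# fetch_range = lambda rng: key.get_contents_as_string(headers={"Range": rng})
# data = read_in_chunks(fetch_range, key.size, 8 * 1024 * 1024)
```

Each fetch_range call is a natural place to apply a retry wrapper, since only the failed range has to be requested again.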
