
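I ran into a problem today: I'm using the poster module to upload a multipart form via HTTP POST.

One part of the form is a file, which poster streams out, and that works great.

The problem is that Content-Length is precomputed before the upload starts, but because the form data is generated on the fly, the amount of data actually uploaded can end up being different (if the file in the form is modified by something external while the upload is in progress).

If the file grows, the server closes the connection once it has received the amount of data specified in Content-Length, before I have finished sending, and I get a Connection reset by peer error. If the file shrinks, the upload hangs, because the server is still waiting for the remaining bytes I promised.

In the latter case, when I interrupt the hung upload, I get this stack trace: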

Traceback (most recent call last):
  File "/Users/paul/Source/Python/test_uploader.py", line 35, in <module>
    gUpload(target_file, size, result.signed, callback, md5=md5)
  File "/Users/paul/Source/Python/PythonApp/upload.py", line 597, in handlingHttpError
    return func(*args, **kwargs)
  File "/Users/paul/Source/Python/PythonApp/upload.py", line 663, in gUpload
    urllib2.urlopen(request)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/poster-0.8.1-py2.7.egg/poster/streaminghttp.py", line 142, in http_open
    return self.do_open(StreamingHTTPConnection, req)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1180, in do_open
    r = h.getresponse(buffering=True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1030, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 447, in readline
    data = self._sock.recv(self._rbufsize)
KeyboardInterrupt

How should I handle this situation? I don't mind it raising an error, but this hang is killing me!


1 Answer


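Thanks for the suggestion, but I can't lock any files, because my process almost always has lower priority than whatever process might be editing the file I'm uploading.

This is what I ended up going with, and it seems to work well!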

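class UploadSizeMismatchError(Exception):
    """Raised when the bytes read differ from the size promised up front."""


# Python 2 only: subclasses the built-in `file` type.
class SizeCheckFile(file):
    def __init__(self, size, *args, **kwargs):
        file.__init__(self, *args, **kwargs)
        self.size = size      # size promised in Content-Length
        self.data_read = 0    # bytes handed out so far

    def read(self, *args, **kwargs):
        data = file.read(self, *args, **kwargs)
        self.data_read += len(data)
        if self.data_read > self.size:
            # More bytes than promised: fail instead of overrunning.
            raise UploadSizeMismatchError("File has grown!")
        elif not data and self.data_read != self.size:
            # EOF before the promised size: fail instead of hanging.
            raise UploadSizeMismatchError("File has shrunk!")
        return data

    def seek(self, *args, **kwargs):
        # Only seeks that leave the position unchanged are tolerated.
        current_pos = self.tell()
        file.seek(self, *args, **kwargs)
        if current_pos != self.tell():
            raise NotImplementedError("%s currently assumes the file is being read from start to finish!" % self.__class__.__name__)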

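The size I pass to the constructor is the same size I pass as the filesize argument to poster's MultipartParam.

Of course, this assumes no seeking takes place; otherwise I would have to override seek and keep track of exactly what is being read. For my use case I don't have to worry about that, since the file is simply being streamed out.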
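
For readers on Python 3, where the built-in `file` type no longer exists, the same check can be sketched as a wrapper around any binary file object. This is only an illustration of the idea above, not part of poster: `SizeCheckReader` and `drain` are hypothetical names, and `drain` merely imitates a streaming uploader that reads fixed-size blocks until EOF.

```python
import io


class UploadSizeMismatchError(Exception):
    """Raised when the bytes read do not match the size promised up front."""


class SizeCheckReader(object):
    """Wrap a binary file object and verify it yields exactly `size` bytes."""

    def __init__(self, fileobj, size):
        self.fileobj = fileobj
        self.size = size        # size promised in Content-Length
        self.data_read = 0      # bytes handed out so far

    def read(self, *args, **kwargs):
        data = self.fileobj.read(*args, **kwargs)
        self.data_read += len(data)
        if self.data_read > self.size:
            raise UploadSizeMismatchError("File has grown!")
        if not data and self.data_read != self.size:
            raise UploadSizeMismatchError("File has shrunk!")
        return data


def drain(reader, blocksize=4):
    """Read block by block until EOF (hypothetical stand-in for an uploader)."""
    total = 0
    while True:
        block = reader.read(blocksize)
        if not block:
            break
        total += len(block)
    return total


if __name__ == "__main__":
    # The source holds 3 bytes but 8 were promised, so draining raises.
    try:
        drain(SizeCheckReader(io.BytesIO(b"abc"), 8))
    except UploadSizeMismatchError as exc:
        print("upload aborted:", exc)
```

As in the original class, the shrink check only fires when a read returns no data, so it relies on the consumer reading all the way to EOF with a non-zero block size.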

answered 2013-02-27T19:43:01.117