python - Am I parsing this HTTP POST request properly?

Question

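Let me start by saying that I'm using the twisted.web framework. Twisted.web's file uploading didn't work the way I wanted (it only included the file data, not any other information), cgi.parse_multipart doesn't work the way I want (same thing; twisted.web uses this function), cgi.FieldStorage didn't work (because I'm getting the POST data through Twisted, not a CGI interface; as far as I can tell, FieldStorage tries to read the request from stdin), and twisted.web2 didn't work for me because its use of Deferred confused and infuriated me (too complicated for what I want).

That being said, I decided to try parsing the HTTP request myself.

Using Chrome, the HTTP request is formed like this: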

------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="upload_file_nonce"

11b03b61-9252-11df-a357-00266c608adb
------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="file"; filename="login.html"
Content-Type: text/html

<!DOCTYPE html>
<html>
  <head> 

...

------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="file"; filename=""


------WebKitFormBoundary7fouZ8mEjlCe92pq--

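Is it always formed like this? I'm parsing it with regular expressions, like so (pardon the wall of code):

(Note: I snipped out most of the code to show only what I thought was relevant: the regular expressions (yes, nested parentheses). This is a method, __init__ (the only method so far), in a class I built called Uploads. The full code can be seen in the revision history. I hope I didn't mismatch any parentheses.)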

if line == "--{0}--".format(boundary):
    finished = True

if in_header == True and not line:
    in_header = False
    if 'type' not in current_file:
        ignore_current_file = True

if in_header == True:
    m = re.match(
        "Content-Disposition: form-data; name=\"(.*?)\"; filename=\"(.*?)\"$", line)
    if m:
        input_name, current_file['filename'] = m.group(1), m.group(2)

    m = re.match("Content-Type: (.*)$", line)
    if m:
        current_file['type'] = m.group(1)

# Not in a header, so the line is part of the current part's data.
else:
    if 'data' not in current_file:
        current_file['data'] = line
    else:
        current_file['data'] += line

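As you can see, I start a new "file" dictionary whenever a boundary is reached. I set in_header to True to indicate that I'm parsing headers. When I reach a blank line, I switch it to False, but not before checking whether a Content-Type was set for that form value. If not, I set ignore_current_file, since I'm only looking for file uploads.

I know I should be using a library, but I'm sick of reading documentation, trying to get different solutions to work in my project, and still having the code look reasonable. I just want to get past this part, and if parsing an HTTP POST with file uploads is this simple, then I'll stick with it.

Note: this code works perfectly for now; I'm just wondering whether it will choke on or mangle requests from certain browsers.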
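The state machine described above can be sketched as one self-contained function. This is a minimal illustration, not the asker's actual code: the function name `parse_uploads`, the module-level regex names, and the choice to work on an already-decoded `str` body split on CRLF are all assumptions made for the example (a real upload arrives as bytes).

```python
import re

# Same patterns as in the question; they expect name before filename.
DISPOSITION = re.compile(
    r'Content-Disposition: form-data; name="(.*?)"; filename="(.*?)"$')
CONTENT_TYPE = re.compile(r"Content-Type: (.*)$")


def parse_uploads(body, boundary):
    """Return {input name: {'filename', 'type', 'data'}} for file parts only."""
    delimiter = "--" + boundary
    uploads = {}
    current, name, in_header = None, None, False

    for line in body.split("\r\n"):
        if line == delimiter or line == delimiter + "--":
            # A boundary closes the previous part; keep it only if it
            # declared a Content-Type, i.e. it looks like a file upload.
            if current is not None and "type" in current:
                current["data"] = "\r\n".join(current["data"])
                uploads[name] = current
            if line == delimiter + "--":
                break  # closing delimiter: nothing follows
            current, name, in_header = {"data": []}, None, True
        elif current is None:
            continue  # text before the first boundary is ignored
        elif in_header:
            if not line:
                in_header = False  # blank line ends the part's headers
                continue
            m = DISPOSITION.match(line)
            if m:
                name, current["filename"] = m.group(1), m.group(2)
            m = CONTENT_TYPE.match(line)
            if m:
                current["type"] = m.group(1)
        else:
            current["data"].append(line)
    return uploads
```

Plain form fields (such as the nonce) and the empty file input are dropped because neither carries a Content-Type header, which mirrors the ignore_current_file check.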

score 7 · Accepted Answer

My solution to this problem was to parse the content with cgi.FieldStorage, like so:

import cgi

from twisted.web.resource import Resource


class Root(Resource):

    def render_POST(self, request):
        self.headers = request.getAllHeaders()
        # For the parsing part, see PyMOTW by Doug Hellmann.
        img = cgi.FieldStorage(
            fp=request.content,
            headers=self.headers,
            environ={'REQUEST_METHOD': 'POST',
                     'CONTENT_TYPE': self.headers['content-type'],
                     }
        )

        upload = img["upl_file"]
        print("%s %s %s" % (upload.name, upload.filename, upload.type))
        # Save the uploaded bytes under the client-supplied filename.
        out = open(upload.filename, 'wb')
        out.write(upload.value)
        out.close()
        request.redirect('/tests')
        return ''
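The cgi module was removed from the standard library in Python 3.13, so on newer interpreters one possible alternative is the standard email package, which understands the same MIME multipart structure. The sketch below is an illustration under assumptions, not part of this answer: the helper name `parse_form_data` is made up, the caller is assumed to pass the request's Content-Type value as `str` and the raw body as bytes, and only a small text payload has been exercised, so check it against binary uploads before relying on it.

```python
from email.parser import BytesParser
from email.policy import default


def parse_form_data(content_type, body):
    """Parse a multipart/form-data body into a list of part dictionaries."""
    # The email parser needs a header block, so prepend the request's
    # Content-Type (which carries the boundary) to the raw body.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=default).parsebytes(prefix + body)

    parts = []
    for part in message.iter_parts():
        parts.append({
            "name": part.get_param("name", header="content-disposition"),
            "filename": part.get_filename(),       # None for plain fields
            "type": part.get_content_type(),
            "data": part.get_payload(decode=True),  # bytes
        })
    return parts
```

With Twisted, the arguments would presumably come from the request's content-type header and from `request.content.read()`.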

score 1 · Accepted Answer

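The Content-Disposition header does not define an order for its fields, and it may contain more fields than just the filename. So your match on filename may fail; there may not even be a filename!

See RFC 2183 (edit: that one is for mail; see RFC 1806, RFC 2616, and maybe more for HTTP).

I'd also suggest replacing every space in those regular expressions with \s*, and not relying on character case.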
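A rough sketch of what a more tolerant match could look like: it accepts parameters in any order, allows optional whitespace, and ignores the case of the header and parameter names. The function name `parse_content_disposition` and the `PARAM` pattern are made up for this example, and it deliberately skips escaped quotes and RFC 2231 extended parameters.

```python
import re

# ;key=value pairs, where the value is either quoted or a bare token.
PARAM = re.compile(r';\s*([^\s=;]+)\s*=\s*(?:"([^"]*)"|([^;\s]*))')
HEADER = re.compile(r"\s*content-disposition\s*:\s*([^;\s]+)", re.IGNORECASE)


def parse_content_disposition(line):
    """Return (disposition, params) or None if the line is another header."""
    m = HEADER.match(line)
    if not m:
        return None
    params = {}
    for key, quoted, bare in PARAM.findall(line[m.end():]):
        # Only one of the two value groups can be non-empty.
        params[key.lower()] = quoted or bare
    return m.group(1).lower(), params
```

A caller would then use `params.get("filename")` and treat `None` as "not a file upload" instead of assuming the filename is always present.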

score 1 · Accepted Answer

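You're trying to avoid reading documentation, but I think the best advice is to actually read:

RFC 2388: Returning Values from Forms: multipart/form-data
RFC 1867: Form-based File Upload in HTML

to make sure you don't miss any cases. An easier route might be to use the poster library.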
