
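I wrote a custom Django file upload handler for my current project. It is a proof of concept that lets you compute the hash of an uploaded file without storing that file on disk. It is only a proof of concept, to be sure, but if I can get it working, I can get on to the real purpose of my work.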

Essentially, here is what I have so far, and it works fine with one major exception:

from django.core.files.uploadhandler import FileUploadHandler, StopFutureHandlers
from hashlib import sha256
from myproject.upload.files import MyProjectUploadedFile

class MyProjectUploadHandler(FileUploadHandler):
    def __init__(self, *args, **kwargs):
        super(MyProjectUploadHandler, self).__init__(*args, **kwargs)

    def handle_raw_input(self, input_data, META, content_length, boundary,
            encoding = None):
        self.activated = True

    def new_file(self, *args, **kwargs):
        super(MyProjectUploadHandler, self).new_file(*args, **kwargs)

        self.digester = sha256()
        raise StopFutureHandlers()

    def receive_data_chunk(self, raw_data, start):
        self.digester.update(raw_data)

    def file_complete(self, file_size):
        return MyProjectUploadedFile(self.digester.hexdigest())

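The custom upload handler works great. The hash is accurate, it works without storing any part of the uploaded file on disk, and it only uses 64 KB of memory at any one time.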
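For reference, the chunk-by-chunk hashing that the handler relies on can be sketched outside Django. This is only an illustration: `digest_stream` and `CHUNK_SIZE` are made-up names, and the 64 KB figure is assumed to mirror the handler's chunk size mentioned above.

```python
import io
from hashlib import sha256

# Assumption: 64 KB, matching the chunk size mentioned above.
CHUNK_SIZE = 64 * 1024


def digest_stream(stream, chunk_size=CHUNK_SIZE):
    """Hash a file-like object without holding more than one chunk in memory."""
    digester = sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digester.update(chunk)
    return digester.hexdigest()


# A 200 KB payload is read in four chunks (three full, one partial).
payload = b"x" * (200 * 1024)
streamed = digest_stream(io.BytesIO(payload))
whole = sha256(payload).hexdigest()
```

Feeding the digest one chunk at a time gives the same result as hashing the whole payload at once, which is why `receive_data_chunk` can simply call `update` on each chunk.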

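The only problem I have is that I need to access another field from the POST request before the file is processed: a text salt entered by the user. My form looks like this: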

<form id="myForm" method="POST" enctype="multipart/form-data" action="/upload/">
    <fieldset>
        <input name="salt" type="text" placeholder="Salt">
        <input name="uploadfile" type="file">
        <input type="submit">
    </fieldset>
</form>

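The "salt" POST variable is only made available to me after the request has been processed and the file has been uploaded, which does not work for my use case. I cannot seem to find a way to access this variable in any way, shape, or form in my upload handler.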

Is there a way for me to access each multipart variable as it is encountered, instead of only the uploaded files?
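For context, a multipart/form-data body carries its parts one after another, so a streaming parser should meet the salt part before the file part when the form lists it first. The sketch below uses the standard library `email` parser as a stand-in for Django's parser; the boundary `XyZ` and the body contents are made up for illustration.

```python
import email

# Hypothetical request body for the form above; the boundary "XyZ" is made up.
raw = (
    b"Content-Type: multipart/form-data; boundary=XyZ\r\n"
    b"\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="salt"\r\n'
    b"\r\n"
    b"pepper\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="uploadfile"; filename="a.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--XyZ--\r\n"
)

message = email.message_from_bytes(raw)

# Collect the form field names in the order the parts appear in the body.
part_names = [
    part.get_param("name", header="content-disposition")
    for part in message.get_payload()
]
```

Here `part_names` lists the fields in body order, which suggests the salt could in principle be read before any file data is consumed.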


2 Answers


My solution does not come easy, but here it is:

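# Imports assumed from the Django 1.5-era module layout this code was adapted from.
import base64

from django.core.files.uploadhandler import FileUploadHandler, SkipFile, StopUpload
from django.http.multipartparser import (ChunkIter, LazyStream,
    MultiPartParserError, Parser, exhaust, FIELD, FILE)
from django.utils.datastructures import MultiValueDict
from django.utils.encoding import force_text
from django.utils.text import unescape_entities


class IntelligentUploadHandler(FileUploadHandler):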
    """
    An upload handler which overrides the default multipart parser to allow
    simultaneous parsing of fields and files... intelligently. Subclass this
    for real and true awesomeness.
    """

    def __init__(self, *args, **kwargs):
        super(IntelligentUploadHandler, self).__init__(*args, **kwargs)

    def field_parsed(self, field_name, field_value):
        """
        A callback method triggered when a non-file field has been parsed 
        successfully by the parser. Use this to listen for new fields being
        parsed.
        """
        pass

    def handle_raw_input(self, input_data, META, content_length, boundary,
            encoding = None):
        """
        Parse the raw input from the HTTP request and split items into fields
        and files, executing callback methods as necessary.

        Shamelessly adapted and borrowed from django.http.multipartparser.MultiPartParser.
        """
        # following suit from the source class, this is imported here to avoid
        # a potential circular import
        from django.http import QueryDict

        # create return values
        self.POST = QueryDict('', mutable=True)
        self.FILES = MultiValueDict()

        # initialize the parser and stream
        stream = LazyStream(ChunkIter(input_data, self.chunk_size))

        # whether or not to signal a file-completion at the beginning of the loop.
        old_field_name = None
        counter = 0

        try:
            for item_type, meta_data, field_stream in Parser(stream, boundary):
                if old_field_name:
                    # we run this test at the beginning of the next loop since
                    # we cannot be sure a file is complete until we hit the next
                    # boundary/part of the multipart content.
                    file_obj = self.file_complete(counter)

                    if file_obj:
                        # if we return a file object, add it to the files dict
                        self.FILES.appendlist(force_text(old_field_name, encoding,
                            errors='replace'), file_obj)

                    # wipe it out to prevent havoc
                    old_field_name = None
                try: 
                    disposition = meta_data['content-disposition'][1]
                    field_name = disposition['name'].strip()
                except (KeyError, IndexError, AttributeError):
                    continue

                transfer_encoding = meta_data.get('content-transfer-encoding')

                if transfer_encoding is not None:
                    transfer_encoding = transfer_encoding[0].strip()

                field_name = force_text(field_name, encoding, errors='replace')

                if item_type == FIELD:
                    # this is a POST field
                    if transfer_encoding == "base64":
                        raw_data = field_stream.read()
                        try:
                            data = base64.b64decode(raw_data)
                        except Exception:
                            # fall back to the raw bytes if decoding fails
                            data = raw_data
                    else:
                        data = field_stream.read()

                    self.POST.appendlist(field_name, force_text(data, encoding,
                        errors='replace'))

                    # trigger listener
                    self.field_parsed(field_name, self.POST.get(field_name))
                elif item_type == FILE:
                    # this is a file
                    file_name = disposition.get('filename')

                    if not file_name:
                        continue

                    # transform the file name
                    file_name = force_text(file_name, encoding, errors='replace')
                    file_name = self.IE_sanitize(unescape_entities(file_name))

                    content_type = meta_data.get('content-type', ('',))[0].strip()

                    try:
                        charset = meta_data.get('content-type', (0, {}))[1].get('charset', None)
                    except:
                        charset = None

                    try:
                        file_content_length = int(meta_data.get('content-length')[0])
                    except (IndexError, TypeError, ValueError):
                        file_content_length = None

                    counter = 0

                    # now, do the important file stuff
                    try:
                        # alert on the new file
                        self.new_file(field_name, file_name, content_type,
                                file_content_length, charset)

                        # chubber-chunk it
                        for chunk in field_stream:
                            if transfer_encoding == "base64":
                                # base 64 decode it if need be
                                over_bytes = len(chunk) % 4

                                if over_bytes:
                                    over_chunk = field_stream.read(4 - over_bytes)
                                    chunk += over_chunk

                                try:
                                    chunk = base64.b64decode(chunk)
                                except Exception as e:
                                    # since this is only a chunk, any error is an unfixable error
                                    raise MultiPartParserError("Could not decode base64 data: %r" % e)

                            chunk_length = len(chunk)
                            self.receive_data_chunk(chunk, counter)
                            counter += chunk_length
                            # ... and we're done
                    except SkipFile:
                        # just eat the rest
                        exhaust(field_stream)
                    else:
                        # handle file upload completions on next iteration
                        old_field_name = field_name

        except StopUpload as e:
            # if we get a request to stop the upload, exhaust it if no con reset
            if not e.connection_reset:
                exhaust(input_data)
        else:
            # make sure that the request data is all fed
            exhaust(input_data)

        # signal the upload has been completed
        self.upload_complete()

        return self.POST, self.FILES

    def IE_sanitize(self, filename):
        """Cleanup filename from Internet Explorer full paths."""
        return filename and filename[filename.rfind("\\")+1:].strip()

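Essentially, by subclassing this class, you can have a more... intelligent upload handler. Fields are announced to subclasses through the field_parsed method, which is what I needed for my purposes.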
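As a rough sketch of how a subclass might use the field_parsed callback to pick up the salt before any file chunks arrive: to keep it runnable without Django, the example uses a hypothetical stand-in base class (`StandInHandler`) and a small driver (`feed`) that calls the hooks in the order the parser would. `SaltedHashHandler` is a made-up name, and prepending the salt to the digest is only one assumed way of combining it with the file data.

```python
from hashlib import sha256


class StandInHandler(object):
    """Hypothetical stand-in for IntelligentUploadHandler (no Django needed)."""

    def field_parsed(self, field_name, field_value):
        pass

    def new_file(self, field_name, file_name):
        pass

    def receive_data_chunk(self, raw_data, start):
        pass

    def file_complete(self, file_size):
        pass


class SaltedHashHandler(StandInHandler):
    def __init__(self):
        self.salt = None
        self.digester = None

    def field_parsed(self, field_name, field_value):
        # Remember the salt as soon as the parser announces it.
        if field_name == "salt":
            self.salt = field_value

    def new_file(self, field_name, file_name):
        if self.salt is None:
            raise ValueError("the salt field must precede the file field")
        # Assumption: the salt is simply hashed ahead of the file contents.
        self.digester = sha256(self.salt.encode("utf-8"))

    def receive_data_chunk(self, raw_data, start):
        self.digester.update(raw_data)

    def file_complete(self, file_size):
        return self.digester.hexdigest()


def feed(handler, parts):
    """Call the hooks in parser order; return one digest per file part."""
    results = []
    for kind, name, value in parts:
        if kind == "field":
            handler.field_parsed(name, value)
        else:
            handler.new_file(name, "upload.bin")
            offset = 0
            for chunk in value:
                handler.receive_data_chunk(chunk, offset)
                offset += len(chunk)
            results.append(handler.file_complete(offset))
    return results


parts = [
    ("field", "salt", "pepper"),
    ("file", "uploadfile", [b"hello ", b"world"]),
]
digests = feed(SaltedHashHandler(), parts)
```

Note that this only works when the salt input comes before the file input in the form, as it does in the question's form; otherwise the file chunks would arrive before the salt is known.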

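I have reported this as a feature request to the Django team, in the hope that this functionality becomes part of Django's regular toolbox rather than requiring the source code to be monkey-patched as above.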

answered 2013-03-13T05:50:33.820

Based on the code for FileUploadHandler, at line 62:

https://github.com/django/django/blob/master/django/core/files/uploadhandler.py

It looks like the request object is passed into the handler and stored as self.request.

In that case, you should be able to access the salt at any point in your upload handler by doing:

salt = self.request.POST.get('salt')

Unless I am misunderstanding your question.

answered 2013-03-13T00:03:44.230