python - Optimizing PDF conversion in Django / Python

Question

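I have a web application that exports reports as PDF. Everything works fine when the query returns fewer than 100 values. When the number of records exceeds 100, the server raises a 502 Proxy Error. The report renders fine as HTML. The step that hangs the server is the conversion from HTML to PDF. I'm using xhtml2pdf (AKA pisa 3.0) to generate the PDF. The algorithm looks like this: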

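def view1(request, **someargs):
    # Unpack the keyword filters into the query
    queryset = someModel.objects.filter(**someargs)
    # The template context must be a dict; the 'queryset' key is an assumed name
    context = {'queryset': queryset}
    # .get() avoids a KeyError when the 'pdf' parameter is absent
    if request.GET.get('pdf'):
        return pdfWrapper('template.html', context, 'filename')
    else:
        return render_to_response('template.html', context)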

def pdfWrapper(template_src, context_dict, filename):
    ################################################
    #
    # The code commented out below is an older version.
    # I updated the code according to the comments received.
    # The function still works for short HTML documents
    # and produces the 502 for larger ones.
    #
    ################################################

    ##import cStringIO as StringIO
    import ho.pisa as pisa
    from django.template.loader import get_template
    from django.template import Context
    from django.http import HttpResponse
    ##from cgi import escape

    template = get_template(template_src)
    context = Context(context_dict)
    html  = template.render(context)

    response = HttpResponse()
    response['Content-Type'] ='application/pdf'
    response['Content-Disposition']='attachment; filename=%s.pdf'%(filename)

    pisa.CreatePDF(
        src=html,
        dest=response,
        show_error_as_pdf=True)

    return response

    ##result = StringIO.StringIO()
    ##pdf = pisa.pisaDocument(
    ##            StringIO.StringIO(html.encode("ISO-8859-1")),
    ##            result)
    ##if not pdf.err:
    ##    response = HttpResponse(
    ##                   result.getvalue(), 
    ##                   mimetype='application/pdf')
    ##    response['Content-Disposition']='attachment; filename=%s.pdf'%(filename)
    ##    return response
    ##return HttpResponse('Hubo un error<pre>%s</pre>' % escape(html))

I have thought about creating a buffer so that the server can free some memory, but I haven't found anything so far. Can anyone help? Please?

score 3 · Accepted Answer

I can't tell you exactly what is causing your problem; it may be a buffering issue in StringIO.

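However, if you are assuming that this code actually streams the generated PDF data, you are mistaken: StringIO.getvalue() returns the contents of the string buffer at the time the method is called, not an output stream (see http://docs.python.org/library/stringio.html#StringIO.StringIO.getvalue).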
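
To illustrate the point about getvalue(), here is a minimal sketch. It uses Python 3's io.BytesIO as a stand-in for the Python 2 StringIO module from the question; that substitution, the helper name fill_buffer, and the sample chunks are assumptions made only for illustration. It shows that getvalue() hands back one complete bytes object, so the whole document sits in memory before anything can be sent.

```python
import io


def fill_buffer(chunks):
    """Write every chunk into an in-memory buffer and return the buffer."""
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
    return buf


# Hypothetical stand-in for the PDF data a converter would produce.
chunks = [b"%PDF-", b"1.4 ", b"fake body"]
buf = fill_buffer(chunks)

# getvalue() returns the entire contents as one bytes object (a snapshot),
# not a stream that is consumed piece by piece.
snapshot = buf.getvalue()

# Writing more afterwards does not change the snapshot already taken.
buf.write(b" more")
later = buf.getvalue()
buf.close()
```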

If you want to stream the output, you can treat the HttpResponse instance as a file-like object (see http://docs.djangoproject.com/en/1.2/ref/request-response/#usage).

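Secondly, I don't see any reason to use StringIO here. According to the pisa documentation I found (which, by the way, calls this function CreatePDF), the source can be a string or a unicode object.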

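Personally, I would try the following:

Create the HTML as a unicode string
Create and configure the HttpResponse object
Call the PDF generator with the string as input and the response as output

In outline, this could look like the following: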

html = template.render(context)

response = HttpResponse()
response['Content-Type'] ='application/pdf'
response['Content-Disposition']='attachment; filename=%s.pdf'%(filename)

pisa.CreatePDF(
    src=html,
    dest=response,
    show_error_as_pdf=True)

#response.flush()
return response

However, I haven't tested whether this actually works. (So far I have only done this kind of PDF streaming in Java.)

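Update: I just looked at the implementation of HttpResponse. It implements the file interface by collecting the chunks of strings written to it into a list. Calling response.flush() is pointless, because it does nothing. Also, you can still set response parameters such as Content-Type even after the response has been accessed as a file object.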
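
The behaviour described in this update can be pictured with a small stand-in class. This is not Django's actual code: ChunkCollector and its attributes are hypothetical names, written only to mirror the description above, where write() appends to a list, flush() does nothing, and headers stay editable after writing.

```python
class ChunkCollector:
    """Hypothetical stand-in mirroring the described HttpResponse behaviour."""

    def __init__(self):
        self.headers = {}
        self._chunks = []  # every write() call appends one chunk here

    def __setitem__(self, name, value):
        self.headers[name] = value

    def write(self, data):
        self._chunks.append(data)

    def flush(self):
        # Nothing to do: the data is only collected, never sent from here.
        pass

    @property
    def content(self):
        return b"".join(self._chunks)


response = ChunkCollector()
response.write(b"first ")
response.flush()
response.write(b"second")
# Headers can still be set after data has been written.
response["Content-Type"] = "application/pdf"
```

If the real class behaves as described, writing into the response saves one intermediate copy but probably does not send anything to the client before the view returns.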

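Your original problem may also be related to the fact that you never closed the StringIO objects. The underlying buffer of a StringIO object is not released until close() is called.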
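
A minimal sketch of the close() point, using Python 3's io.BytesIO as an assumed stand-in for the Python 2 StringIO class in the question. A with block closes the buffer on exit, after which its contents are no longer accessible; the sample bytes are made up for illustration.

```python
import io

# The with block calls close() automatically when it exits.
with io.BytesIO() as result:
    result.write(b"rendered pdf bytes")
    data = result.getvalue()  # copy out what is needed before closing

closed = result.closed

# After close(), further access to the buffer raises ValueError.
try:
    result.getvalue()
    access_failed = False
except ValueError:
    access_failed = True
```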
