django - Trying to save a PDF string with WeasyPrint results in a UnicodeDecodeError

Question

So far, this is my code:

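from django.template import Context, Template  # Django v1.11
from weasyprint import HTML  # WeasyPrint v0.42
import codecs

# Read the template as decoded (unicode) text and close the file afterwards
with codecs.open("/path/to/my/template.html", mode="r", encoding="utf-8") as template_file:
    template = Template(template_file.read())
context = Context({})
html = HTML(string=template.render(context))

# With no target argument, write_pdf() returns the PDF document as a byte string
pdf_file = html.write_pdf()

#with open("/path/to/my/file.pdf", "wb") as f:
#    f.write(self.pdf_file)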
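Called without a target, `write_pdf()` returns the finished PDF as bytes, and those bytes contain non-UTF-8 values by design, so they should only ever go to a binary sink. Note that the commented-out block writes `self.pdf_file` rather than the local `pdf_file` returned above. A minimal Python 3 stdlib sketch of that step, assuming the bytes are already in hand (the helper `save_pdf_bytes` and the temporary output path are hypothetical):

```python
import os
import tempfile


def save_pdf_bytes(pdf_bytes, path):
    """Write PDF content to ``path`` unchanged (hypothetical helper).

    Binary mode ("wb") copies the bytes as-is, so no text codec is involved.
    """
    if not isinstance(pdf_bytes, bytes):
        # e.g. a model field object passed by mistake instead of the PDF bytes
        raise TypeError("expected bytes from write_pdf(), got %s" % type(pdf_bytes).__name__)
    with open(path, "wb") as f:
        f.write(pdf_bytes)
    return path


pdf_bytes = b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"  # stand-in for html.write_pdf()
target = os.path.join(tempfile.mkdtemp(), "file.pdf")  # hypothetical output path
save_pdf_bytes(pdf_bytes, target)
```

WeasyPrint can also write straight to a filename or file object, e.g. `html.write_pdf("/path/to/my/file.pdf")`, which avoids holding the bytes in a variable at all.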

Error output:

[17/Jan/2019 08:14:13] INFO [handle_correspondence:54] 'utf8' codec can't
decode byte 0xe2 in position 10: invalid continuation byte. You passed in
'%PDF-1.3\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<</Author <> /Creator (cairo 1.14.6
(http://cairographics.org))\n  /Keywords <> /Producer (WeasyPrint 0.42.3
\\(http://weasyprint.org/\\))>>\nendobj\n2 0 obj\n<</Pages 3 0 R /Type
/Catalog>>\nendobj\n3 0 obj\n<</Count 1 /Kids [4 0 R] /Type
/Pages>>\nendobj\n4 0 obj\n<</BleedBox [0 0 595 841] /Contents 5 0 R
/Group\n  <</CS /DeviceRGB /I true /S /Transparency /Type /Group>>
MediaBox\n  [0 0 595 841] /Parent 3 0 R /Resources 6 0 R /TrimBox [0 0 595
841]\n  /Type /Page>>\nendobj\n5 0 obj\n<</Filter /FlateDecode /Length 15
0 R>>\nstream\nx\x9c+\xe4*T\xd0\x0fH,)I-\xcaSH.V\xd0/0U(N\xceS\xd0O4PH/\xe62P0P0\xb54U\xb001T(JUH\xe3\n\x04B\x00\x8bi\r\x89\nendstream\nendobj\n6 0
obj\n<</ExtGState <</a0 <</CA 1 /ca 1>>>> /Pattern <</p5 7 0
R>>>>\nendobj\n7 0 obj\n<</BBox [0 1123 794 2246] /Length 8 0 R /Matrix
[0.75 0 0 0.75 0 -843.5]\n  /PaintType 1 /PatternType 1 /Resources
<</XObject <</x7 9 0 R>>>>\n  /TilingType 1 /XStep 1588 /YStep
2246>>\nstream\n /x7 Do\n \n\nendstream\nendobj\n8 0 obj\n10\nendobj\n9 0
obj\n<</BBox [0 1123 794 2246] /Filter /FlateDecode /Length 10 0 R
/Resources\n  11 0 R /Subtype /Form /Type /XObject>>\nstream\nx\x9c+\xe4\nT(\xe42P0221S0\xb74\xd63\xb3\xb4T\xd05442\xd235R(JU\x08W\xc8\xe3*\xe42T0\x00B\x10\t\x942VH\xce\xe5\xd2O4PH/V\xd0\xaf04Tp\xc9\xe7\n\x04B\x00`\xf0\x10\x11\nendstream\nendobj\n10 0 obj\n77\nendobj\n11 0 obj\n<</ExtGState
<</a0 <</CA 1 /ca 1>>>> /XObject <</x11 12 0 R>>>>\nendobj\n12 0
obj\n<</BBox [0 1123 0 1123] /Filter /FlateDecode /Length 13 0 R
/Resources\n  14 0 R /Subtype /Form /Type /XObject>>\nstream\nx\x9c+\xe4\n
xe4\x02\x00\x02\x92\x00\xd7\nendstream\nendobj\n13 0 obj\n12\nendobj\n14 0
obj\n<<>>\nendobj\n15 0 obj\n58\nendobj\nxref\n0 16\n0000000000 65535
f\r\n0000000015 00000 n\r\n0000000168 00000 n\r\n0000000215 00000
n\r\n0000000270 00000 n\r\n0000000489 00000 n\r\n0000000620 00000
n\r\n0000000697 00000 n\r\n0000000923 00000 n\r\n0000000941 00000
n\r\n0000001165 00000 n\r\n0000001184 00000 n\r\n0000001264 00000
n\r\n0000001422 00000 n\r\n0000001441 00000 n\r\n0000001462 00000
n\r\ntrailer\n\n<</Info 1 0 R /Root 2 0 R /Size 16>>\nstartxref\n1481
n%%EOF\n' (<type 'str'>)

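It actually works both through a web request (returning the PDF as the response) and from the shell (typing the code in by hand). The code is tested and has never given me any problems. The files are saved with the correct encoding, and setting the `encoding` kwarg on `HTML` doesn't help; also, the `mode` value used to open the template is correct, since I've seen other questions where that could be the cause.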

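However, I'm adding a management command to run this periodically (for larger PDFs I can't do it through a web request, because the server's timeout may kick in before it finishes), and when I try to call it I only get a `UnicodeDecodeError` saying `'utf8' codec can't decode byte 0xe2 in position 10: invalid continuation byte`.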

The PDF (at least from what I can see) initially renders with these characters:

%PDF-1.3\n%\xe2\xe3\xcf\xd3\n1 0

which translates to:

%PDF-1.3
%âãÏÓ
1 0 obj

So the problem seems to be all about the character â. But that's a trap!
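The â is not the cause, but it does explain the exact wording of the message. A PDF that contains binary data conventionally puts a comment line of at least four bytes with values of 128 or more right after the `%PDF-` header, so that transfer tools treat the file as binary; here those bytes are `\xe2\xe3\xcf\xd3`. A minimal Python 3 sketch (the question's traceback is from Python 2, where the same bytes are a plain `str`) that reproduces the position and reason from the error, using the header bytes quoted in the log:

```python
# First bytes of the PDF quoted in the error message above.
PDF_HEADER = b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"


def explain_decode_failure(data):
    """Decode raw bytes as UTF-8; on failure return (position, offending byte, reason)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start:exc.start + 1], exc.reason
    return None  # the bytes happened to be valid UTF-8


print(explain_decode_failure(PDF_HEADER))
# → (10, b'\xe2', 'invalid continuation byte')

# The four bytes after the second "%" are all >= 128 (the binary marker).
print([hex(b) for b in PDF_HEADER[10:14]])
# → ['0xe2', '0xe3', '0xcf', '0xd3']

# Latin-1 maps every byte to one character, which is where "%âãÏÓ" comes from.
print(PDF_HEADER.decode("latin-1").splitlines()[1])
# → %âãÏÓ
```

0xe2 announces a three-byte UTF-8 sequence, but the next byte (0xe3) is not a continuation byte (0x80 to 0xbf), hence "invalid continuation byte" at position 10.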

Instead, the problem lies in this line of code:

pdf_file = html.write_pdf()

Changing it to:

html.write_pdf()

just works as expected!

So my question is: what could make Python raise a `UnicodeDecodeError` when assigning a string to a variable? I've dug into WeasyPrint's code in my virtualenv, but I don't see any conversion happening there.

score 0 · Accepted Answer

So I don't know why, but now it suddenly works. I didn't actually change anything: I just ran the command again and it worked.

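~~I'm not marking the question as answered, because someone may run into the same problem as me in the future and could try to post the correct answer.~~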

How unsettling.

EDIT

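So it turns out I'm a very clever person who tried to set the value of `self.pdf_file`, which is a `models.FileField`, to the content of the generated PDF instead of the file itself.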
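This also answers the original question: binding a value to a plain local name never converts it, but assigning to an attribute such as `self.pdf_file` can run arbitrary code when the class defines a descriptor, which is how Django model fields hook attribute access. The `You passed in ... (<type 'str'>)` suffix in the log also appears to match Django's `DjangoUnicodeDecodeError`, which `force_text` raises when a byte string is not valid UTF-8, so the decode most likely happened inside Django rather than WeasyPrint. The Python 3 sketch below uses a hypothetical `DecodingAttribute` descriptor and `Report` class (not Django's actual `FileField` code) to show an attribute assignment failing where a local assignment cannot:

```python
class DecodingAttribute:
    """Hypothetical descriptor that stores assigned values as text.

    Byte strings are decoded as UTF-8 on assignment, so ``obj.attr = value``
    can raise. This is an illustration, not Django's FileField code.
    """

    def __set_name__(self, owner, name):
        self.storage_name = "_" + name  # where the value lives on the instance

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.storage_name, None)

    def __set__(self, instance, value):
        if isinstance(value, bytes):
            value = value.decode("utf-8")  # raises UnicodeDecodeError for PDF bytes
        setattr(instance, self.storage_name, value)


class Report:
    pdf_file = DecodingAttribute()  # hypothetical stand-in for a model field


pdf_bytes = b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n"  # stand-in for html.write_pdf()

local_copy = pdf_bytes  # plain name binding: no conversion, cannot fail

report = Report()
report.pdf_file = "reports/monthly.pdf"  # a file name (text) is stored as-is

error = None
try:
    report.pdf_file = pdf_bytes  # the PDF content goes through the decode
except UnicodeDecodeError as exc:
    error = exc
    print("assignment failed:", exc)
```

For a real `FileField`, the usual approach is to keep the bytes and pass them to the field file's `save()` method wrapped in `django.core.files.base.ContentFile`, e.g. `report.pdf_file.save("report.pdf", ContentFile(pdf_bytes))`, where `report` and the file name are placeholders.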
