python - Converting an Arabic page with xhtml2pdf.pisa in Python

Question

I am trying to convert HTML to PDF with the pisa utility. Please check the code below. I am getting an error that I cannot figure out.

Traceback (most recent call last):
  File "dewa.py", line 27, in <module>
    html = html.encode(enc, 'replace')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 203: ordinal not in range(128)

Here is the code:

from cStringIO import StringIO
from grab import Grab
from grab.tools.lxml_tools import drop_node, render_html
from grab.tools.text import remove_bom
from lxml import etree
import grab.error
import inspect
import lxml
import os
import sys
import xhtml2pdf.pisa as pisa

enc = 'utf-8'
filePath = '~/Desktop/dewa'
##############################

g = Grab()
g.go('http://www.dewa.gov.ae/arabic/aboutus/dewahistory.aspx')

html = g.response.body

html = html.replace('bgcolor="EDF389"', 'bgcolor="#EDF389"')


''' clear page '''
html = html.encode(enc, 'replace')

print html

f = file(filePath + '.html' , 'wb')
f.write(html)
f.flush()
f.close()

''' Save PDF '''
pdfresult = StringIO()
pdf = pisa.pisaDocument(StringIO(html), pdfresult, encoding = enc)
f = file(filePath + '.pdf', 'wb')
f.write(pdfresult.getvalue())
f.flush()
f.close()
pdfresult.close()

score 2 · Accepted Answer

If you check the type of the object returned by this line:

html = g.response.body

you will see that it is not a unicode object:

print type(html)
...
<type 'str'>

So when you reach this line:

html = html.encode(enc, 'replace')

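you are trying to re-encode a string that is already encoded, which causes the error. In Python 2, calling encode() on a byte str first decodes it implicitly with the ASCII codec, and that hidden step fails on the non-ASCII byte 0xd9 (hence a UnicodeDecodeError from a call to encode).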
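The failure can be reproduced without downloading anything. The following is a minimal sketch, not the asker's code: `raw` is an assumed stand-in for `g.response.body`, built from a sample Arabic word, and the explicit `decode('ascii')` spells out the step that Python 2 performs silently inside `str.encode`. It is written to run on both Python 2 and Python 3.

```python
# Sketch: reproduce the implicit ASCII decode that Python 2 performs
# when .encode() is called on a byte string.

# Assumed stand-in for g.response.body: UTF-8 bytes of an Arabic word.
# The first letter (U+0645) encodes to the bytes 0xd9 0x85.
raw = u'\u0645\u0631\u062d\u0628\u0627'.encode('utf-8')

error = None
try:
    # Python 2 runs the equivalent of this line before encoding a byte str.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    # The ASCII codec rejects any byte above 0x7f, such as 0xd9.
    error = exc
    print(repr(exc))
```

On Python 3 the same mistake surfaces differently, because bytes objects have no encode method at all.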

To fix this, change your code to look like this:

# decode the downloaded data
html = g.response.body.decode(enc)

# html is now a unicode object
html = html.replace('bgcolor="EDF389"', 'bgcolor="#EDF389"')

print html

# encode as utf-8 before writing to file (no need for 'replace')
html = html.encode(enc)
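The decode, edit, encode sequence can also be wrapped in a small function so it can be checked in isolation. This is only a sketch under stated assumptions: `fix_and_encode` is a hypothetical helper name, `sample` is made-up markup standing in for the downloaded page, and the page is assumed to really be UTF-8. It runs on both Python 2 and Python 3.

```python
import io


def fix_and_encode(raw_bytes, enc='utf-8'):
    """Hypothetical helper: decode bytes, patch the markup, re-encode."""
    text = raw_bytes.decode(enc)  # bytes -> unicode text
    # Work on unicode text, so non-ASCII characters are handled safely.
    text = text.replace(u'bgcolor="EDF389"', u'bgcolor="#EDF389"')
    return text.encode(enc)  # unicode text -> bytes for writing out


# Made-up sample markup containing an Arabic word (assumption, not the real page).
sample = u'<td bgcolor="EDF389">\u0645\u0631\u062d\u0628\u0627</td>'.encode('utf-8')
fixed = fix_and_encode(sample)

# An in-memory byte stream, analogous to the StringIO(html) in the question.
buf = io.BytesIO(fixed)
```

The returned bytes can be written directly to a file opened in 'wb' mode, as the question's code already does.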
