python - Encode text to HTML entities (not tags)

Question

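I've been searching for this without any luck, so I suspect I'm missing some concept or don't understand what I actually need. Here is the problem:

I'm using pisa to create a PDF, and this is the code I use for it: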

def write_to_pdf(template_data, context_dict, filename):
    template = Template(template_data)
    context = Context(context_dict)
    html = template.render(context)
    result = StringIO.StringIO()
    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), result, link_callback=fetch_resources)

    if not pdf.err:
        response = http.HttpResponse(mimetype='application/pdf')
        response['Content-Disposition'] = 'attachment; filename=%s.pdf' % filename
        response.write(result.getvalue())
        return response

    return http.HttpResponse('Problem creating PDF: %s' % cgi.escape(html))

So, if I try to turn this string into a PDF:

template_data = 'tésting á'

It turns into something like this (read each # as a black dot rather than a letter):

t##sting á

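I tried cgi.escape without any luck: the black dots are still there, and it ends up printing the HTML tags as text. This is Python 2.7, so I can't use html.escape and solve all my problems.

So I need something that converts normal text to HTML entities without affecting the HTML tags that are already there. Any clues?

Oh, and if I change this line: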

pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), result, link_callback=fetch_resources)

to

pdf = pisa.pisaDocument(html, result, link_callback=fetch_resources)

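it works, but it doesn't create the HTML entities I need. I don't know exactly what kinds of characters will end up in there, and some of them might not be supported by pisa.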

score 2 · Accepted Answer

Encoding named HTML entities with Python

http://beckism.com/2009/03/named_entities_python/

There is also a Django app for both decoding and encoding:

https://github.com/cobrateam/python-htmlentities

For Python 2.x (in Python 3.x, import codepoint2name from html.entities instead):

'''
Registers a special handler for named HTML entities

Usage:
import named_entities
text = u'Some string with Unicode characters'
text = text.encode('ascii', 'named_entities')
'''

import codecs
from htmlentitydefs import codepoint2name

def named_entities(text):
    if isinstance(text, (UnicodeEncodeError, UnicodeTranslateError)):
        s = []
        for c in text.object[text.start:text.end]:
            if ord(c) in codepoint2name:
                s.append(u'&%s;' % codepoint2name[ord(c)])
            else:
                s.append(u'&#%s;' % ord(c))
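        # The handler must return a unicode replacement and the resume position.
        return u''.join(s), text.end
    else:
        # Exception instances have no __name__; use the class name instead.
        raise TypeError("Can't handle %s" % type(text).__name__)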

codecs.register_error('named_entities', named_entities)
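
To illustrate how the handler above behaves on markup, here is a minimal, self-contained sketch. It is not from the linked article: the try/except import (so the same file runs on Python 2 and 3), the sample string, and the comparison with the built-in 'xmlcharrefreplace' handler are my own additions. Because the handler only fires for characters the ascii codec cannot encode, existing tags pass through unchanged. Note that it does not escape &, < or > that are already in the text.

```python
import codecs

try:
    from html.entities import codepoint2name  # Python 3
except ImportError:
    from htmlentitydefs import codepoint2name  # Python 2


def named_entities(error):
    """Codec error handler: replace unencodable characters with HTML entities."""
    if not isinstance(error, (UnicodeEncodeError, UnicodeTranslateError)):
        raise TypeError("Can't handle %s" % type(error).__name__)
    pieces = []
    for char in error.object[error.start:error.end]:
        code = ord(char)
        if code in codepoint2name:
            # A named entity exists for this code point, e.g. &eacute;
            pieces.append(u'&%s;' % codepoint2name[code])
        else:
            # No named entity: fall back to a numeric character reference.
            pieces.append(u'&#%d;' % code)
    return u''.join(pieces), error.end


codecs.register_error('named_entities', named_entities)

# Sample markup (an assumption for illustration): "tésting á ő" inside a tag.
html = u'<p class="note">t\u00e9sting \u00e1 \u0151</p>'

# Named entities where available, numeric references otherwise.
named = html.encode('ascii', 'named_entities')

# Built-in alternative: numeric references only, no custom handler needed.
numeric = html.encode('ascii', 'xmlcharrefreplace')
```

Here named holds t&eacute;sting &aacute; &#337; inside the untouched <p class="note"> tag, and numeric holds t&#233;sting &#225; &#337;. In the question's write_to_pdf, this would presumably mean passing html.encode('ascii', 'named_entities') instead of html.encode("UTF-8") to pisa.pisaDocument; I have not tested that against pisa.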
