python - GAE Python: importing UTF-8 characters from an XML file into a database model

Question

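I am parsing an XML file from an online source, but I am having trouble reading UTF-8 characters. I have read several other questions that deal with similar problems, but none of the solutions has worked so far. The code currently looks like this.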

class XMLParser(webapp2.RequestHandler):

    def get(self):
        url = fetch('some.xml.online')
        xml = parseString(url.content)
        vouchers = xml.getElementsByTagName("VoucherCode")

        for voucher in vouchers:
            if voucher.getElementsByTagName("ActivePartnership")[0].firstChild.data == "true":
                coupon = Coupon()
                coupon.description = str(voucher.getElementsByTagName("Description")[0].firstChild.data.decode('utf-8'))
                coupon.prov_key = str(voucher.getElementsByTagName("Id")[0].firstChild.data)
                coupon.put()
                self.redirect('/admin/coupon')

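The error I get from this is shown below. It is caused by a "ü" in the description field, which I also need to display later when I use the data.

File "C:\Users\Vincent\Documents\www\Sparkompass\Website\main.py", line 217, in get
    coupon.description = str(voucher.getElementsByTagName("Description")[0].firstChild.data.decode('utf-8'))
File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 16: ordinal not in range(128)

If I leave the description out, everything works fine. In the database model definition, I have defined description as follows: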

description = db.StringProperty(multiline=True)

Attempt 2

I also tried this:

coupon.description = str(voucher.getElementsByTagName("Description")[0].firstChild.data).decode('utf-8')

This also gives me:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 16: ordinal not in range(128)

Any help would be greatly appreciated!

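Update

The XML file contains German text, which means it has more non-ASCII characters encoded as UTF-8. So ideally, I now think it might be better to do the decoding at a higher level, for example at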

xml = parseString(url.content)

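But so far I have not gotten that to work either. The goal is to get the characters into ASCII, because that is what GAE needs in order to register them as a string in the database model.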

score 1 · Accepted Answer

>>> u"ü".decode("utf-8")

UnicodeEncodeError

>>> u"ü".encode("utf-8")

'\xc3\xbc'

>>> u"ü".encode("utf-8").decode("utf-8")

u'\xfc'

>>> str(u"ü".encode("utf-8").decode("utf-8"))

UnicodeEncodeError

>>> str(u"ü".encode("utf-8"))

'\xc3\xbc'

Which encoding do you need?

You can also use:
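To make that question concrete, here is a minimal sketch of what the parser appears to hand back. It assumes the feed is UTF-8 encoded XML; the `raw` bytes are a hypothetical stand-in for `url.content`, and the sample description text is made up. The point is that `xml.dom.minidom` has already decoded the bytes, so `.data` is a Unicode string and needs neither `.decode('utf-8')` nor `str()`. The snippet is written so that it should run on both Python 2.7 and Python 3.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for url.content: the raw UTF-8 bytes of the feed.
raw = (
    u'<?xml version="1.0" encoding="UTF-8"?>'
    u'<Vouchers><VoucherCode>'
    u'<Description>Gutschein f\u00fcr Sch\u00fcler</Description>'
    u'</VoucherCode></Vouchers>'
).encode("utf-8")

doc = parseString(raw)

# minidom has already decoded the bytes; .data is a Unicode string,
# so it can be used as-is without .decode() or str().
description = doc.getElementsByTagName("Description")[0].firstChild.data

# Encode only at the point where raw bytes are explicitly required.
description_utf8 = description.encode("utf-8")
```

If this matches your situation, assigning the Unicode value directly (for example `coupon.description = description`) avoids the implicit ASCII encoding step that `str()` and `.decode()` trigger on a Unicode object in Python 2.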

string2 = cgi.escape(string).encode("latin-1", "xmlcharrefreplace")

This replaces every character that cannot be represented in latin-1 with an XML numeric character reference.
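As a rough illustration of what that one-liner produces, the sketch below applies the same idea to a sample string. It is written for Python 3, where `cgi.escape` is no longer available, so `html.escape(..., quote=False)` is swapped in as the closest equivalent; the helper name `to_latin1_with_entities` and the sample text are made up for this example.

```python
from html import escape  # Python 3 stand-in for cgi.escape


def to_latin1_with_entities(text):
    """Escape &, < and >, then encode to latin-1 bytes.

    Characters outside latin-1 become numeric character references.
    """
    return escape(text, quote=False).encode("latin-1", "xmlcharrefreplace")


# "ü" exists in latin-1 and stays a single byte; the euro sign does not,
# so it is replaced by its numeric character reference.
result = to_latin1_with_entities(u"f\u00fcr <b> 5 \u20ac")
print(result)  # b'f\xfcr &lt;b&gt; 5 &#8364;'
```

Note that the result is a byte string in latin-1, not ASCII and not UTF-8, so whatever displays it later has to treat it as latin-1.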

score 0 · Accepted Answer

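I have now solved this by changing description to a TextProperty, which does not raise any errors. I know that I will not be able to sort or filter on it this way, but for a description that should be fine.

Background information: https://developers.google.com/appengine/docs/python/datastore/typesandpropertyclasses#TextProperty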
