python - Error when using the unicode function in the gspread wrapper. Potentially a bug

Question

When I call the unicode function on the following string, I get an error:

>>> unicode('All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 68: ordinal not in range(128)

When I check position 68, it appears to be the apostrophe ’:

>>> str='All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.'
>>> str[62:75]
' haven\xe2\x80\x99t us'

Is there a way to handle this? I hit this error in the gspread wrapper, at line 426 of models.py. Here are the lines:

425 cell_elem = feed.find(_ns1('cell'))
426 cell_elem.set('inputValue', unicode(val))
427 uri = self._get_link('edit', feed).get('href')

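So as soon as I try to update a cell with a value, in this case a string, the gspread wrapper tries to convert it to unicode and fails because of the apostrophe. Potentially, this is a bug. How can I deal with it? Thanks for your help.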

score 0 · Accepted Answer

There is no need to replace characters. Just decode the encoded string to unicode properly:

>>> s = 'All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.'
>>> s.decode('utf-8')
u'All but Buitoni are using Pinterest buffers and Pratt & Lamber haven\u2019t used it for a month so I\u2019ll check on this.'  # unicode object

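You need to tell Python which encoding your str object uses in order to convert it to unicode, rather than calling unicode(some_str) directly, which falls back to the ASCII codec shown in your traceback. In this case your string is UTF-8. This approach scales better than trying to replace characters, because you don't need a special case for every unicode character that exists in your database.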
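As a rough sketch of that idea, you could decode the value yourself before it ever reaches the line that calls unicode(val). The helper name to_text is made up for illustration and is not part of gspread, and UTF-8 is assumed as the default only because that is what your string uses:

```python
# -*- coding: utf-8 -*-
# Runs on Python 2 and 3: the b'' prefix marks a byte string in both.

def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings explicitly, pass other values through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

raw = b'haven\xe2\x80\x99t used it'  # UTF-8 bytes; \xe2\x80\x99 encodes the character U+2019

# On Python 2, unicode(raw) decodes with the ASCII codec by default,
# which behaves like this call and fails on byte 0xe2.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

text = to_text(raw)  # the explicit UTF-8 decode succeeds
```

On Python 2, calling unicode() on a value that is already a unicode object needs no decoding, so passing the result of to_text to the wrapper should avoid the implicit ASCII step.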

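IMO, the best practice for handling unicode in Python is:

Decode strings coming from external sources (such as a database) to unicode as early as possible.
Use them internally as unicode objects.
Encode them back to byte strings only when you need to send them to an external location (a file, a database, a socket, etc.).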
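Those three steps might look something like this in practice. It is only a minimal sketch: the variable names are illustrative, and the byte string stands in for whatever your database or file actually hands you:

```python
# -*- coding: utf-8 -*-
# Runs on Python 2 and 3.

# Bytes as they might arrive from an external source (assumed to be UTF-8).
incoming = b'I\xe2\x80\x99ll check on this.'

# 1. Decode at the boundary, as early as possible.
text = incoming.decode('utf-8')

# 2. Work with the unicode object internally.
shouted = text.upper()

# 3. Encode back to bytes only when sending the data out again.
outgoing = shouted.encode('utf-8')
```

Keeping the decode and encode calls at the edges means the code in the middle never has to care which encoding the outside world uses.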

I'd also recommend looking at this slide deck, which gives a good overview of how to handle unicode in Python.
