python-2.7 - UnicodeDecodeError using TextBlob and Python 2.7

Question

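**Issue resolved in this post** I am on a Windows 10 PC, trying to scrape and analyze a website forum. My solution uses Scrapy and TextBlob, and I am running Python 2.7. The scrape produces the desired output (which I save as either .csv or .json). However, when I use this file in a Python script that integrates TextBlob, I get the following error: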

Traceback (most recent call last):
  File "C:\Users\Marcus\Documents\Blog\Python\Scripts\Brooks\textblob_sentiment.py", line 14, in <module>
    print blob
  File "C:\Python27\lib\site-packages\textblob\compat.py", line 30, in <lambda>
    cls.__str__ = lambda x: x.__unicode__().encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 425: ordinal not in range(128)

The script that produces this error is:

# from __future__ import division, unicode_literals (This was recommended for Python 2.x, but didn't help in my case.)

import csv

from textblob import TextBlob


infile = 'items.csv'

with open(infile, 'r') as scrape_file:
    comments = csv.reader(scrape_file)
    for comment in comments:
        sentence = comment[0]
        blob = TextBlob(sentence)
        print blob

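The code is structured like another thread I found on SO, and I have also tried integrating encode/decode methods into this script, based on other SO threads. But perhaps I didn't do that correctly (I'm not a developer). I also tried opening the json file instead, thinking the problem might lie in how the .csv was encoded. I can print the desired content (e.g., "print sentence" or "print comment"); I only get the error message when I try to use TextBlob.
Do you have a solution for this error? And since I'd like to use these libraries again, how can I avoid similar headaches?

Thanks very much for your help with this...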

score 0 · Accepted Answer

Try this:

unicodedata.normalize('NFKD', sentence).encode('ascii','ignore').lower()

Make sure to import unicodedata:

import unicodedata
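
To make the one-liner above concrete, here is a minimal sketch of how it might be applied to a value read from the CSV. It rests on two assumptions not stated in the thread: that items.csv is UTF-8 encoded (the byte 0xf0 in the traceback is consistent with the first byte of a 4-byte UTF-8 sequence such as an emoji, but that is not confirmed), and that the value is a byte string, as csv.reader returns in Python 2.7. unicodedata.normalize() expects unicode text, so the bytes are decoded first. The helper name to_ascii_text is hypothetical.

```python
# -*- coding: utf-8 -*-
import unicodedata


def to_ascii_text(raw_bytes, encoding='utf-8'):
    """Decode a byte string, then strip everything that is not ASCII.

    Hypothetical helper for illustration; the UTF-8 default is an
    assumption about how the scraped file was written.
    """
    # unicodedata.normalize() needs unicode text, so decode the bytes first.
    text = raw_bytes.decode(encoding)
    # NFKD splits accented letters into a base letter plus a combining mark;
    # encoding to ASCII with 'ignore' then drops the marks and any emoji.
    ascii_bytes = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')
    # Return text again (unicode in Python 2, str in Python 3), lowercased.
    return ascii_bytes.decode('ascii').lower()


# A made-up comment containing an accented letter and an emoji (0xf0 ...).
sample = b'Caf\xc3\xa9 run felt GREAT \xf0\x9f\x98\x80'
cleaned = to_ascii_text(sample)
```

In the loop from the question, this would be used as TextBlob(to_ascii_text(comment[0])). Note that the approach discards emoji and other non-ASCII characters rather than preserving them, which may or may not matter for the sentiment analysis.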
