python - Newbie error while parsing tweet JSON: UnicodeEncodeError: 'charmap' codec can't encode characters in position 13-63: character maps to <undefined>

Question

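I am trying to follow the Intro to Data Sci Coursera course, but I ran into a problem while parsing the JSON response from Twitter.

I am trying to retrieve the text from JSON in the following format.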

{u'delete': {u'status': {u'user_id_str': u'702327198', u'user_id': 702327198, u'id': 332772178690981889L, u'id_str': u'332772178690981889'}}}, {u'delete': {u'status': {u'user_id_str': u'864736118', u'user_id': 864736118, u'id': 332770710667792384L, u'id_str': u'332770710667792384'}}}, {u'contributors': None, u'truncated': False, **u'text'**: u'RT @afgansyah_reza: Lagi ngantri. Ada ibu2 &amp; temennya. "Ih dia mukanya mirip banget sama Afgan.", trus ngedeketin gw, "Tuh kan.. Mirip bang\u2026', u'in_reply_to_status_id': None, u'id': 332772350640668672L, u'favorite_count': 0, ....... ]

Here is the code I am using:

import json

def hw():
    data = []
    with open('output.txt') as f:
        for line in f:
            # each line of output.txt holds one JSON object
            encoded_string = line.strip().encode('utf-8')
            data.append(json.loads(encoded_string))

    print data  # generates the input to next block
    for listval in data:  # individual block
        if "text" in listval:
            print listval["text"]  # the traceback below points at this line
        else:
            continue

However, when I run it, I get the following output and error:

   RT @afgansyah_reza: Lagi ngantri. Ada ibu2 &amp; temennya. "Ih dia mukanya mirip banget sama Afgan.", trus ngedeketin gw, "Tuh kan.. Mirip bang…
RT @Dimaz_CSIX: Kolor pakek pita #laguharlemshake
Traceback (most recent call last):
  File "F:\ProgrammingPoint\workspace-new\PyTest\tweet_sentiment.py", line 41, in <module>
    main()
  File "F:\ProgrammingPoint\workspace-new\PyTest\tweet_sentiment.py", line 36, in main
    hw()
  File "F:\ProgrammingPoint\workspace-new\PyTest\tweet_sentiment.py", line 23, in hw
    print listval["text"]
  File "C:\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 13-63: character maps to <undefined>

I am new to Python; any help would be greatly appreciated.

score 9 · Accepted Answer

Answered 2013-05-10T22:48:10.883

score 5 · Accepted Answer

If you are using the PyDev Eclipse plugin, try going to Window -> Preferences -> General -> Workspace and, in the lower-left corner under "Text file encoding", choose Other: UTF-8.

It might work.
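
To see whether a settings change like this has any effect, it can help to check which encoding print will actually use. The snippet below is a minimal sketch, not part of the original answer; console_encoding is a hypothetical helper name, and the 'ascii' fallback is an assumption for the case where the interpreter reports no encoding (for example when output is piped).

```python
import sys

def console_encoding(default='ascii'):
    """Return the encoding that print uses for sys.stdout.

    console_encoding is a hypothetical helper, not a library function.
    sys.stdout.encoding may be None or missing, so fall back to `default`.
    """
    return getattr(sys.stdout, 'encoding', None) or default

# On the asker's setup this would presumably report cp1252.
print(console_encoding())
```

If this reports cp1252, any character outside that code page will raise the UnicodeEncodeError shown in the question.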

score 0 · Accepted Answer

Your json.loads call converts the UTF-8 encoded JSON back into a Python Unicode string. When you print it, Python attempts to convert the text into your environment's default encoding, which the cp1252.py reference in the traceback makes clear is Windows code page 1252. You'll have to decide on an output encoding and encode to it before printing. If you want cp1252, give it an error handler other than the default of 'strict'.

http://docs.python.org/2/howto/unicode.html has the full docs, including the various error handler possibilities.
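
As a rough illustration of the error-handler approach, the sketch below encodes to cp1252 with a non-strict handler and decodes back, so the result contains only characters the console can show. The helper name to_console_safe and the sample tweet are hypothetical, not taken from the question; the code is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

def to_console_safe(text, encoding='cp1252', errors='replace'):
    """Return `text` with characters missing from `encoding` substituted.

    to_console_safe is a hypothetical helper name. Encoding with a
    non-strict error handler and decoding again yields a Unicode string
    that can be printed on a console using `encoding` without raising
    UnicodeEncodeError.
    """
    return text.encode(encoding, errors).decode(encoding)

# Hypothetical sample text; these characters do not exist in cp1252.
tweet = u'RT @user: \u3053\u3093\u306b\u3061\u306f'

# 'replace' substitutes a question mark for each unencodable character.
print(to_console_safe(tweet))

# 'backslashreplace' keeps the information as \uXXXX escape sequences.
print(to_console_safe(tweet, errors='backslashreplace'))
```

In the question's loop this would amount to printing to_console_safe(listval["text"]) instead of listval["text"]. Which handler to choose depends on whether losing the original characters is acceptable.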
