python-2.7 - raw_inputing unicode strings

Question

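I have read the "Unicode on Python 2.7 how-to" more than once and searched this forum thoroughly, but nothing I have found and tried has made my program work.

It is supposed to convert dictionary.com entries into sets of example sentences and word-pronunciation pairs. However, it fails right at the start: the IPA (i.e. unicode) characters turn into gibberish immediately after input.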

# -*- coding: utf-8 -*-

""" HERE'S HOW A TYPICAL DICTIONARY.COM ENTRY LOOKS LIKE
white·wash
/ˈʰwaɪtˌwɒʃ, -ˌwɔʃ, ˈwaɪt-/ Show Spelled
noun
1.
a composition, as of lime and water or of whiting, size, and water, used for whitening walls, woodwork, etc.
2.
anything, as deceptive words or actions, used to cover up or gloss over faults, errors, or wrongdoings, or absolve a wrongdoer from blame.
3.
Sports Informal. a defeat in which the loser fails to score.
verb (used with object)
4.
to whiten with whitewash.
5.
to cover up or gloss over the faults or errors of; absolve from blame.
6.
Sports Informal. to defeat by keeping the opponent from scoring: The home team whitewashed the visitors eight to nothing.
"""

def wdefinp():   #word definition input
    wdef=u''
    emptylines=0 
    print '\nREADY\n\n'
    while True:
        cinp=raw_input()   #current input line
        if cinp=='':
            emptylines += 1
            if emptylines >= 3:   #breaking out by 3xEnter
                wdef=wdef[:-2]
                return wdef
        else:
            emptylines = 0
        wdef=wdef + '\n' + cinp
    return wdef

wdef=wdefinp()
print wdef.decode('utf-8')

This produces: whiteÂ·wash /Ë�Ę°waÉŞtËŚwÉ'Ę�, -ËŚwÉ”Ę�, Ë�waÉŞt-/ Show Spelled ...

Any help would be greatly appreciated.

score 0 · Accepted Answer

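OK, I managed to reproduce a couple of errors with your program.

First, if I run it in a terminal and paste the sample text into it, I get an error on this line (sorry, my line numbers don't match yours):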

  File "unicod.py", line 22, in wdefinp
    wdef=wdef + '\n' + cinp
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 5: ordinal not in range(128)

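To fix this, I used the answer from this Stack Overflow question: How to read Unicode input and compare Unicode strings in Python?

The fixed line is (it needs import sys at the top of the script):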

cinp = raw_input().decode(sys.stdin.encoding)

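Basically, you need to know the input encoding; once you do, the raw bytes can be decoded into a unicode object.

Once that is fixed, the next problem is a similar one: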
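
As a rough illustration of that decoding step, here is a minimal sketch. The helper name decode_line is made up for this example, and falling back to UTF-8 when sys.stdin.encoding is None (which happens when input is piped rather than typed) is my assumption, not something the original program does. The sample bytes are the UTF-8 encoding of "white·wash", which is where the 0xc2 byte at position 5 in the traceback comes from.

```python
# -*- coding: utf-8 -*-
import sys

def decode_line(raw, encoding=None):
    """Turn the byte string returned by raw_input() into a unicode object.

    Hypothetical helper: the UTF-8 fallback is an assumption for the case
    where sys.stdin.encoding is None (e.g. piped input).
    """
    enc = encoding or getattr(sys.stdin, 'encoding', None) or 'utf-8'
    return raw.decode(enc)

# The bytes a UTF-8 terminal delivers for "white·wash"
raw = b'white\xc2\xb7wash'
text = decode_line(raw, 'utf-8')
```

In the question's loop this would be used as cinp = decode_line(raw_input()), so that everything appended to wdef is already unicode.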

  File "unicod.py", line 28, in <module>
    print wdef.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 6: ordinal not in range(128)

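Because the data returned from the function is already a unicode object, decoding it a second time ("double decoding") doesn't work: Python 2 first tries to encode it with the ascii codec, which is where the UnicodeEncodeError comes from. Just remove the ".decode('utf-8')" and it works fine.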
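
The general pattern here is to decode once where bytes come in, work with unicode inside the program, and encode once (if at all) where text goes out. The following is only a sketch of that pattern: collect_lines is a hypothetical name, and the hard-coded 'utf-8' is an assumption standing in for whatever encoding your terminal actually uses.

```python
# -*- coding: utf-8 -*-

def collect_lines(raw_lines, encoding='utf-8'):
    """Decode each input byte string exactly once and join the results."""
    return u'\n'.join(line.decode(encoding) for line in raw_lines)

# Two lines as a UTF-8 terminal would deliver them: "white·wash" and "/ˈwaɪt/"
raw_lines = [b'white\xc2\xb7wash', b'/\xcb\x88wa\xc9\xaat/']

wdef = collect_lines(raw_lines)   # unicode from here on, no further .decode()
out = wdef.encode('utf-8')        # encode only at the output boundary
```

With a correctly configured terminal, print wdef on its own is enough, since print encodes a unicode object for you; the explicit encode is only needed when writing bytes to a file or pipe.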
