python - Python - Character encoding and decoding problem

Question

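I have a source file containing UTF-8 characters (names).
I have an output file with the same character encoding.
I am processing an HTML page and cutting and pasting the useful information into a file.
The "friendsNames" txt file contains characters such as "éáűúőóüöäđĐ".

I get this error: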

Traceback (most recent call last):
  File "C:\Users\Rendszergazda\workspace\achievements\hiba.py", line 9, in <module>
    s = str(urlopen("http://eu.battle.net/wow/en/character/arathor/"+str(names[0])+"/achievement").read(), encoding='utf-8')
  File "C:\Python27\lib\encodings\cp1250.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>

What do you think? What is my problem?

from urllib import urlopen
import codecs

result = codecs.open("C:\Users\Desktop\Achievements\Result.txt", "a", "utf-8")
fh = codecs.open("C:\Users\Desktop\Achievements\FriendsNames.txt", "r", "utf-8")
line = fh.readline()
names = line.split(" ")
fh.close()

s = str(urlopen("http://eu.battle.net/wow/en/character/arathor/"+str(names[0])+"/achievement").read(), encoding='utf8')
result.write(str(s))
result.close()

score 2 · Accepted Answer

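The problem you're having is that you're calling `str(array[0])`, where `array[0]` is a unicode string. This means it will be encoded with the default encoding, which for some reason in your case seems to be cp1250. (Did you mess with `sys.setdefaultencoding()`? Don't do that.)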
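The failure can be reproduced without any network access. This is a minimal sketch, assuming the first name read from the file starts with U+FEFF (a byte order mark), as the traceback suggests; the sample name and the helper `encode_like_str` are made up for illustration.

```python
# Hypothetical first token from the names file: a BOM followed by a name.
name = u"\ufeffEva"


def encode_like_str(text, default_encoding="cp1250"):
    """Mimic the implicit encode that str(text) performs in Python 2
    when the default encoding is cp1250 (assumed from the traceback)."""
    return text.encode(default_encoding)


try:
    encode_like_str(name)
    error = None
except UnicodeEncodeError as exc:
    # cp1250 has no mapping for U+FEFF, so the encode fails at position 0.
    error = exc

# Encoding explicitly to UTF-8 succeeds, because UTF-8 covers all of Unicode.
utf8_bytes = name.encode("utf-8")
```

The explicit UTF-8 encode works here, but note that the BOM is still part of the result, so it would end up in whatever is built from those bytes.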

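To get a bytestring from unicode, you should explicitly encode the unicode. Don't just call `str()` on it. Encode it with the encoding the result should have (which is a little hard to guess in the case of URLs, but in this case is probably UTF-8). So, use `array[0].encode('utf-8')`. You may also need to quote the non-ASCII characters in the URL, but that depends on what the remote end expects.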
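Applied to the question's code, where the name is `names[0]`, the advice might look like the sketch below. It assumes the server expects UTF-8 percent-encoded path segments, which the thread does not confirm. The helpers `strip_bom` and `build_url` are hypothetical names, and the import fallback is only there so the snippet also runs on Python 3.

```python
try:
    from urllib import quote  # Python 2, as used in the question
except ImportError:
    from urllib.parse import quote  # Python 3

BASE = u"http://eu.battle.net/wow/en/character/arathor/"


def strip_bom(text):
    """Drop a leading U+FEFF (byte order mark) if present."""
    if text.startswith(u"\ufeff"):
        return text[1:]
    return text


def build_url(name):
    """Return an ASCII-only URL for a unicode character name."""
    # Encode explicitly to UTF-8 bytes instead of calling str() on unicode.
    encoded = strip_bom(name).encode("utf-8")
    # Percent-encode the bytes (assumption: the remote end expects UTF-8).
    return BASE + quote(encoded) + u"/achievement"
```

For example, `build_url(u"\ufeff\u00c9va")` drops the BOM and percent-encodes the accented letter. An alternative to stripping the BOM by hand is to open the names file with the `utf-8-sig` codec, which removes a leading BOM while decoding.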
