python - Unable to decode a weird XML file

Question

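I have to parse some XML output (returned by requests to a website), like the sample below. It is partly in English and partly in French. I cannot decode and print (to the screen or to a file) French accented characters such as "é" or "à".

When I use decode('utf-8'), I get a wrong result such as 'Ã¨'. I am using Python 3.3.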

b'Extr\xc3\x83\xc2\xaamement fort et incroyablement pr\xc3\x83\xc2\xa8s</title><originaltitle>Extremely Loud And Incredibly Close</originaltitle><year>2011</year><runtime>0</runtime><directors><director>Stephen Daldry</director></directors><plot>Oskar Schell, 11 ans, est un jeune New-Yorkais \xc3\x83\xc2\xa0 l\'imagination d\xc3\x83\xc2\xa9bordante. Un an apr\xc3\x83\xc2\xa8s la...</plot></movie></results>\n'

score 5 · Accepted Answer

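The byte string you pasted is double-encoded: it looks as if the original UTF-8 bytes were misread as ISO-8859-1 and then encoded to UTF-8 a second time. Reversing those steps,

byteStrInYourQuestion.decode('utf-8').encode("ISO-8859-1").decode("utf-8")

should work.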
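As a minimal sketch of that round trip, here is the same chain applied to a fragment of the bytes from the question. The helper name fix_double_utf8 and the variable names are illustrative only, not part of any library, and the sketch assumes the extra encoding step really was ISO-8859-1 (data mangled through a different codec, such as Windows-1252, may need that codec instead).

```python
def fix_double_utf8(raw: bytes) -> str:
    """Undo one extra layer of UTF-8 encoding (hypothetical helper).

    Assumes the original UTF-8 bytes were misread as ISO-8859-1
    and then encoded to UTF-8 again.
    """
    # 1. Decode the outer UTF-8 layer; this yields mojibake such as 'Ã¨'.
    mojibake = raw.decode("utf-8")
    # 2. Map each character back to the single byte it came from.
    inner_bytes = mojibake.encode("iso-8859-1")
    # 3. Decode the inner (original) UTF-8 layer.
    return inner_bytes.decode("utf-8")


# Fragment of the byte string from the question.
sample = b"Extr\xc3\x83\xc2\xaamement fort et incroyablement pr\xc3\x83\xc2\xa8s"

fixed = fix_double_utf8(sample)
print(fixed)
```

If the text contains a character that ISO-8859-1 cannot represent, step 2 raises UnicodeEncodeError, which is a useful signal that the data was not double-encoded in this particular way.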
