python - BeautifulSoup utf-8 decoding error

Translated from: https://stackoverflow.com/questions/17866675 2013-07-25T19:06:56.517

Viewed 1702 times

I am trying to use BeautifulSoup 4 to extract movie information from a website. The relevant part of the code looks like this:

from bs4 import BeautifulSoup as Soup
import requests

url = r'http://www.the-numbers.com/movies/1997/ASGOD.php'  # the relevant URL is passed in here
r = requests.get(url)
soup = Soup(r.content, from_encoding=r.encoding)

Although this works fine for a whole range of pages on the site, on this particular page it fails with the error message:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte

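At least, that is the usual error. Occasionally (and seemingly at random) it gives me a slightly different one, complaining about a different byte at a different position (for example, 0xea at position 229).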
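For reference, the same kind of exception can be reproduced without any network access. The sketch below uses a made-up byte string (an assumption, not the actual page content) whose first byte is 0x80. In UTF-8 that value is a continuation byte and can never begin a character, which is what "invalid start byte" refers to.

```python
# Hypothetical payload for illustration only; these are not the real page bytes.
raw = b"\x80abc"

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the offending byte, exc.reason the codec's complaint.
    error = exc

print(error)
```

The exact codec name in the message ('utf8' versus 'utf-8') differs between Python versions, but the position and reason are reported in the same way.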

The problem page is here. A very similar-looking example that does work is here.

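I assume there is some kind of encoding error on that page that throws BeautifulSoup for a loop, so I guess my question is: is there some way to fix or work around the error?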

Many thanks, Alex
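One possible workaround, sketched here under the assumption that the response really is text in some single-byte encoding rather than UTF-8, is to decode the bytes manually with a list of fallback encodings before handing them to the parser. The helper name `decode_with_fallback` and the candidate list are hypothetical and chosen for illustration; they are not part of requests or BeautifulSoup.

```python
from typing import Iterable, Optional, Tuple


def decode_with_fallback(
    raw: bytes,
    declared: Optional[str] = None,
    candidates: Iterable[str] = ("utf-8", "cp1252"),
) -> Tuple[str, str]:
    """Return (text, encoding_used), trying the declared encoding first."""
    to_try = ([declared] if declared else []) + list(candidates)
    for enc in to_try:
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Wrong encoding for these bytes, or an unknown encoding name.
            continue
    # latin-1 maps every byte value to a code point, so this cannot raise.
    return raw.decode("latin-1"), "latin-1"


# Hypothetical bytes standing in for r.content; "utf-8" stands in for r.encoding.
text, used = decode_with_fallback(b"\x80abc", declared="utf-8")
print(used, text)
```

The resulting string could then be passed to BeautifulSoup without `from_encoding`, since it is already decoded. Whether this helps for this particular page depends on what the bytes actually are: if the body is compressed or otherwise not text, no choice of text encoding will produce sensible output.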
