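I'm just trying to retrieve a web page, but somehow a foreign (non-ASCII) character is embedded in the HTML. The character is not visible when I use "View Source".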
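import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3; with bs4: from bs4 import BeautifulSoup

isbn = 9780141187983
url = "http://search.barnesandnoble.com/booksearch/isbninquiry.asp?ean=%s" % isbn
opener = urllib2.build_opener()
url_opener = opener.open(url)
page = url_opener.read()  # raw bytes (a str in Python 2)
html = BeautifulSoup(page)
html  # Evaluating this at the interactive prompt raises the error below.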
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 21555: ordinal not in range(128)
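The character in the traceback, u'\xa0', is U+00A0 NO-BREAK SPACE, which is what an HTML `&nbsp;` becomes after parsing. A plausible reason it is invisible in View Source is that it renders like an ordinary space. Its code point (160) is above 127, so the 'ascii' codec cannot represent it. A minimal standard-library check; the helper `can_encode` is hypothetical, written only for illustration:

```python
# U+00A0 NO-BREAK SPACE: code point 160, outside ASCII's 0-127 range.
nbsp = u"\xa0"

def can_encode(text, codec):
    """Hypothetical helper: True if `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

print(ord(nbsp))                  # 160
print(can_encode(nbsp, "ascii"))  # False: same failure as the traceback
print(can_encode(nbsp, "utf-8"))  # True: UTF-8 can represent any code point
```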
I also tried:
html = BeautifulSoup(page.encode('utf-8'))
How can I read this web page into BeautifulSoup without getting this error?
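One common way to avoid an implicit 'ascii' conversion is to work in the opposite direction from `page.encode('utf-8')`: decode the raw bytes to unicode with an explicit charset, and encode explicitly only when producing output. (In Python 2, calling `.encode` on a byte string first decodes it implicitly with the ascii codec, so that attempt cannot help.) The sketch below assumes the page is UTF-8; in practice the charset would come from the Content-Type header or a `<meta charset>` tag. `SAMPLE_PAGE`, `decode_page`, and `encode_for_output` are hypothetical stand-ins, not the real Barnes & Noble response or any library API:

```python
# Hypothetical stand-in for the fetched bytes: UTF-8 encodes U+00A0 as C2 A0.
SAMPLE_PAGE = b"<p>Price:\xc2\xa0$10</p>"

def decode_page(raw, charset="utf-8"):
    """Turn raw response bytes into unicode text using an explicit charset."""
    return raw.decode(charset)

def encode_for_output(text, encoding="utf-8", errors="strict"):
    """Encode unicode text explicitly instead of relying on the ascii default."""
    return text.encode(encoding, errors)

text = decode_page(SAMPLE_PAGE)   # unicode text, safe to hand to a parser
assert u"\xa0" in text

out = encode_for_output(text)     # UTF-8 can represent U+00A0, so no error
ascii_out = encode_for_output(text, "ascii", "replace")  # lossy: U+00A0 -> '?'
```

The `"replace"` error handler is only a fallback for output channels that truly cannot accept UTF-8; it silently loses the original character.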