
Running this code:

from bs4 import BeautifulSoup
soup = BeautifulSoup (open("my.html"))
print(soup.prettify())

produces this error:

Traceback (most recent call last):
  File "soup.py", line 5, in <module>
    print(soup.prettify())
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25ba' in position
9001: character maps to <undefined>

I then tried:

print(soup.encode('UTF-8').prettify())

But this fails because soup.encode() returns a bytes object, which has no prettify method:

Traceback (most recent call last):
  File "soup.py", line 11, in <module>
    print(soup.encode('UTF-8').prettify())
AttributeError: 'bytes' object has no attribute 'prettify'

I don't know how to solve this. Any input would be appreciated.


1 Answer


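Your (Windows) console is using the cp437 encoding, and that encoding cannot represent one of the characters in the soup ('\u25ba' in the traceback). By default an exception is raised in this case, but you can change that behavior: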

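import io
import sys
from bs4 import BeautifulSoup

# Re-wrap stdout so that characters cp437 cannot encode are printed
# as \uXXXX escapes instead of raising UnicodeEncodeError.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, 'cp437', 'backslashreplace')
soup = BeautifulSoup(open("my.html"))
print(soup.prettify())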
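
To see what the error handler changes without involving the console or BeautifulSoup, the following minimal sketch encodes a short string containing the same character from the traceback. The sample text and variable names are made up for illustration.

```python
# Sample text containing U+25BA, the character named in the traceback.
text = "play \u25ba"

# The default ("strict") handler raises, which is what print() hit above.
try:
    text.encode("cp437")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "backslashreplace" substitutes an escape sequence for the character.
escaped = text.encode("cp437", errors="backslashreplace")

# "replace" substitutes a question mark, losing the original character.
replaced = text.encode("cp437", errors="replace")

print(strict_failed, escaped, replaced)
```

With 'backslashreplace' the output stays readable and still shows which code point was involved, which is why it is usually a better choice for debugging than 'replace'.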

Alternatively, write the soup to a file and read it with an editor that supports the encoding:

# On Windows, utf-8-sig will allow the file to be read by Notepad.
with open('out.txt', 'w', encoding='utf-8-sig') as f:
    f.write(soup.prettify())
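
On newer interpreters there may be a shorter route. The sketch below assumes Python 3.7 or later (the traceback in the question comes from Python 3.3, where this is not available) and uses the standard `reconfigure()` method of text streams to change only the error handler, leaving the console encoding as it is. The helper name `make_stdout_lenient` is hypothetical.

```python
import sys


def make_stdout_lenient():
    """Ask stdout to escape unencodable characters instead of raising.

    Returns False if stdout has been replaced by an object that has no
    reconfigure() method (for example, a capture buffer).
    """
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:
        return False
    # Keep the current encoding; only change how encoding errors are handled.
    reconfigure(errors="backslashreplace")
    return True


if make_stdout_lenient():
    # On a cp437 console this prints an escape for U+25BA rather than failing.
    print("play \u25ba")
```

Compared with re-wrapping `sys.stdout.buffer`, this avoids hard-coding 'cp437', so the same script behaves sensibly on consoles that use a different code page.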
answered 2013-02-15T06:26:30.457