python - Python: wrong output encoding with BeautifulSoup

Question

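When I pass a response to BeautifulSoup, I run into an encoding problem. The readable output of the response is formatted correctly, e.g. Artikelstandort: Österreich, but after running it through BeautifulSoup it turns into Artikelstandort: Ã–sterreich. Here is my modified code: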

def formTest (browser, formUrl, cardName, edition):
   browser.open (formUrl)

   data = browser.response().read()
   with open ('analyze.txt', 'wb') as textFile:
      print 'writing file'
      textFile.write (data)

   #BS4 -> need from_encoding
   soup = BeautifulSoup (data, from_encoding = 'latin-1')
   soup = soup.encode ('latin-1').decode('utf-8')
   table = soup.find('table', { "class" : "MKMTable specimenTable"})

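data holds the correct content, but soup has the wrong encoding. I tried various encode/decode combinations on soup, but none of them produced a working result.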

The page I am extracting data from is: https://www.magickartenmarkt.de/Mutilate_Magic_2013.c1p256992.prod

Edit: I changed the encoding using prettify as suggested, but now I get the following error:

TypeError: slice indices must be integers or None or have an __index__ method

What does prettify change? I printed the new output, and the table is still in the soup (<table class="MKMTable specimenTable">).

Edit 2:

The new error is:

At: soup.encode ('latin-1').decode('utf-8')

Error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 518: invalid start byte

If I play around with the encode and decode calls, I get decode errors on other bytes instead.

score 1 · Accepted Answer

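You probably no longer need a solution, but in case someone else ends up here, this is what you should do:
You should probably handle the encoding on data rather than on soup.
What I usually do is fetch the raw response with the requests library, force the encoding with response.encoding = 'utf-8', and then read the text content via response.text.
At the very least, I pass response.text to BeautifulSoup().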
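
To illustrate why the decoding belongs on the raw bytes, here is a minimal, standard-library-only sketch. It assumes the page is actually served as UTF-8; the sample markup and the helper name to_text are made up for illustration, and cp1252 is chosen only because it reproduces the exact characters shown in the question. The last part shows why an encode('latin-1').decode('utf-8') round trip can fail with byte 0xfc.

```python
# Sample bytes standing in for browser.response().read() on a UTF-8 page
# (illustrative data, not the real page content).
raw = "<p>Artikelstandort: Österreich</p>".encode("utf-8")


def to_text(data, encoding="utf-8"):
    """Decode raw response bytes into text before any HTML parsing.

    Hypothetical helper: the point is that the codec is applied to the
    bytes, not to the parsed soup.
    """
    return data.decode(encoding)


# Decoding UTF-8 bytes with a single-byte Windows codec yields mojibake.
wrong = to_text(raw, "cp1252")

# Decoding with the codec the bytes were really written in yields clean text.
right = to_text(raw, "utf-8")

# Text that has been encoded as latin-1 is usually not valid UTF-8:
# "ü" becomes the single byte 0xfc, which cannot start a UTF-8 sequence.
try:
    "für".encode("latin-1").decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(wrong)
print(right)
print(error)
```

The correctly decoded text can then be handed to BeautifulSoup, for example BeautifulSoup(right, 'html.parser'). With requests, setting response.encoding = 'utf-8' before reading response.text should have the same effect, and passing the raw bytes with from_encoding='utf-8' instead of 'latin-1' is another option if the page really is UTF-8.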
