python - Python POST request encoding

Question

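Here's the situation: I'm sending a POST request and trying to read the response with Python. The problem is that non-Latin letters come out garbled. This doesn't happen when I fetch the same page through a direct link (without search results), but a POST request doesn't produce a link.

This is what I do: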

import urllib
import urllib2
url = 'http://donelaitis.vdu.lt/main_helper.php?id=4&nr=1_2_11'
data = 'q=bus&ieskoti=true&lang1=en&lang2=en+-%3E+lt+%28+71813+lygiagre%C4%8Di%C5%B3+sakini%C5%B3+%29&lentele=vertikalus&reg=false&rodyti=dalis&rusiuoti=freq' 
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
file = open("pagesource.txt", "w")
file.write(the_page)
file.close()

Whenever I try

thepage = the_page.encode('utf-8')

I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1008: ordinal not in range(128)

Whenever I try to change the response header to Content-Type:text/html;charset=utf-8 like this:

response['Content-Type'] = 'text/html;charset=utf-8'

I get this error:

AttributeError: addinfourl instance has no attribute '__setitem__'

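My question: is it possible to edit or remove response or request headers? If not, is there another way to solve this besides copying the source into Notepad++ and fixing the encoding by hand?

I'm new to Python and data mining, so if I'm doing something wrong, I'd really appreciate it if you told me.

Thanks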

score 2 · Accepted Answer

Why not try thepage = the_page.decode('utf-8') instead of encode? What you want is to go from UTF-8 encoded text to a unicode (encoding-agnostic) internal string.
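
To illustrate the direction of each call, here is a minimal sketch. It assumes the response body is UTF-8 and uses a hard-coded byte string as a stand-in for response.read(); the temporary file path is only for illustration. In Python 2, calling encode on a byte string first decodes it implicitly as ASCII, which is where the UnicodeDecodeError in the question comes from.

```python
import io
import os
import tempfile

# Stand-in for response.read(): the UTF-8 bytes of the Lithuanian word
# "sakiniu" with an ogonek on the last letter (U+0173 is the bytes C5 B3).
raw = b'sakini\xc5\xb3'

# decode: bytes -> text. This is the step the question needs.
text = raw.decode('utf-8')

# encode: text -> bytes. Only needed when turning text back into bytes.
round_trip = text.encode('utf-8')

# Write the text with an explicit encoding so the file is UTF-8
# regardless of the platform default.
path = os.path.join(tempfile.mkdtemp(), 'pagesource.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)
```

The decoded string has 7 characters while the raw bytes have 8, because the last letter takes two bytes in UTF-8.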

score 1 · Accepted Answer

Two things. First, you don't want to encode the response, you want to decode it:

thepage = the_page.decode('utf-8')

Second, you don't want to set the header on the response but on the request, using the add_header method:

req.add_header('Content-Type', 'text/html;charset=utf-8')
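
A request's Content-Type describes the body being sent, so the server may still answer in whatever encoding it chooses. Rather than assuming UTF-8, the charset can be taken from the response's own Content-Type header (in Python 2, response.info().getparam('charset'); in Python 3, response.headers.get_content_charset()). The sketch below shows the same idea without a network call; decode_body is a hypothetical helper name, and the UTF-8 fallback is an assumption.

```python
from email.message import Message


def decode_body(raw, content_type, default='utf-8'):
    """Decode raw response bytes using the charset from a Content-Type value.

    decode_body is a hypothetical helper; `default` is an assumed fallback
    for responses that do not declare a charset.
    """
    msg = Message()
    if content_type:
        msg['Content-Type'] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    charset = msg.get_content_charset() or default
    return raw.decode(charset)


# Usage with stand-in values for response.read() and the header value.
page = decode_body(b'sakini\xc5\xb3', 'text/html; charset=utf-8')
```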
