python - Can't find the right compression for this web page (python requests.get)

Question

I can load this web page in Google Chrome, but not with requests. Any idea what the compression problem is?

Code:

import requests


url = r'https://www.huffpost.com/entry/sean-hannity-gutless-tucker-carlson_n_60d5806ae4b0b6b5a164633a'
headers = {'Accept-Encoding':'gzip, deflate, compress, br, identity'}

r = requests.get(url, headers=headers)

Result:

ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))
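The "incorrect header check" part comes from zlib. As far as I know, urllib3 (which requests uses underneath) decodes a gzip Content-Encoding with zlib in gzip-only mode, so the error means the body did not start with the gzip magic bytes even though the server labeled it gzip. A plausible explanation, assuming the server sent something else such as a plain error page, can be reproduced offline with only the standard library. The helper name try_gunzip is hypothetical:

```python
import gzip
import zlib


def try_gunzip(body: bytes) -> str:
    """Decode `body` as a gzip Content-Encoding would be decoded and report the result."""
    # 16 + MAX_WBITS makes zlib accept only a gzip wrapper (magic bytes 1f 8b).
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        return decoder.decompress(body).decode("utf-8")
    except zlib.error as exc:
        return f"zlib.error: {exc}"


# Genuinely gzipped bytes decode without trouble.
print(try_gunzip(gzip.compress(b"<html>ok</html>")))
# Plain bytes that are merely labeled gzip trigger the same zlib message as in the question.
print(try_gunzip(b"<html>Forbidden</html>"))
```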

score 1 · Accepted Answer

Use a User-Agent that mimics a browser:

import requests

url = r'https://www.huffpost.com/entry/sean-hannity-gutless-tucker-carlson_n_60d5806ae4b0b6b5a164633a'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}

r = requests.get(url, headers=headers)
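A slightly more defensive variant of the same idea is sketched below. It keeps the browser User-Agent from this answer, drops the hand-written Accept-Encoding header from the question (which also advertises compress) and lets requests send its default one, which lists only encodings it can decode. It also checks the HTTP status before the body is decoded. The names BROWSER_HEADERS and fetch_html are hypothetical, and whether a given User-Agent gets past the site's bot filtering is entirely up to the site:

```python
import requests

# Browser-like User-Agent copied from the answer above.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
    ),
}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch `url` with browser-like headers; raise HTTPError on 4xx/5xx."""
    with requests.Session() as session:
        session.headers.update(BROWSER_HEADERS)
        # stream=True returns once the headers arrive, so the status is checked
        # before requests reads and decodes (e.g. gunzips) the body.
        with session.get(url, timeout=timeout, stream=True) as r:
            r.raise_for_status()
            return r.text
```

Usage: html = fetch_html(url) with the URL from the question. A blocked request then surfaces as requests.exceptions.HTTPError carrying the status code rather than as a decoding error.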

score 0 · Accepted Answer

You are getting a 403 Forbidden error, which you can see with requests.head. Use RJ's suggestion (the browser User-Agent) to get past HuffPost's bot blocking.

>>> requests.head(url)
<Response [403]>
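To see what the server actually sent instead of only the status, you can read the first raw body bytes without letting requests undo the Content-Encoding, then check whether they really start with the gzip magic number. This is a diagnostic sketch; inspect_response and looks_gzipped are hypothetical helper names, and r.raw is the underlying urllib3 response object:

```python
from typing import Optional

import requests

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream begins with these two bytes


def looks_gzipped(data: bytes) -> bool:
    """Return True if `data` starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def inspect_response(url: str, headers: Optional[dict] = None, nbytes: int = 64) -> None:
    """Print status, declared Content-Encoding and the first body bytes, undecoded."""
    with requests.get(url, headers=headers, stream=True, timeout=10) as r:
        # decode_content=False returns the bytes with the Content-Encoding still applied.
        raw = r.raw.read(nbytes, decode_content=False)
        print("status:", r.status_code)
        print("Content-Encoding:", r.headers.get("Content-Encoding"))
        print("first bytes:", raw)
        print("really gzip?", looks_gzipped(raw))
```

Running inspect_response(url) once without headers and once with a browser User-Agent lets you compare what the server returns in each case.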
