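I have this simple code:
import requests
r = requests.get('https://yahoo.com')
print(r.url)
When executed, it prints:
https://uk.yahoo.com/?p=us
I would like to know:
How many redirects happened before arriving at https://uk.yahoo.com/?p=us (evidently I was redirected, since I originally requested https://yahoo.com)?
I would also like to save the content of each page, not just the final one. How can I do this?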
Use response.history. From the documentation:
The Response.history list contains the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.
So, to get the number of intermediate URLs, you could do something like:
response = requests.get('https://yahoo.com')
print(len(response.history))
And to get what those URLs actually were and what their responses contain, you could do:
for resp in response.history:
    print(resp.url, resp.text)
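Note that response.history holds only the intermediate responses; the final page is the response object itself. To save every page, something along these lines might work. It is only a sketch: the helper name save_redirect_chain and the numbered-file naming scheme are made up for illustration, and it relies only on the .history, .url, .status_code and .content attributes of the response.

```python
from pathlib import Path


def save_redirect_chain(response, out_dir):
    """Write the body of every response in the redirect chain to out_dir.

    `response` is expected to be a requests.Response; only its .history,
    .url, .status_code and .content attributes are used.
    Returns a list of (url, status_code, path) tuples, oldest first.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # response.history holds the intermediate responses, oldest first;
    # the final response is not part of it, so append it explicitly.
    chain = list(response.history) + [response]

    saved = []
    for index, resp in enumerate(chain):
        path = out / f"{index:02d}.html"  # hypothetical naming scheme
        path.write_bytes(resp.content)    # raw bytes, no decoding involved
        saved.append((resp.url, resp.status_code, path))
    return saved
```

You would call it as save_redirect_chain(requests.get('https://yahoo.com'), 'pages'). Redirect responses often have an empty or very short body, so some of the saved files may contain little or nothing.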
If needed, you can also submit a new request to any of the intermediate URLs with the optional parameter allow_redirects set to False:
r = requests.get(resp.url, allow_redirects=False)
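Taking that one step further, you could walk the whole chain yourself, one hop at a time. The following is a rough sketch rather than a drop-in solution: follow_manually and its max_hops limit are names I made up, and it assumes the object passed in behaves like a requests.Session and that each redirect response carries a Location header.

```python
from urllib.parse import urljoin


def follow_manually(session, url, max_hops=10):
    """Follow redirects one hop at a time and return every response.

    `session` is expected to be a requests.Session (anything with a
    compatible .get() works); `max_hops` is an illustrative safety limit.
    """
    responses = []
    for _ in range(max_hops):
        resp = session.get(url, allow_redirects=False)
        responses.append(resp)
        if not resp.is_redirect:
            break  # reached a response that does not redirect further
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(resp.url, resp.headers["Location"])
    # If max_hops ran out, the list simply ends with a redirect response.
    return responses
```

For example, chain = follow_manually(requests.Session(), 'https://yahoo.com') gives you every response in order, with the final page as chain[-1]. In most cases response.history is simpler; doing it by hand is mainly useful when you want to inspect or alter each hop before continuing.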