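I collected links to wanted persons from the Interpol website. There are about 10k links. Scraping them one by one would take hours, so I'm looking to do it with grequests.
Here is a preview of my list of links: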
final_links[:20]
['https://www.interpol.int/notice/search/wanted/2009-19572',
'https://www.interpol.int/notice/search/wanted/2015-74196',
'https://www.interpol.int/notice/search/wanted/2014-37667',
'https://www.interpol.int/notice/search/wanted/2011-30019',
'https://www.interpol.int/notice/search/wanted/2009-34171',
'https://www.interpol.int/notice/search/wanted/2012-334072',
'https://www.interpol.int/notice/search/wanted/2012-334068',
'https://www.interpol.int/notice/search/wanted/2012-334070',
'https://www.interpol.int/notice/search/wanted/2013-26064',
'https://www.interpol.int/notice/search/wanted/2013-2528',
'https://www.interpol.int/notice/search/wanted/2014-32597',
'https://www.interpol.int/notice/search/wanted/2013-23413',
'https://www.interpol.int/notice/search/wanted/2010-42146',
'https://www.interpol.int/notice/search/wanted/2015-30555',
'https://www.interpol.int/notice/search/wanted/2013-2514',
'https://www.interpol.int/notice/search/wanted/2010-53288',
'https://www.interpol.int/notice/search/wanted/2015-58805',
'https://www.interpol.int/notice/search/wanted/2015-58807',
'https://www.interpol.int/notice/search/wanted/2015-58803',
'https://www.interpol.int/notice/search/wanted/2015-62307']
Now I'm trying to get a response from each link:
unsent_request = (grequests.get(url) for url in final_links)
results = grequests.map(unsent_request)
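The first few results are 200 responses, but then most (not all) of them are 403. Does the Interpol server simply not allow this, or am I doing something wrong (am I being too greedy? :))? When I fetch the links one at a time with requests, it works fine.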
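
To make the "too greedy" guess testable, here is a minimal sketch of a throttled fetch. It assumes, without confirmation, that the 403s come from too many simultaneous requests. The helper name fetch_throttled and its parameters (max_workers, batch_size, pause) are made up for illustration, and it uses the standard library's thread pool rather than grequests, so any callable that takes a URL can be plugged in.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_throttled(urls, fetch, max_workers=5, batch_size=20, pause=1.0):
    """Fetch urls in small batches with a cap on concurrent requests.

    fetch is any callable that takes a URL and returns a response
    (hypothetical helper; the default values are guesses, not tuned).
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(urls), batch_size):
            batch = urls[start:start + batch_size]
            # pool.map returns results in the same order as the input batch
            results.extend(pool.map(fetch, batch))
            # Sleep between batches, but not after the last one
            if start + batch_size < len(urls):
                time.sleep(pause)
    return results
```

It could be called as fetch_throttled(final_links, requests.get), after which counting the status codes would show whether the 403s disappear at lower concurrency. Staying with grequests, the map function also accepts a size argument that caps the number of concurrent requests, e.g. grequests.map(unsent_request, size=5).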