python - Connection error when scraping images from Craigslist

Asked 2018-10-30T19:35:16.347

Viewed 92 times

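As part of a project that scrapes data from Craigslist, I am also scraping images. During testing I noticed that the connection is sometimes refused. Is there a way around this, or do I need to add error handling to my code? I recall that the Twitter API rate-limits queries, so I included a sleep timer there. I am curious whether the same applies to Craigslist. See the code and error below.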

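import requests
from bs4 import BeautifulSoup


# soup_test, pathname, motoid, and imgcount are defined earlier in the script.
# Loop through each thumbnail link and store the image in a local folder.
for img in soup_test.select('a.thumb'):
    imgcount += 1
    filename = pathname + "/" + motoid + " - " + str(imgcount) + ".jpg"
    # Fetch first so a failed request does not leave an empty file behind.
    response = requests.get(img['href'])
    with open(filename, 'wb') as f:
        f.write(response.content)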

ConnectionError: HTTPSConnectionPool(host='images.craigslist.org', port=443): Max retries exceeded with url: /00707_fbsCmug4hfR_600x450.jpg (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',))

I have 2 questions about this behavior.

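Do the CL servers have any rules or protocol, such as blocking the nth request within a certain time frame?
Is there a way to pause the loop after a connection is refused? Or should I just add error handling so that it doesn't stop my program?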
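
On the second question, pausing and error handling can be combined. The following is a minimal sketch, not a confirmed fix: it assumes the refusals are temporary and says nothing about what limits Craigslist actually enforces. The helper name `retry_with_backoff` and its parameters are hypothetical. It relies on the fact that `requests.exceptions.ConnectionError` is a subclass of `OSError`, so catching `OSError` covers the error shown above.

```python
import time


def retry_with_backoff(fetch, retries=4, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call fetch() until it succeeds, pausing longer after each failure.

    fetch    -- zero-argument callable that performs the request
    retries  -- total number of attempts before giving up
    retry_on -- exception types treated as temporary failures
    sleep    -- injectable so the pause can be replaced in tests
    """
    for attempt in range(retries):
        try:
            return fetch()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller decide what to do
            # Pause 1s, 2s, 4s, ... (with base_delay=1.0) before retrying.
            sleep(base_delay * 2 ** attempt)


# Possible use inside the loop from the question (requests already imported):
#     response = retry_with_backoff(
#         lambda: requests.get(img['href'], timeout=10))
#     with open(filename, 'wb') as f:
#         f.write(response.content)
```

If the final attempt still fails, the exception propagates, so the loop can wrap the call in its own `try`/`except` to skip that image and continue rather than stop the program.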


0 answers
