python-3.x - Error when fetching a website with Scrapy

Question

I'm trying to fetch a website with Scrapy from the shell:

$ scrapy shell -s NAME="Mozilla/5.0" "http://www.yapo.cl/chile/inmuebles?ca=15_s&l=0&cmn=&st=a"

2017-08-21 20:55:07 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://www.yapo.cl/chile/inmuebles?ca=15_s&l=0&cmn=&st=a> (failed 3 times): 504 Gateway Time-out

But I get a 504 error that I can't figure out. Any idea what it could be?

score 1 · Accepted Answer

You are probably trying to set the user-agent string on the command line, but you used the wrong setting (NAME). Try:

$ scrapy shell -s USER_AGENT="Mozilla/5.0" "http://www.yapo.cl/chile/inmuebles?ca=15_s&l=0&cmn=&st=a"

With that, I get:

2017-08-22 07:40:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.yapo.cl/chile/inmuebles?ca=15_s&l=0&cmn=&st=a> (referer: None)
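To check outside Scrapy whether the site responds differently depending on the User-Agent header, here is a minimal standard-library sketch. It builds a GET request carrying the same `Mozilla/5.0` string that the `USER_AGENT` setting sends. The helper names `build_request` and `fetch_status` are hypothetical and not part of Scrapy.

```python
import urllib.error
import urllib.request

# Same placeholder UA string that the answer passes via -s USER_AGENT=...
BROWSER_UA = "Mozilla/5.0"
URL = "http://www.yapo.cl/chile/inmuebles?ca=15_s&l=0&cmn=&st=a"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a GET request with an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code.

    For 4xx/5xx responses urlopen raises urllib.error.HTTPError, whose .code
    attribute holds the status (e.g. 504).
    """
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.status
```

Calling `fetch_status(URL, BROWSER_UA)` makes a real network request. Comparing it with a call that uses a different UA string may help show whether the header matters for this site. Inside a spider, the same Scrapy setting can also be applied per spider through `custom_settings = {"USER_AGENT": "Mozilla/5.0"}`.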

score 0 · Accepted Answer

You have been banned, or something similar. Try using another IP address. On my machine it gives this:

2017-08-22 00:07:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.yapo.cl/chile/inmuebles?ca=15_s&l=0&cmn=&st=a> (referer: None) ['partial']
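One way to try "another IP address" without changing machines is to route requests through a proxy. The following is a minimal standard-library sketch, assuming you have access to an HTTP proxy. The address `203.0.113.10:8080` is hypothetical (203.0.113.0/24 is a documentation-only range), and `build_proxied_opener` is an illustrative helper name.

```python
import urllib.request

# Hypothetical proxy address from the documentation-only range; replace with a real proxy
PROXY = "http://203.0.113.10:8080"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both http and https traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # build_opener skips its default ProxyHandler because we pass our own instance
    return urllib.request.build_opener(handler)
```

Usage would be `build_proxied_opener(PROXY).open(url, timeout=30)`. In Scrapy itself, the usual equivalent is to set `request.meta["proxy"]`, which the built-in HttpProxyMiddleware honors.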
