In my settings.py:
SPLASH_URL = 'http://127.0.0.1:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
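The settings above hard-code SPLASH_URL to 127.0.0.1. A minimal sketch of a settings.py that reads the Splash endpoint from an environment variable instead, assuming the Docker daemon runs inside the Docker Toolbox VirtualBox VM (whose address `docker-machine ip default` prints). The fallback 192.168.99.100 is only a commonly seen default for that VM, not a guaranteed value; the environment variable name SPLASH_URL is a choice made for this sketch. HTTPCACHE_STORAGE is the extra setting the scrapy-splash README recommends when Scrapy's HTTP cache is enabled.

```python
# settings.py (sketch): Splash endpoint configurable without editing code
import os

# Assumption: under Docker Toolbox, published container ports live on the
# VirtualBox VM, not on Windows' own 127.0.0.1. The fallback IP is a common
# default for that VM and may differ on your machine.
SPLASH_URL = os.environ.get("SPLASH_URL", "http://192.168.99.100:8050")

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
# Only matters if HTTPCACHE_ENABLED is True; keeps Splash requests distinct in the cache.
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

From the Docker Toolbox (MINGW64) bash shell this could be run as `SPLASH_URL=http://$(docker-machine ip default):8050 scrapy crawl sample`.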
My spider source code:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import CrawlSpider
from scrapy_splash import SplashRequest


class SampleSpider(CrawlSpider):
    name = 'sample'
    allowed_domains = ['sample.com']

    def start_requests(self):
        urls = [
            'https://www.sample.com/view-all-clothing/bottoms/leggings'
        ]
        for url in urls:
            # Route the page through Splash (SplashRequest defaults to the render.html endpoint)
            yield SplashRequest(url=url, callback=self.parse)

    def parse(self, response):
        # One item per product tile on the rendered page
        for item in response.css("li.product-compact"):
            yield {
                'category_link': response.request.url,
                'title': item.css("a.pdp-link::text").extract()
            }
Docker container:
MINGW64 /c/Program Files/Docker Toolbox
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
75b69d937e79 scrapinghub/splash "python3 /app/bin/sp…" 16 minutes ago Up 16 minutes 5023/tcp, 127.0.0.1:8050->8050/tcp vigilant_chatterjee
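The PORTS column above reads `127.0.0.1:8050->8050/tcp`, i.e. port 8050 is published only on the loopback address of the Docker host. Assuming Docker Toolbox, that host is the VirtualBox VM, so Windows' own 127.0.0.1:8050 would have nothing listening on it. A small sketch that parses that column and flags loopback-only mappings; `parse_ports` and `loopback_only` are hypothetical helpers written for illustration, and port ranges such as `8050-8051/tcp` are not handled.

```python
from typing import List, NamedTuple, Optional


class PortMapping(NamedTuple):
    host_ip: Optional[str]    # None when the port is only exposed, not published
    host_port: Optional[int]
    container_port: int
    protocol: str


def parse_ports(column: str) -> List[PortMapping]:
    """Parse the PORTS column printed by `docker container ls` (single ports only)."""
    mappings = []
    for entry in column.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "->" in entry:
            # Published port: "<host_ip>:<host_port>-><container_port>/<proto>"
            host, container = entry.split("->", 1)
            host_ip, host_port = host.rsplit(":", 1)
            port, proto = container.split("/", 1)
            mappings.append(PortMapping(host_ip, int(host_port), int(port), proto))
        else:
            # Exposed but unpublished port: "<container_port>/<proto>"
            port, proto = entry.split("/", 1)
            mappings.append(PortMapping(None, None, int(port), proto))
    return mappings


def loopback_only(m: PortMapping) -> bool:
    """True if the port is reachable only via the Docker host's loopback interface."""
    return m.host_ip in ("127.0.0.1", "::1")


if __name__ == "__main__":
    for m in parse_ports("5023/tcp, 127.0.0.1:8050->8050/tcp"):
        print(m, "loopback only" if loopback_only(m) else "")
```

If that is the cause, starting the container with `docker run -p 8050:8050 scrapinghub/splash` (no 127.0.0.1 prefix) would publish the port on all of the VM's interfaces, which could then be reached at the VM's address.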
I still get this error:
2018-07-10 15:18:35 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://127.0.0.1:8050/robots.txt> (failed 1 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:36 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://127.0.0.1:8050/robots.txt> (failed 2 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:37 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://127.0.0.1:8050/robots.txt> (failed 3 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:37 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http://127.0.0.1:8050/robots.txt>: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
ConnectionRefusedError: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html> (failed 1 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:39 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html> (failed 2 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:40 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html> (failed 3 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:40 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html>: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:40 [scrapy.core.engine] INFO: Closing spider (finished)
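Every failure above is a refused TCP connection (Windows error 10061) to 127.0.0.1:8050, before any HTTP exchange with Splash happens. A minimal sketch for checking, outside Scrapy, whether anything accepts connections at a given SPLASH_URL; `splash_reachable` is a hypothetical helper built only on the standard library.

```python
import socket
from urllib.parse import urlsplit


def splash_reachable(splash_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host:port in splash_url succeeds."""
    parts = urlsplit(splash_url)
    host = parts.hostname or "127.0.0.1"
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (10061 on Windows) and timeouts are both OSError subclasses
        return False


if __name__ == "__main__":
    # Compare the configured URL with the Docker Toolbox VM address (assumed default IP)
    for url in ("http://127.0.0.1:8050", "http://192.168.99.100:8050"):
        print(url, splash_reachable(url))
```

Only a successful TCP connect is checked here; a True result does not prove the listener is Splash, it just rules out the "actively refused" case seen in the log.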
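I have applied all of these settings and, as far as I know, they are correct, but I can't figure out what I am doing wrong.
Please let me know, as I am still new to Python, Scrapy, and the Splash JavaScript rendering service.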