
In my settings.py:

SPLASH_URL = 'http://127.0.0.1:8050'
DOWNLOADER_MIDDLEWARES = {
  'scrapy_splash.SplashCookiesMiddleware': 723,
  'scrapy_splash.SplashMiddleware': 725,
  'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
  'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

My spider source code:

# -*- coding: utf-8 -*-
import scrapy
from scrapy_splash import SplashRequest

# CrawlSpider reserves parse() for applying its crawl rules; no rules are
# defined here, so a plain scrapy.Spider is the appropriate base class.
class SampleSpider(scrapy.Spider):
  name = 'sample'
  allowed_domains = ['sample.com']

  def start_requests(self):
    urls = [
      'https://www.sample.com/view-all-clothing/bottoms/leggings'
    ]

    for url in urls:
      # SplashRequest routes the page through the Splash server at SPLASH_URL
      yield SplashRequest(url=url, callback=self.parse)

  def parse(self, response):
    # One item per product tile on the rendered page
    for item in response.css("li.product-compact"):
      yield {
        'category_link': response.request.url,
        'title': item.css("a.pdp-link::text").extract()
      }

Docker container:

MINGW64 /c/Program Files/Docker Toolbox
$ docker container ls
CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS              PORTS                                NAMES
75b69d937e79        scrapinghub/splash   "python3 /app/bin/sp…"   16 minutes ago      Up 16 minutes       5023/tcp, 127.0.0.1:8050->8050/tcp   vigilant_chatterjee
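The PORTS column above is the key detail: `127.0.0.1:8050->8050/tcp` means the port is published only on the Docker host's loopback interface. A small sketch that flags such loopback-only mappings in a PORTS string; `loopback_only_ports` is a hypothetical helper, and it assumes the `host_ip:host_port->container_port/proto` format shown in the listing (IPv6 entries are not handled).

```python
import re
from typing import List

# Matches one published-port entry such as "127.0.0.1:8050->8050/tcp".
# Unpublished entries like "5023/tcp" contain no "->" and are skipped.
_MAPPING = re.compile(
    r"^(?P<host_ip>[^:]+):(?P<host_port>\d+)->(?P<container_port>\d+)/(?P<proto>\w+)$"
)


def loopback_only_ports(ports_column: str) -> List[int]:
    """Return host ports that are published on 127.0.0.1 only (hypothetical helper)."""
    loopback = []
    for entry in ports_column.split(","):
        match = _MAPPING.match(entry.strip())
        if match and match.group("host_ip") == "127.0.0.1":
            loopback.append(int(match.group("host_port")))
    return loopback
```

For the container listed above, `loopback_only_ports("5023/tcp, 127.0.0.1:8050->8050/tcp")` returns `[8050]`, while a mapping published on all interfaces (`0.0.0.0:8050->8050/tcp`) returns an empty list.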

I still get this error:

2018-07-10 15:18:35 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://127.0.0.1:8050/robots.txt> (failed 1 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:36 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://127.0.0.1:8050/robots.txt> (failed 2 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:37 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://127.0.0.1:8050/robots.txt> (failed 3 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:37 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http://127.0.0.1:8050/robots.txt>: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
ConnectionRefusedError: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html> (failed 1 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:39 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html> (failed 2 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:40 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html> (failed 3 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:40 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html>: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:40 [scrapy.core.engine] INFO: Closing spider (finished)
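Error 10061 is the Windows "connection refused" code: nothing accepted a TCP connection at the address in SPLASH_URL. A minimal stdlib sketch for checking that outside Scrapy before running the spider; `splash_reachable` is a hypothetical helper, and it only tests the TCP connection, not whether the listener is actually Splash.

```python
import socket
from urllib.parse import urlsplit


def splash_reachable(splash_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to SPLASH_URL's host and port succeeds."""
    parts = urlsplit(splash_url)
    host = parts.hostname
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # create_connection raises ConnectionRefusedError (WinError 10061 on
        # Windows) when nothing is listening on that address and port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and timeouts are both OSError subclasses.
        return False
```

Running `splash_reachable("http://127.0.0.1:8050")` from the same machine as Scrapy returning `False` would confirm that the problem is the network path to the container rather than the spider or the middleware settings.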

I have applied all of these settings and I believe they are correct, but I can't figure out where I went wrong.

Please let me know what I'm missing; I'm still new to Python, Scrapy, and the Splash JS rendering service.


1 Answer


It should be set in settings.py:

SPLASH_URL = 'http://0.0.0.0:8050'

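And the Docker container should listen on the server's network interface. The question's container publishes 127.0.0.1:8050->8050/tcp, which only accepts connections arriving on the Docker host's loopback interface; the PORTS column should instead show: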

0.0.0.0:8050->8050/tcp
answered 2018-07-10T07:42:25.277
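The question runs Docker Toolbox, where the Docker engine lives inside a VirtualBox VM, so a published port is reached at the VM's address (printed by `docker-machine ip`) rather than at the Windows host's loopback. A sketch for settings.py under that assumption: it derives SPLASH_URL from the DOCKER_HOST variable that `docker-machine env` exports and falls back to the answer's value otherwise. `splash_url_from_env` is a hypothetical helper, and port 8050 is assumed to be published on all interfaces as the answer describes.

```python
import os
from urllib.parse import urlsplit


def splash_url_from_env(default: str = "http://0.0.0.0:8050", splash_port: int = 8050) -> str:
    """Build SPLASH_URL from DOCKER_HOST when it points at a remote engine (hypothetical helper)."""
    docker_host = os.environ.get("DOCKER_HOST", "")
    # With Docker Toolbox, DOCKER_HOST looks like tcp://<vm-ip>:2376;
    # the VM's IP is also where published container ports are reachable.
    if docker_host.startswith("tcp://"):
        vm_ip = urlsplit(docker_host).hostname
        if vm_ip:
            return f"http://{vm_ip}:{splash_port}"
    # Local engine (no DOCKER_HOST, or a unix socket): keep the default.
    return default


SPLASH_URL = splash_url_from_env()
```

Reading the address from the environment keeps settings.py unchanged when the project moves between a Docker Toolbox VM and a native Docker install.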