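I am scraping a website to check the stock status of various products. Unfortunately, this requires actually clicking "Add to Cart" on the product page and checking the message on the next page to determine whether the item is in stock (i.e. it requires parsing two responses).

I followed the excellent documentation for this scenario and wrote my parse function to return a Request object with a callback to my secondary parse function. However, that function is rarely called. For most products I only see "Before return request" in the log, but for a small fraction of products it does get called correctly.

Any clues as to what is going wrong here? I'm out of ideas.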

foo/spiders/atlantic_firearms_spider.py

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.http import FormRequest
from foo.items import AtlanticFirearmsItem

import datetime
import re

class AtlanticFirearmsSpider(CrawlSpider):
    name = "atlantic_firearms"
    allowed_domains = ["atlanticfirearms.com"]
    start_urls = [
        "http://www.atlanticfirearms.com"
    ]

    rules = (
        Rule(SgmlLinkExtractor(allow=['detail.html']), callback='parse_product'),
        Rule(SgmlLinkExtractor(allow=[], deny=['/bro', '/news', '/howtobuy', '/component/search', 'askquestion'])),
    )

    def parse_product(self, response):
        hxs = HtmlXPathSelector(response)
        product = AtlanticFirearmsItem()
        add_to_cart = any([hxs.select("descendant-or-self::input[@name = 'addtocart']"),
                           hxs.select("descendant-or-self::input[@value = 'Add to Cart']"),
                           hxs.select("//a[text() = 'Add to Cart']")])
        product['url'] = response.url
        product['as_of_time'] = datetime.datetime.now()

        if add_to_cart:
            # attempt to add to cart to verify availability
            request = FormRequest.from_response(response, formname="addtocartForm", callback=self.parse_add_to_cart)
            request.meta['product'] = product
            print "Before return request"
            return request
        else:
            product['in_stock'] = False
            return product

    def parse_add_to_cart(self, response):
        print "Inside parse_add_to_cart"
        product = response.meta['product']
        hxs = HtmlXPathSelector(response)
        product['in_stock'] = not(hxs.select("//text()[contains(.,'We regret to inform you that this product')]"))
        return product

foo/items.py

from scrapy.item import Item, Field

class AtlanticFirearmsItem(Item):
    in_stock = Field()
    url = Field()
    as_of_time = Field()

Edit: adding the log file as requested:

2013-09-21 07:25:14-0500 [scrapy] INFO: Scrapy 0.18.2 started (bot: foo)
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Optional features available: ssl, http11
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Overridden settings: {'SPIDER_MODULES': ['foo.spiders'], 'BOT_NAME': 'foo'}
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled item pipelines: 
2013-09-21 07:25:14-0500 [atlantic_firearms] INFO: Spider opened
2013-09-21 07:25:14-0500 [atlantic_firearms] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com> (referer: None)
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.cloudflare.com': <GET http://www.cloudflare.com/email-protection>
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.constantcontact.com': <GET http://www.constantcontact.com/jmml/email-marketing.jsp>
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.fdicreative.com': <GET http://www.fdicreative.com/>
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.redjacketfirearms.com': <GET https://www.redjacketfirearms.com/>
2013-09-21 07:25:17-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/wolf-ammunition-45acp-500-round-case-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Filtered duplicate request: <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0> - no more duplicates will be shown (see DUPEFILTER_CLASS)
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/wolf-223-ar15-rifle-ammo-500-round-case-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/us-palm-air-save-plate-carrier-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/545-x-39-russian-ak74-ammo-1080-round-case-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/red-army-standard-7-62x39mm-360-round-range-pack-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/vector-arms-mp5-style-rifle-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/wolf-ammunition-for-sale-ak47-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:20-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/dsa-zm4-flat-top-ar15-carbine-dszm4cv1r-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:20-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/m92-ak47-yugoslavian-7-62x39mm-bolt-hold-open-metal-mags-pack-of-two-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/vector-arms-v94-9mm-mp5-style-pistol-full-size-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/zastava-ak-47-m70b1-pap-7-62x39mm-rifles-w-2-hi-cap-mags-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/ptr-91-gi-rifle-939-atlanticfirearms-com-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/pap-m92-7-62x39-pistol-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/content/article/86-static-pages/159-resources.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.atsconsultingcorp.com': <GET http://www.atsconsultingcorp.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.bullseyemarket.com': <GET http://www.bullseyemarket.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.corilam.com': <GET http://www.corilam.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'chancebrownrealestate.com': <GET http://chancebrownrealestate.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.delsolservices.com': <GET http://www.delsolservices.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.elkhornoutfitters.com': <GET http://www.elkhornoutfitters.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.frontierlogistics.com': <GET http://www.frontierlogistics.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gpstrackingkey.com': <GET http://www.gpstrackingkey.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.hanshawkennedy.com': <GET http://www.hanshawkennedy.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'worldenv.com': <GET http://worldenv.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'purgexonline.com': <GET http://purgexonline.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'bumpfirestocks.com': <GET http://bumpfirestocks.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.texrestaurantequipment.com': <GET http://www.texrestaurantequipment.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.houston-refinance.com': <GET http://www.houston-refinance.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'johnson-bryan.com': <GET http://johnson-bryan.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'kanesforms.com': <GET http://kanesforms.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.markfoxrealestate.com': <GET http://www.markfoxrealestate.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.mphoa.org': <GET http://www.mphoa.org/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.outfitterwebsites.com': <GET http://www.outfitterwebsites.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'outdoortrailsnetwork.com': <GET http://outdoortrailsnetwork.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.psychologicalriskservices.com': <GET http://www.psychologicalriskservices.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rcshouston.com': <GET http://www.rcshouston.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rollingcreekcarwash.com': <GET http://www.rollingcreekcarwash.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'slammc.com': <GET http://slammc.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.texassaltwaterfishingguide.com': <GET http://www.texassaltwaterfishingguide.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.waynepigment.com': <GET http://www.waynepigment.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'bancroftfeldman.com': <GET http://bancroftfeldman.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'elilanddesign.com': <GET http://elilanddesign.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'dpharms.com': <GET http://dpharms.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'contractlandstaff.com': <GET http://contractlandstaff.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'knightsplumbing.com': <GET http://knightsplumbing.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/index.php>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/index.php>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/ati-omni-5-56-poly-competition-m4-carbine-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/dallas-gun-shop.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Crawled (404) <GET http://www.atlanticfirearms.com/component/content/?Itemid=803&id=148> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/houston-texas-gun-shop.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/california-gun-shop.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/index.php>
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/browse-our-products.html> (referer: http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0)
Inside parse_add_to_cart
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Scraped from <200 http://www.atlanticfirearms.com/browse-our-products.html>
        {'as_of_time': datetime.datetime(2013, 9, 21, 7, 25, 18, 365559),
         'in_stock': True,
         'url': 'http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0'}
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (404) <GET http://www.atlanticfirearms.com/www.atlanticfirearms.com> (referer: http://www.atlanticfirearms.com/dallas-gun-shop.html)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/login-or-register/editaddress.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/privacy-policy.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/subscribe.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/links.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gunbroker.com': <GET http://www.gunbroker.com/user/dealernetwork.asp>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.auctionarms.com': <GET http://www.auctionarms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gunsamerica.com': <GET http://www.gunsamerica.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ar15.com': <GET http://www.ar15.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.olyarms.com': <GET http://www.olyarms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.cheaperthandirt.com': <GET http://www.cheaperthandirt.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ammoman.com': <GET http://www.ammoman.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ak47.net': <GET http://www.ak47.net/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.atf.treas.gov': <GET http://www.atf.treas.gov/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'caag.state.ca.us': <GET http://caag.state.ca.us/firearms/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.nra.org': <GET http://www.nra.org/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.masterpiecearms.com': <GET http://www.masterpiecearms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'atlantic1.readyhosting.com': <GET http://atlantic1.readyhosting.com/programming/listview.asp?CatId=2>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.vulcanarmament.com': <GET http://www.vulcanarmament.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.bushmaster.com': <GET http://www.bushmaster.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rockriverarms.com': <GET http://www.rockriverarms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'dpmsinc.com': <GET http://dpmsinc.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.colt.com': <GET http://www.colt.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.armalite.com': <GET http://www.armalite.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.redstick-firearms.com': <GET http://www.redstick-firearms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.vectorarms.com': <GET http://www.vectorarms.com/indexframe.html>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.arsenalinc.com': <GET http://www.arsenalinc.com/about.htm>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ak47.com': <GET http://www.ak47.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.jldenter.com': <GET http://www.jldenter.com/store/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.springfield-armory.com': <GET http://www.springfield-armory.com/index.shtml>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.dsarms.com': <GET http://www.dsarms.com/>
^C2013-09-21 07:25:26-0500 [scrapy] INFO: Received SIGINT, shutting down gracefully. Send again to force 
^C2013-09-21 07:25:26-0500 [scrapy] INFO: Received SIGINT twice, forcing unclean shutdown

1 Answer

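Posting my earlier comment as an answer.

All of your POST requests (from FormRequest.from_response()) are redirected to http://www.atlanticfirearms.com/browse-our-products.html. After the first one, each redirected GET looks like a duplicate of a URL that has already been seen, so the duplicate filter drops it and parse_add_to_cart is never called for that product. You should set dont_filter=True: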

    if add_to_cart:
        # attempt to add to cart to verify availability
        request = FormRequest.from_response(response, formname="addtocartForm",
                      callback=self.parse_add_to_cart, dont_filter=True)

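See the Scrapy documentation on Requests:

dont_filter (boolean) – indicates that this request should not be filtered by the scheduler. This is used when you want to perform an identical request multiple times, to ignore the duplicates filter.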
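To make the effect concrete, here is a toy model of a duplicate filter in plain Python. It is not Scrapy's implementation: `count_callbacks` and the product URLs are made up for illustration, and the sketch assumes that the redirected request carries the same dont_filter flag as the POST that triggered it.

```python
# Toy model only: a set of seen URLs stands in for the duplicate filter.
CART_URL = "http://www.atlanticfirearms.com/browse-our-products.html"


def count_callbacks(product_urls, dont_filter):
    """Return how many add-to-cart callbacks would run for the given products."""
    seen = set()
    called = 0
    for _product_url in product_urls:
        # Every add-to-cart POST is answered with a redirect to CART_URL,
        # and the redirected request is what the filter gets to see.
        if not dont_filter:
            if CART_URL in seen:
                continue  # dropped as a duplicate, callback never runs
            seen.add(CART_URL)
        called += 1
    return called


# Hypothetical product pages, used only to drive the model.
products = ["product-a-detail.html", "product-b-detail.html", "product-c-detail.html"]

print(count_callbacks(products, dont_filter=False))  # 1
print(count_callbacks(products, dont_filter=True))   # 3
```

This matches the log in the question: many "Before return request" lines, many 303 redirects, but only one "Inside parse_add_to_cart".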

Also, you may want to set CONCURRENT_REQUESTS = 1 so that items are added to the cart one at a time (I wonder how the server handles parallel cart additions).

Answered 2013-09-21T14:34:19.937
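As a sketch, the setting could go in the project's settings module. The file path below is an assumption based on the `foo` project name, and the first two values are copied from the "Overridden settings" line in the log.

```python
# foo/settings.py (assumed location of the project settings)
BOT_NAME = 'foo'
SPIDER_MODULES = ['foo.spiders']

# Process one request at a time so cart additions do not overlap.
CONCURRENT_REQUESTS = 1
```

Expect the crawl to be noticeably slower with this value, since pages are then fetched strictly one after another.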