I want to scrape a page, and I am using Scrapy and Python to do it.

I want to scrape the button you can see in the image below (left image):

http://postimg.org/image/syhauheo7/

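When I click the green View Code button, it does three things:

  1. Redirects to another id.
  2. Opens a popup window containing the code.
  3. Displays the code on the same page, as shown in the top-right image.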

How can I scrape the code using Scrapy and Python?

1 Answer

Here is your spider:

import scrapy
from scrapy.item import Item, Field


class VoucherItem(Item):
    voucher_id = Field()
    code = Field()


class CuponationSpider(scrapy.Spider):
    name = "cuponation"
    allowed_domains = ["cuponation.in"]
    start_urls = ["https://www.cuponation.in/babyoye-coupons"]

    def parse(self, response):
        # Each "View Code" button carries its voucher id in a data attribute.
        buttons = response.xpath('//div[@class="six columns voucher-btn"]/a')
        for button in buttons:
            voucher_id = button.xpath('@data-voucher-id').get()
            if voucher_id is None:
                # Skip buttons that have no voucher id.
                continue

            item = VoucherItem()
            item['voucher_id'] = voucher_id
            # Request the clickout page for this voucher and pass the
            # half-filled item along to the next callback.
            yield scrapy.Request(
                "https://www.cuponation.in/clickout/index/id/%s" % voucher_id,
                callback=self.parse_code,
                meta={'item': item},
            )

    def parse_code(self, response):
        item = response.meta['item']
        # Keep every matching text node, so "code" is stored as a list.
        item['code'] = response.xpath('//div[@class="code-field"]/span/text()').getall()

        return item

If you run it with:

scrapy runspider <script_name.py> --output output.json

you will see the following in output.json:

{"voucher_id": "5735", "code": ["MUM10"]}
{"voucher_id": "3634", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "5446", "code": ["APP20"]}
{"voucher_id": "5558", "code": ["No code for this deal"]}
{"voucher_id": "1673", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "3963", "code": ["CNATION150"]}
{"voucher_id": "5515", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "4313", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "4309", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "1540", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "4310", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "1539", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "4312", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "4311", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "2785", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "3631", "code": ["Deal Activated. Enjoy Shopping"]}
{"voucher_id": "4496", "code": ["Deal Activated. Enjoy Shopping"]}

Happy crawling!

Answered 2013-05-11T23:51:39.323