web-scraping - No data returned from response.xpath when using scrapy shell

Question

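I am trying to scrape a betting website. However, when I inspect the retrieved data in scrapy shell, I get nothing back.

The XPath I need is: //*[@id="yui_3_5_0_1_1562259076537_31330"] When I enter it in the shell, this is what I get: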


In [18]: response.xpath('//*[@id="yui_3_5_0_1_1562259076537_31330"]')
Out[18]: []

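The output is [], but I expected something I could extract the href from.

When I use the "Inspect" tool in Chrome, this ID is highlighted in purple while the site is still loading. Does that mean the site is using JavaScript? If so, is that why Scrapy cannot find the element and returns []?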

score 0 · Accepted Answer

I tried scraping the site with Scrapy, and here is my result.

This is the items.py file:

    import scrapy

    class LifeMatchsItem(scrapy.Item):

        Event = scrapy.Field() # Name of event
        Match = scrapy.Field() # Team1 vs Team2
        Date = scrapy.Field()  # Date of Match

Here is my spider code:


    import scrapy
    from LifeMatchesProject.items import LifeMatchsItem


    class LifeMatchesSpider(scrapy.Spider):
        name = 'life_matches'
        start_urls = ['http://www.betfair.com/sport/home#sscpl=ro/']

        custom_settings = {'FEED_EXPORT_ENCODING': 'utf-8'}

        def parse(self, response):
            for event in response.xpath('//div[contains(@class,"events-title")]'):
                for element in event.xpath('./following-sibling::ul[1]/li'):
                    item = LifeMatchsItem()
                    item['Event'] = event.xpath('./a/@title').get()
                    item['Match'] = element.xpath('.//div[contains(@class,"event-name-info")]/a/@data-event').get()
                    item['Date'] = element.xpath('normalize-space(.//div[contains(@class,"event-name-info")]/a//span[@class="date"]/text())').get()
                    yield item

This is the result.
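
The spider selects elements by class rather than by the yui_... id from the question. To check whether an id seen in Chrome's inspector exists in the HTML that Scrapy actually downloaded, one option is to list the ids in the raw markup. The following is only a minimal sketch using the standard library: `IdCollector`, `ids_in_html`, and `SAMPLE_HTML` are hypothetical names and data made up for illustration, not taken from the site.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def ids_in_html(html):
    """Return the set of id attribute values present in the raw HTML."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


# Hypothetical stand-in for response.text: the markup as downloaded,
# before any JavaScript has run in a browser.
SAMPLE_HTML = """
<html><body>
  <div class="events-title"><a title="Sample League">Sample League</a></div>
  <ul><li><div class="event-name-info">
    <a data-event="Team A v Team B"><span class="date">Today</span></a>
  </div></li></ul>
  <div id="static-footer"></div>
</body></html>
"""

found = ids_in_html(SAMPLE_HTML)
print("static-footer" in found)                    # True
print("yui_3_5_0_1_1562259076537_31330" in found)  # False
```

In scrapy shell the same check would be `ids_in_html(response.text)`. If the id from the inspector is missing there, it was probably added by JavaScript after the page loaded, so an XPath on that id returns []. Selecting on class names that do appear in the downloaded HTML, as the spider does, avoids the problem.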
