import scrapy
class rottenTomatoesSpider(scrapy.Spider):
    name = "movieList"
    start_urls = [
         'https://www.rottentomatoes.com/'
    ]

    def parse(self, response):
        for movieList in response.xpath('//div[@id="homepage-opening-this-week"]'):
            yield {
                'score': response.css('td.left_col').extract_first(),
                'title': response.css('td.middle_col').extract_first(),
                'openingDate': response.css('td.right_col right').extract_first()
            }

But the spider ends up scraping <div id='homepage-tv-top'> instead.

I assume the shared homepage- prefix is what's confusing the script. Does anyone know a workaround?


1 Answer


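The prefix isn't the problem. You need to iterate over each tr, and inside the for loop you need to query movieList instead of response. response.css(...) searches the whole page, so extract_first() returns the first matching cell anywhere in the document, no matter which element the loop is currently on.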
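To see why the original loop returned TV data, here is a small sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy selectors (it is not Scrapy's API, just the same scoping idea). The markup is hypothetical and simplified, and it assumes the TV table comes before the opening-this-week table in the page. A query started from the document root returns the first match in document order, while the same query started from the section element stays inside that section.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. Assumption: the TV table appears first in
# the document, followed by the "opening this week" table.
PAGE = """
<html><body>
  <div id="homepage-tv-top">
    <table>
      <tr><td class="left_col">90%</td><td class="middle_col">Some Show</td></tr>
    </table>
  </div>
  <div id="homepage-opening-this-week">
    <table>
      <tr><td class="left_col">75%</td><td class="middle_col">Movie A</td><td class="right_col">Mar 9</td></tr>
      <tr><td class="left_col">60%</td><td class="middle_col">Movie B</td><td class="right_col">Mar 16</td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())


def first_text(node, path):
    """Return the text of the first element under `node` matching `path`, or None."""
    match = node.find(path)
    return match.text if match is not None else None


section = root.find(".//div[@id='homepage-opening-this-week']")

# Searching from the root (like response.css in the question) ignores the
# section and returns the first match anywhere in the document.
page_wide = first_text(root, ".//td[@class='middle_col']")

# Searching from the section element (like movieList.css) stays inside it.
scoped = first_text(section, ".//td[@class='middle_col']")

print(page_wide)  # Some Show
print(scoped)     # Movie A
```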

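def parse(self, response):
    # Loop over every table row inside the "opening this week" block
    for movieList in response.xpath('//div[@id="homepage-opening-this-week"]//tr'):
        # Query relative to the current row (movieList), not the whole page (response),
        # and join all text nodes found inside each cell
        yield {
            'score': "".join(movieList.css('td.left_col *::text').extract()),
            'title': "".join(movieList.css('td.middle_col *::text').extract()),
            'openingDate': "".join(movieList.css('td.right_col *::text').extract())
        }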
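A slightly fuller sketch of the row-by-row approach, again with xml.etree.ElementTree standing in for Scrapy selectors and with hypothetical markup (the real Rotten Tomatoes HTML may differ). The helper names cell_text and parse_rows are made up for illustration. itertext() plays the role of joining the ::text results, and the sketch also skips header rows, which the //tr loop above would otherwise turn into items with empty strings if the real table has any.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: cell text is wrapped in child elements, and the table
# starts with a header row that has no <td> cells.
SECTION = """
<div id="homepage-opening-this-week">
  <table>
    <tr><th>Score</th><th>Title</th><th>Opens</th></tr>
    <tr>
      <td class="left_col"><span>75</span><span>%</span></td>
      <td class="middle_col"><a href="#">Movie A</a></td>
      <td class="right_col"><span>Mar 9</span></td>
    </tr>
    <tr>
      <td class="left_col"><span>60</span><span>%</span></td>
      <td class="middle_col"><a href="#">Movie B</a></td>
      <td class="right_col"><span>Mar 16</span></td>
    </tr>
  </table>
</div>
"""


def cell_text(row, css_class):
    """Join every text node inside the first <td> of `row` with the given class."""
    cell = row.find(f"td[@class='{css_class}']")
    if cell is None:
        return ""
    return "".join(cell.itertext()).strip()


def parse_rows(section):
    """Yield one dict per data row, querying relative to the row, not the page."""
    for row in section.iter("tr"):
        if row.find("td") is None:  # skip header rows that only contain <th>
            continue
        yield {
            "score": cell_text(row, "left_col"),
            "title": cell_text(row, "middle_col"),
            "openingDate": cell_text(row, "right_col"),
        }


movies = list(parse_rows(ET.fromstring(SECTION.strip())))
for movie in movies:
    print(movie)
```

Each row becomes its own item, because every lookup starts from the row element rather than from the document root.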
answered 2018-03-09T07:33:16.157