
My spider needs to adapt to the site I am scraping: the info I need is sometimes in div[1] and sometimes in div[2]. Here's an example:

item['details'] = site.select('//*[@id="detailFacts"]/div[2]/div[2]//text()').extract()

or

item['details'] = site.select('//*[@id="detailFacts"]/div[1]/div[2]//text()').extract()

How do I combine these into a single statement so that Scrapy fetches the text from EITHER of them?


1 Answer


Try this:

details = site.select('//*[@id="detailFacts"]/div[1]/div[2]//text()|//*[@id="detailFacts"]/div[2]/div[2]//text()').extract()
item['details'] = next((s for s in details if s.strip()), None)  # first non-blank string, or None if nothing matched

or

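# `|` unions two complete paths, so div[1]|div[2] cannot be used mid-path; select both divs by position instead
details = site.select('//*[@id="detailFacts"]/div[position() <= 2]/div[2]//text()').extract()
item['details'] = next((s for s in details if s.strip()), None)  # first non-blank string, or None if nothing matched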
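
If a single combined XPath gets hard to read, the same "either location" logic can be written as an ordered fallback over candidate paths. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree instead of Scrapy so that it runs without extra installs, and ElementTree's limited XPath subset is the reason each path is tried separately. The sample markup, CANDIDATE_PATHS and first_nonempty_text are hypothetical names made up for this example, not part of Scrapy.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup: here the wanted text sits under div[2]/div[2],
# and div[1] has no second child div at all.
SAMPLE = """
<html><body>
  <div id="detailFacts">
    <div><div>header</div></div>
    <div><div>label</div><div>Built in <b>1990</b></div></div>
  </div>
</body></html>
"""

# Candidate locations, tried in order of preference.
CANDIDATE_PATHS = [
    ".//*[@id='detailFacts']/div[1]/div[2]",
    ".//*[@id='detailFacts']/div[2]/div[2]",
]


def first_nonempty_text(root, paths):
    """Return the text of the first path that yields non-blank text, else None."""
    for path in paths:
        for element in root.findall(path):
            # itertext() walks nested elements too, similar to //text() in XPath.
            text = "".join(element.itertext()).strip()
            if text:
                return text
    return None


root = ET.fromstring(SAMPLE.strip())
details = first_nonempty_text(root, CANDIDATE_PATHS)
print(details)
```

In a spider, the same loop could presumably call site.select(path).extract() for each candidate path and stop at the first non-blank result.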

Hope this works for you.

answered 2013-06-05T17:43:45.727