I have this code in my spider:
class StackSpider(InitSpider):
    name = 'stack'
    allowed_domains = ['sitepoint.com']
    start_urls = ["http://www.sitepoint.com"]
    start_page = "http://www.sitepoint.com"
    item = StackItem()

    def init_request(self):
        return Request(url=self.start_page, callback=self.parse)

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="headline_area"]')
        items = []
        ivar = 1
        for site in sites[:5]:
            item = StackItem()
            log.msg(' LOOP' + str(ivar) + '', level=log.ERROR)
            item['title'] = "yoo ma"
            request = Request("http://www.sitepoint.com/getting-to-know-css3-selectors-structural-pseudo-classes/", callback=self.test1)
            request.meta['item'] = item
            ivar = ivar + 1
            yield request

    def test1(self, response):
        log.msg(' LOOP 2 \n', level=log.ERROR)
        item = response.meta['item']
        item['desc'] = "test4"
        return item
I followed the documentation, but it only works for one iteration of the loop. I mean the only log output I see on screen is:
LOOP1
LOOP2
That pair should be repeated 3 times.
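I tried different combinations of return and yield:

- return request and return item gives the output LOOP1 LOOP2
- yield request and return item gives the output LOOP1 LOOP1 LOOP1 LOOP2
- yield request and yield item gives the output LOOP1 LOOP1 LOOP1 LOOP2
- return request and yield item gives the output LOOP1 LOOP2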
How can I get LOOP1 LOOP2 LOOP1 LOOP2 and so on?
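
One possible reading of the outputs listed above, sketched as a toy model in plain Python rather than real Scrapy code: a return inside the loop ends the callback on the first pass, while yield produces one request per pass, and a scheduler that drops URLs it has already seen would then let only the first of the identical requests through. The names parse_with_return, parse_with_yield, run, SITES and the dedupe flag are made up for this illustration, and the assumption that the real scheduler filters repeated URLs is only an assumption here.

```python
# Toy model of the spider above; nothing here imports or calls Scrapy.
URL = "http://www.sitepoint.com/getting-to-know-css3-selectors-structural-pseudo-classes/"
SITES = ["a", "b", "c"]  # stands in for three matched headline_area divs


def parse_with_return(sites, url, log):
    """Mimics `return request` in the loop: the function exits on the first pass."""
    for _site in sites:
        log.append("LOOP1")
        return [url]  # leaves the loop immediately, so only one request exists
    return []


def parse_with_yield(sites, url, log):
    """Mimics `yield request`: one request is produced per loop pass."""
    for _site in sites:
        log.append("LOOP1")
        yield url


def run(parse, sites, url, dedupe=True):
    """Hypothetical scheduler: collect all requests, then 'download' them.

    With dedupe=True a URL that was already seen is silently dropped,
    which is the assumed behaviour being illustrated.
    """
    log = []
    requests = list(parse(sites, url, log))  # callback output is consumed first
    seen = set()
    for request_url in requests:
        if dedupe and request_url in seen:
            continue  # duplicate request never reaches the second callback
        seen.add(request_url)
        log.append("LOOP2")  # stands in for the test1 callback
    return log


print(run(parse_with_return, SITES, URL))
print(run(parse_with_yield, SITES, URL))
print(run(parse_with_yield, SITES, URL, dedupe=False))
```

In this model the third call, with filtering switched off, is the only one where LOOP2 appears once per request. Scrapy's Request accepts a dont_filter argument, so passing dont_filter=True on the request in parse may be worth trying if the same filtering is what happens in the real spider. Even then, strict LOOP1 LOOP2 alternation is probably not guaranteed, since callbacks run as responses arrive.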