I have this code in my spider:
class StackSpider(InitSpider):
    name = 'stack'
    allowed_domains = ['sitepoint.com']
    start_urls = ["http://www.sitepoint.com"]
    start_page = "http://www.sitepoint.com"
    item = StackItem()

    def init_request(self):
        return Request(url=self.start_page, callback=self.parse)

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="headline_area"]')
        items = []
        ivar = 1
        for site in sites[:5]:
            item = StackItem()
            log.msg(' LOOP' + str(ivar) + '', level=log.ERROR)
            item['title'] = "yoo ma"
            request = Request("http://www.sitepoint.com/getting-to-know-css3-selectors-structural-pseudo-classes/", callback=self.test1)
            request.meta['item'] = item
            ivar = ivar + 1
            yield request

    def test1(self, response):
        log.msg(' LOOP 2 \n', level=log.ERROR)
        item = response.meta['item']
        item['desc'] = "test4"
        return item
I followed the documentation, but it only works for one loop iteration. I mean that the only log output I see on screen is:
LOOP1
LOOP2
It should repeat 3 times.
I tried different combinations of return and yield in parse and test1:
return request
and return item
gives the output LOOP1 LOOP2
yield request
and return item
gives the output LOOP1 LOOP1 LOOP1 LOOP2
yield request
and yield item
gives the output LOOP1 LOOP1 LOOP1 LOOP2
return request
and yield item
gives the output LOOP1 LOOP2
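The gap between the return request runs (one LOOP1) and the yield request runs (three LOOP1) can be reproduced without Scrapy. This is a minimal sketch in plain Python, not the spider itself; with_return, with_yield and the placeholder URLs are hypothetical names used only for illustration.

```python
def with_return(urls):
    # `return` leaves the function during the first pass through the loop,
    # so only one request is ever produced.
    for url in urls:
        return "request:" + url


def with_yield(urls):
    # `yield` turns the function into a generator; the loop resumes after
    # each value, so every iteration produces a request.
    for url in urls:
        yield "request:" + url


urls = ["a", "b", "c"]  # hypothetical placeholders, not real pages
returned = with_return(urls)
yielded = list(with_yield(urls))
print(returned)
print(yielded)
```

This would explain why the loop body runs only once whenever parse uses return request, regardless of what test1 does.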
How can I get LOOP1 LOOP2 LOOP1 LOOP2 and so on?
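One likely explanation, assuming Scrapy's default duplicate request filter is active: every Request built in the loop points at the same URL, so only the first one would be scheduled and test1 would run once. The sketch below is a toy model of that behavior in plain Python, not Scrapy's scheduler; crawl, seen and pending are hypothetical names, and the LOOP1/LOOP2 labels follow the output as reported in the post.

```python
def crawl(urls, dont_filter=False):
    """Toy stand-in for a scheduler that drops URLs it has already seen."""
    seen = set()
    pending = []
    log = []
    for url in urls:
        log.append("LOOP1")  # mirrors the log line inside parse()
        if dont_filter or url not in seen:
            seen.add(url)
            pending.append(url)
    for _ in pending:
        log.append("LOOP2")  # mirrors the log line inside test1()
    return log


same_url = ["http://www.sitepoint.com/some-article/"] * 3  # hypothetical URL
filtered = crawl(same_url)
unfiltered = crawl(same_url, dont_filter=True)
print(filtered)
print(unfiltered)
```

In Scrapy itself, Request accepts a dont_filter=True argument that bypasses the duplicate filter, so each yielded request should reach its callback. Because Scrapy schedules requests asynchronously, the log lines would not necessarily interleave as LOOP1 LOOP2 LOOP1 LOOP2 even then.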