
In my spider, after receiving a response, I want to download and show the captcha image and then continue crawling:

    def get_captcha(self, response):
        print('\nLoading captcha...\n')
        item = CaptchaItem()
        # response.xpath() replaces the deprecated HtmlXPathSelector
        captcha_img_src = response.xpath('//*[@id="captcha-image"]/@src').get()
        # ImagesPipeline needs absolute URLs, so resolve a relative src
        item['image_urls'] = [response.urljoin(captcha_img_src)]
        return item

But I don't know when the image has finished loading, or how to continue crawling after that.

FYI: the captcha image can't be downloaded without cookies.

Thanks in advance!


1 Answer


Use yield instead of return, so the callback can emit the item first and then a follow-up request:

    def get_captcha(self, response):
        print('\nLoading captcha...\n')
        item = CaptchaItem()
        captcha_img_src = response.xpath('//*[@id="captcha-image"]/@src').get()
        item['image_urls'] = [response.urljoin(captcha_img_src)]
        yield item
        # You may display your scraped item here, and after that
        # yield your further (e.g. POST) request:
        yield your_request  # placeholder: build your own Request here
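To see why yield helps here, the sketch below contrasts the two styles without Scrapy. The `Request` dataclass, the `drive` helper and the example URLs are hypothetical stand-ins; `drive` only loosely mirrors how a crawler engine consumes a callback's result (a single object is wrapped, an iterable is consumed one item at a time).

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Union


@dataclass
class Request:
    # Hypothetical stand-in for scrapy.Request, used only to show control flow.
    url: str
    method: str = "GET"


Output = Union[Dict[str, Any], Request]


def get_captcha_return(captcha_src: str) -> Dict[str, Any]:
    # With `return`, the function ends here: nothing after it can run.
    item = {"image_urls": [captcha_src]}
    return item


def get_captcha_yield(captcha_src: str) -> Iterator[Output]:
    # With `yield`, the callback is a generator: the item is handed to the
    # caller, then execution resumes to emit the follow-up request.
    item = {"image_urls": [captcha_src]}
    yield item
    # Hypothetical follow-up request (e.g. submitting the captcha answer).
    yield Request(url="https://example.com/login", method="POST")


def drive(result: Union[Output, Iterator[Output]]) -> List[Output]:
    # Wrap a single item/request; otherwise collect everything the generator yields.
    if isinstance(result, (dict, Request)):
        return [result]
    return list(result)


if __name__ == "__main__":
    print(drive(get_captcha_return("https://example.com/captcha.png")))
    print(drive(get_captcha_yield("https://example.com/captcha.png")))
```

The `return` version produces only the item, while the generator version produces the item followed by the request. This shows the order in which outputs are emitted; it does not by itself guarantee that an image pipeline has finished downloading the image before the next request is processed.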
Answered 2013-08-26T11:41:30.547