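In a Scrapy downloader middleware, what should process_response return when it doesn't find what it is looking for? Do you build a new Response object and return that, or do you return the response variable that was passed in to process_response?
I tried the latter, but I keep getting "response has no attribute selector" when the middleware is used together with FilesPipeline.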
import os

from PIL import Image

# parse_xpath, download_file and CAPTCHA_PATTERN are my own helpers (not shown).


class CaptchaMiddleware(object):
    def process_response(self, request, response, spider):
        download_path = spider.settings['CAPTCHA_STORE']
        # 1
        captcha_images = parse_xpath(response, CAPTCHA_PATTERN, 'image')
        if captcha_images:
            for url in captcha_images:
                url = response.urljoin(url)
                print("Downloading %s" % url)
                download_file(url, os.path.join(download_path, url.split('/')[-1]))
            for image in os.listdir(download_path):
                # os.listdir returns bare file names, so join them with the directory
                Image.open(os.path.join(download_path, image))
        # 2
        return response
If I return at #1, FilesPipeline works correctly and downloads the files, but if I return at #2, I get the error "response has no attribute selector".