You should define an __init__ method on your spider and pass the username and password in as spider arguments, like this:
from scrapy.http import FormRequest
from scrapy.spider import BaseSpider


class MySpider(BaseSpider):
    name = 'myspider'

    def __init__(self, username=None, password=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.start_urls = ['http://www.example.com/']
        self.username = username
        self.password = password

    def start_requests(self):
        # Log in first; the credentials come from the -a spider arguments.
        return [FormRequest("http://www.example.com/login",
                            formdata={'user': self.username,
                                      'pass': self.password},
                            callback=self.logged_in)]

    def logged_in(self, response):
        # here you would extract links to follow and return Requests for
        # each of them, with another callback
        pass
Run it with (note that each spider argument needs its own -a flag):

scrapy crawl myspider -a username=yourname -a password=yourpass
Code adapted from: http://doc.scrapy.org/en/0.18/topics/spiders.html
EDIT:

You can only have one Twisted reactor per process, but you can run multiple spiders with different credentials in the same process. Example of running multiple spiders: http://doc.scrapy.org/en/0.18/topics/practices.html#running-multiple-spiders-in-the-same-process
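As a rough illustration, here is a minimal sketch of that pattern, assuming the MySpider class above and the Crawler/signals API shown on the linked 0.18 page; the credential pairs are placeholders:

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from scrapy.utils.project import get_project_settings

# Placeholder credentials, one pair per spider run.
CREDENTIALS = [('alice', 'secret1'), ('bob', 'secret2')]

remaining = [len(CREDENTIALS)]  # countdown shared by the closed handlers

def spider_closed(spider):
    # Stop the reactor only once the last spider has finished.
    remaining[0] -= 1
    if remaining[0] == 0:
        reactor.stop()

def setup_crawler(username, password):
    spider = MySpider(username=username, password=password)
    crawler = Crawler(get_project_settings())
    crawler.signals.connect(spider_closed, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()

for username, password in CREDENTIALS:
    setup_crawler(username, password)

log.start()
reactor.run()  # blocks until spider_closed() stops the reactor

The countdown matters: reactor.run() blocks, and calling reactor.stop() when the first spider closes would kill the remaining crawls.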