Here is my simple code, which I cannot get to work.
I am subclassing InitSpider.
This is my code:
class MytestSpider(InitSpider):
    name = 'mytest'
    allowed_domains = ['example.com']
    login_page = 'http://www.example.com'
    start_urls = ["http://www.example.com/ist.php"]

    def init_request(self):
        #"""This function is called before crawling starts."""
        return Request(url=self.login_page, callback=self.parse)

    def parse(self, response):
        item = MyItem()
        item['username'] = "mytest"
        return item
Pipeline:
class TestPipeline(object):
    def process_item(self, item, spider):
        print item['username']
I get the same error if I try to print the item.
The error I get is:
  File "crawler/pipelines.py", line 35, in process_item
    myitem.username = item['username']
exceptions.TypeError: 'NoneType' object has no attribute '__getitem__'
My problem is with InitSpider: my pipeline is not receiving the item object.
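As far as I can tell, the traceback means that `item` is `None` when `process_item` runs. Here is a minimal sketch outside Scrapy that reproduces the same kind of failure, assuming the pipeline is handed `None` instead of an item. `SketchPipeline` and `try_process` are hypothetical names used only for this illustration, and a plain dict stands in for `MyItem`.

```python
# Hypothetical stand-in for the pipeline; not the real Scrapy machinery.
class SketchPipeline(object):
    def process_item(self, item, spider):
        # Subscripting raises TypeError when item is None.
        return item['username']


def try_process(item):
    """Return the username, or the name of the exception that was raised."""
    try:
        return SketchPipeline().process_item(item, spider=None)
    except TypeError as exc:
        return type(exc).__name__


print(try_process({'username': 'mytest'}))  # mytest
print(try_process(None))                    # TypeError
```

So the lookup itself seems fine when a real item arrives; the open question is why nothing reaches the pipeline.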
items.py
class MyItem(Item):
    username = Field()
settings.py
BOT_NAME = 'crawler'
SPIDER_MODULES = ['spiders']
NEWSPIDER_MODULE = 'spiders'
DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.cookies.CookiesMiddleware': 700  # <-
}
COOKIES_ENABLED = True
COOKIES_DEBUG = True
ITEM_PIPELINES = [
    'pipelines.TestPipeline',
]
IMAGES_STORE = '/var/www/htmlimages'