from scrapy.spider import BaseSpider


class dmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
    ]

    def parse(self, response):
        # Name the output file after the last path segment of the URL.
        filename = response.url.split("/")[-2]
        # Write the raw response body and close the file when done.
        with open(filename, 'wb') as f:
            f.write(response.body)
Then I ran "scrapy crawl dmoz" and got this error:
2013-09-14 13:20:56+0700 [dmoz] DEBUG: Retrying <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (failed 1 times): Connection to the other side was lost in a non-clean fashion.
Does anyone know how to fix this?