我在scrapy,python中使用站点地图蜘蛛。站点地图似乎有不寻常的格式,网址前有“//”:
<url>
<loc>//www.example.com/10/20-baby-names</loc>
</url>
<url>
<loc>//www.example.com/elizabeth/christmas</loc>
</url>
myspider.py
from scrapy.contrib.spiders import SitemapSpider
from myspider.items import *
class MySpider(SitemapSpider):
name = "myspider"
sitemap_urls = ["http://www.example.com/robots.txt"]
def parse(self, response):
item = PostItem()
item['url'] = response.url
item['title'] = response.xpath('//title/text()').extract()
return item
我收到此错误:
raise ValueError('Missing scheme in request url: %s' % self._url)
exceptions.ValueError: Missing scheme in request url: //www.example.com/10/20-baby-names
如何使用站点地图蜘蛛手动解析 url?