I need to scrape an XML page, http://www.10why.net/sitemap.xml, which is simply the table of URLs I want.
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import re

thename = "sitemap"

class ReviewSpider(BaseSpider):
    name = thename
    allowed_domains = ['10why.net']
    start_urls = ['http://www.10why.net/sitemap.xml']

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        content = hxs.select('//table[@cellpadding="5"]/tbody//a')
        print content
        for c in content:
            file = open('%s.txt' % thename, 'a')
            file.write("\n")
            file.write(c)
            file.close()
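What gets printed is [] (an empty list). I have been able to scrape ordinary HTML pages this way, but not this sitemap XML page. Please help. PS: I write the file myself for other reasons.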
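
One possible explanation, offered as an assumption rather than something checked against the live page: if the file is a standard sitemaps.org sitemap, the table shown in the browser is typically rendered by an XSL stylesheet, and the raw response contains namespaced `<urlset>`/`<url>`/`<loc>` elements instead of `<table>` and `<a>`. An XPath for `//table` would then match nothing. The sketch below illustrates the idea with the standard library only, not Scrapy. The names `extract_sitemap_urls` and `SAMPLE` are made up for illustration, and the sample URLs are placeholders, not taken from the real sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
# Assumption: the sitemap in question declares this namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(xml_bytes):
    """Return the text of every <loc> element found in a sitemap document."""
    root = ET.fromstring(xml_bytes)
    # The prefix "sm" is resolved through SITEMAP_NS; without it,
    # a plain ".//loc" would not match the namespaced elements.
    return [loc.text.strip()
            for loc in root.findall(".//sm:loc", SITEMAP_NS)
            if loc.text]


# Hypothetical sample with placeholder URLs (bytes, like a raw response body).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/page-1</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_sitemap_urls(SAMPLE):
        print(url)
```

Inside the spider, the same function could presumably be called on `response.body` in `parse`, and each returned URL written to the text file as before.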