I'm a beginner. This script runs for me, but not correctly: can anyone help me fix this code? The script doesn't extract the numbers. Why?
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector


class NameSpider(BaseSpider):
    name = "name"
    allowed_domains = ["example.com/"]
    start_urls = [
        "http://www.example.com/"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Select every <td> element on the page
        sites = hxs.select('//td')
        for site in sites:
            # Extract the text nodes that are direct children of this cell
            extractcontent = site.select('text()').extract()
            print extractcontent
This is what it extracts:
[u'\n\t\t\t\t\t\t', u'\n\t\t\t\t\t\t']
[u' ', u' \n\t\t\t\t\t\t\t\t Text']
[u'Text ']
[u'Text ']
[u'\n\t\t\t\t\t\t\t\tText ']
[u' - ']
[u'\n\t\t\t\t\t\t\t\tText ']
[]
[u'\n\t\t\t\t\t\t', u'\n\t\t\t\t\t\t']
[u' ', u' \n\t\t\t\t\t\t\t\t Text']
[u'Text ']
[u'Text ']
[u'\n\t\t\t\t\t\t\t\tText ']
[u' ', u'\n\t\t\t\t\t\t\t\t ']
[u'\n\t\t\t\t\t\t\t\tText ']
[]
[u'Text ']
[u'\n\t\t\t\t\t\t\t\tText ']
[u' ', u'\n\t\t\t\t\t\t\t\t ']
[u'\n\t\t\t\t\t\t\t\tText ']
Thanks in advance for your help!
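
One possible explanation, offered as an assumption because the page's HTML isn't shown: in XPath, `text()` relative to a `td` selects only the text nodes that are direct children of that cell. If the numbers sit inside a child element (for example an `<a>` or `<span>`), they would not show up, whereas `.//text()` also selects the text of descendants. Either way, the extracted fragments carry a lot of whitespace, so a small cleanup step may help. The sketch below uses only the standard library; the helper names `clean_cell` and `find_numbers` and the sample fragment `42` are hypothetical, not taken from the page.

```python
import re


def clean_cell(fragments):
    """Join the text fragments of one cell and collapse runs of whitespace."""
    return " ".join(" ".join(fragments).split())


def find_numbers(text):
    """Return every run of digits (with an optional decimal part) in text."""
    return re.findall(r"\d+(?:\.\d+)?", text)


# Fragments shaped like the output above; the "42" entry is a hypothetical
# value standing in for a number nested inside a child element.
cell = [u"\n\t\t\t\t\t\t\t\tText ", u"42", u" \n\t"]
cleaned = clean_cell(cell)
numbers = find_numbers(cleaned)
```

In the spider, this would mean passing the result of `site.select('.//text()').extract()` to `clean_cell`, assuming the nested-element guess is right for this page.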