shell - Scrapy response.xpath returns nothing for a query

Question

I am using the scrapy shell to extract some text data. Here are the commands I gave in the scrapy shell:

$ scrapy shell "http://jobs.parklandcareers.com/dallas/nursing/jobid6541851-nurse-resident-cardiopulmonary-icu-feb2015-nurse-residency-requires-contract-jobs"

>>> response.xpath('//*[@id="jobDesc"]/span[1]/text()')
[<Selector xpath='//*[@id="jobDesc"]/span[1]/text()' data=u'Dallas, TX'>]
>>> response.xpath('//*[@id="jobDesc"]/span[2]/p/text()[2]')
[<Selector xpath='//*[@id="jobDesc"]/span[2]/p/text()[2]' data=u'Responsible for attending assigned nursi'>]
>>> response.xpath('//*[@id="jobDesc"]/span[2]/p/text()[preceding-sibling::*="Education"][following-sibling::*="Certification"]')
[]

The third command returns no data. I am trying to extract the data between the two keywords in that command. Where am I going wrong?

score 1 · Accepted Answer

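//*[@id="jobDesc"]/span[2]/p/text() returns a list of text nodes. You can filter the relevant nodes in Python. Here is how to get the text between the "Education/Experience:" and "Certification/Registration/Licensure:" text paragraphs: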

>>> result = response.xpath('//*[@id="jobDesc"]/span[2]/p/text()').extract()
>>> start = result.index('Education/Experience:')
>>> end = result.index('Certification/Registration/Licensure:')
>>> print ''.join(result[start+1:end])
- Must be a graduate from an accredited school of Nursing.
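
One caveat with the session above: list.index() raises ValueError unless a node matches the label exactly, so stray whitespace around a label would break it. Below is a minimal sketch of a more forgiving version. The helper name text_between and the sample list are made up for illustration (the list only stands in for what .extract() might return), and it is written for Python 3 rather than the Python 2 shell shown above.

```python
def text_between(nodes, start_marker, end_marker):
    """Return the stripped, non-empty strings found between two marker strings."""
    # Strip whitespace so that " Education/Experience: " still matches the label.
    cleaned = [node.strip() for node in nodes]
    try:
        start = cleaned.index(start_marker)
        # Search for the end marker only after the start marker.
        end = cleaned.index(end_marker, start + 1)
    except ValueError:
        # One of the markers is missing: return nothing instead of raising.
        return []
    return [text for text in cleaned[start + 1:end] if text]


# Hypothetical stand-in for response.xpath(...).extract(); not the real page data.
sample_nodes = [
    "\n",
    "(placeholder text before the section)",
    " Education/Experience: ",
    "\n- Must be a graduate from an accredited school of Nursing.\n",
    "Certification/Registration/Licensure:",
    "(placeholder text after the section)",
]

print(text_between(sample_nodes,
                   "Education/Experience:",
                   "Certification/Registration/Licensure:"))
```

Returning an empty list when a label is missing is a design choice for scraping many pages with slightly different layouts; raising may be preferable if a missing label should count as an error.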

UPD (regarding the follow-up question in the comments):

>>> response.xpath('//*[@id="jobDesc"]/span[3]/text()').re('Job ID: (\d+)')
[u'143112']
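
For readers less familiar with .re(): as far as I understand it, it applies the regular expression to the text of each selected node and returns the captured groups as a flat list of strings. A rough standard-library equivalent is sketched below. The span_texts list is an assumed stand-in for the extracted span text, which was not shown in the thread.

```python
import re

# Assumed sample of the extracted text nodes; the real span content was not shown.
span_texts = ["Job ID: 143112"]

# Raw string so that \d reaches the regex engine unchanged.
pattern = re.compile(r"Job ID: (\d+)")

# Collect the captured group from every match in every text node.
job_ids = [match for text in span_texts for match in pattern.findall(text)]
print(job_ids)
```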

score 0 · Accepted Answer

Try:

substring-before(
  substring-after(//*[@id="jobDesc"]/span[2]/p, 'Education'), 'Certification')

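Note: I could not test it.

The idea is that you cannot use preceding-sibling and following-sibling because you are looking at the same text node. You have to extract the part of the text you want using substring-before() and substring-after().

By combining these two functions, you can select what lies in between.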
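
To make the combination concrete, here is a small Python 3 sketch that mimics the XPath 1.0 semantics of the two functions for non-empty markers: both return an empty string when the marker is absent. The function names and the sample string are illustrative only; this is not Scrapy code and the sample is not the real page text.

```python
def substring_after(text, marker):
    """Part of text after the first occurrence of marker, or '' if absent."""
    _, sep, tail = text.partition(marker)
    return tail if sep else ""


def substring_before(text, marker):
    """Part of text before the first occurrence of marker, or '' if absent."""
    head, sep, _ = text.partition(marker)
    return head if sep else ""


def between(text, start_marker, end_marker):
    """Text lying between the first start_marker and the following end_marker."""
    return substring_before(substring_after(text, start_marker), end_marker)


# Hypothetical flattened paragraph text, for illustration only.
sample = ("Education/Experience: - Must be a graduate from an accredited "
          "school of Nursing. Certification/Registration/Licensure:")

print(repr(between(sample, "Education", "Certification")))
```

Note that with the short markers 'Education' and 'Certification' the result still starts with the rest of the label ("/Experience: "), so using the full label text as the marker gives a cleaner result.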
