python-2.7 - Retrieving the text inside a div tag using lxml in Python

Question

I have this HTML code:

<div class="row">
<span class="label">Source:</span>
08/09/2013
</div>
<div class="row">
<span class="label">Last revised:</span>
08/09/2013
</div>

I want to retrieve the release date and the last revised date using code like this:

url="http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2013-4031&cid=2"
html=urllib.urlopen(url)
parser=etree.HTMLParser()
tree=etree.parse(html,parser)
root=tree.getroot()

for div in tree.iter('div'):
 title=div.xpath('.//child::*')
 if( title[0].text=="Source:"):
  print (#release date#)

I tried printing div.text, but it does not work. How can I do this? I am using Python 2.7 and lxml.

score 1 · Accepted Answer

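The date is the tail of the span element, not the text of the div. In lxml, text that follows an element's closing tag is stored in that element's .tail attribute, while div.text holds only the text before the first child element (here, just whitespace).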

for div in tree.iter('div'):
    title = div.xpath('.//child::*')
    # Skip divs without child elements, and guard against a missing tail.
    if title and title[0].text == 'Source:':
        print((title[0].tail or '').strip())
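
To see the difference between .text and .tail without fetching the page, here is a minimal self-contained sketch that parses the HTML from the question as an inline string. The names SNIPPET and extract_dates are hypothetical, introduced only for illustration. The sketch assumes the snippet can be parsed as XML with etree.fromstring (it happens to be well-formed), and it falls back to the standard library's xml.etree.ElementTree, which uses the same .text/.tail model, if lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: the standard-library ElementTree is an acceptable stand-in
    # here, because it exposes the same .text / .tail attributes.
    import xml.etree.ElementTree as etree

# Hypothetical inline copy of the markup from the question, wrapped in a root.
SNIPPET = """<html><body>
<div class="row">
<span class="label">Source:</span>
08/09/2013
</div>
<div class="row">
<span class="label">Last revised:</span>
08/09/2013
</div>
</body></html>"""


def extract_dates(markup):
    """Map each label (without the trailing colon) to the text after its span."""
    root = etree.fromstring(markup)
    dates = {}
    for div in root.iter('div'):
        span = div.find('span')  # first direct <span> child, or None
        if span is None or span.text is None:
            continue
        label = span.text.strip().rstrip(':')
        # div.text is only the whitespace before <span>; the date itself
        # is stored as the tail of the span.
        dates[label] = (span.tail or '').strip()
    return dates


dates = extract_dates(SNIPPET)
first_div = next(etree.fromstring(SNIPPET).iter('div'))
print(dates)
```

Here first_div.text contains only a newline, which is why printing div.text appears to show nothing, while the span's tail carries the date.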
