0

我有一个这样的 HTML 文件:

<div id="note">
 <a name="overview"></a>
 <h3>Overview</h3>
 <p>some text1...</p>
 <a name="description"></a>
 <h3>Description</h3>
 <p>some text2 ...</p>
</div>
                              `

我想检索每个标题的段落。例如,overview: some text1 description: some text 2 ...我想使用 xpath 在 python 中编写它。谢谢你。

4

1 回答 1

0

找到所有h3标签,遍历它们,并在迭代循环的每一步中,找到下一个兄弟标签p

import urllib2
from lxml import etree

URL = "http://www.kb.cert.org/vuls/id/628463"
response = urllib2.urlopen(URL)

parser = etree.HTMLParser()
tree = etree.parse(response, parser)

for header in tree.iter('h3'):
    paragraph = header.xpath('(.//following-sibling::p)[1]')
    if paragraph:
        print "%s: %s" % (header.text, paragraph[0].text)

印刷:

Overview: The Ruby on Rails 3.0 and 2.3 JSON parser contain a vulnerability that may result in arbitrary code execution.
Description: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Impact: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Solution: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Vendor Information : Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
CVSS Metrics : Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
References: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Credit: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Feedback: If you have feedback, comments, or additional information about this vulnerability, please send us 
Subscribe to Updates: Receive security alerts, tips, and other updates.
于 2013-07-31T06:39:25.380 回答