python - Can't match a tag with Python XPath under lxml

Question

Here is my code:

def extractContent(self, html):
    parser = etree.XMLParser(ns_clean=True, recover=True)
    print html.find('id="detail"')
    tree = etree.fromstring(html, parser)
    if tree is not None:
        for c in self.contents:
            m = tree.xpath(c['xpath'])
            print m, c['xpath']
            if len(m) >= 1:
                print c['name'] + ' : ' + m[0].text
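A minimal sketch of the method above as a standalone Python 3 function, assuming the page really is namespaced XHTML (as the `{http://www.w3.org/1999/xhtml}li` tags in the shell session further down suggest). It swaps the question's `XMLParser` for lxml's `HTMLParser`: libxml2's HTML parser does not process namespaces, so plain names like `li` in the XPath match. `extract_content` and `CONTENTS` are hypothetical stand-ins for the method and `self.contents`, keeping the same `name`/`xpath` keys.

```python
from lxml import etree

# Hypothetical stand-in for self.contents: one dict per field to extract.
CONTENTS = [
    {"name": "detail", "xpath": '//*[@id="i-detail"]/li[1]'},
]


def extract_content(html, contents):
    """Return {name: text} for every XPath that matches at least one element."""
    # HTMLParser builds a namespace-free tree, so an xmlns="..." declaration
    # on the page no longer hides elements from unprefixed XPath names.
    tree = etree.fromstring(html, etree.HTMLParser())
    results = {}
    # Guard kept from the original: a recovering parser may return None.
    if tree is None:
        return results
    for c in contents:
        matches = tree.xpath(c["xpath"])
        if matches:
            # .text is None for empty elements; fall back to "".
            results[c["name"]] = matches[0].text or ""
    return results
```

If the page begins with an `<?xml ... encoding="..."?>` declaration, pass it as bytes rather than `str`; lxml raises `ValueError` for `str` input that carries an encoding declaration.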

I'm trying to match //*[@id="i-detail"]/li[1] against the HTML source, but it returns nothing.

Here is the output of the code above:

25803
[] //*[@id="i-detail"]/li[1]

Here is the HTML:

<div class="mc fore tabcon">
                    <ul id="i-detail">
                        <li title="XXXXXXXXX">**AAAAAAAAAAA**(what i want to match)</li>
                        <li>BBBBBBBBB</li>
.......

I tried the XPath in the interactive shell:

>>> root.xpath('//*[@id="i-detail"]/li')
[]
>>> root.xpath('//*[@id="i-detail"]/*')
[<Element {http://www.w3.org/1999/xhtml}li at 0x1007b7910>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b79b0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7a50>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7aa0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7af0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7b40>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7b90>]
>>> root.xpath('//*[@id="i-detail"]/*')[0]   # this one does return the target element!
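The `{http://www.w3.org/1999/xhtml}` in front of each `li` is lxml's Clark notation (`{namespace-uri}localname`), meaning the elements live in the XHTML namespace. A small sketch for confirming this on a parsed tree, assuming a hypothetical reconstruction of the fragment with an `xmlns` declaration (the real page's markup is not shown in full):

```python
from lxml import etree

# Hypothetical fragment; the xmlns declaration is inferred from the
# {http://www.w3.org/1999/xhtml} tags in the shell output above.
html = """<div xmlns="http://www.w3.org/1999/xhtml" class="mc fore tabcon">
  <ul id="i-detail">
    <li title="XXXXXXXXX">AAAAAAAAAAA</li>
    <li>BBBBBBBBB</li>
  </ul>
</div>"""

root = etree.fromstring(html, etree.XMLParser(ns_clean=True, recover=True))

# nsmap shows the namespaces in scope; the key None is the default namespace.
print(root.nsmap)

# Take the first element child of the ul, exactly as the "/*" query does.
first = root.xpath('//*[@id="i-detail"]/*')[0]
print(first.tag)  # {http://www.w3.org/1999/xhtml}li

# QName splits a Clark-notation tag into its namespace URI and local name.
qname = etree.QName(first.tag)
print(qname.namespace, qname.localname)
```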

score 0 · Accepted Answer

It seems to work for me:

>>> s = """<div class="mc fore tabcon">
                    <ul id="i-detail">
                        <li title="XXXXXXXXX">**AAAAAAAAAAA**(what i want to match)</li>
                        <li>BBBBBBBBB</li>
                    </ul>
</div>"""
>>> from lxml import etree
>>> parser = etree.XMLParser(ns_clean=True, recover=True)
>>> root = etree.fromstring(s, parser)
>>> for node in root.xpath('//*[@id="i-detail"]/li[1]'):
...     print node, node.text
...
<Element li at 0x12534b8> **AAAAAAAAAAA**(what i want to match)
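The test string in this answer has no `xmlns` declaration, so its elements are in no namespace and `li` matches. In the question's output the elements are `{http://www.w3.org/1999/xhtml}li`, and in XPath 1.0 an unprefixed name such as `li` only matches elements in no namespace (`*` matches any element, which is why `/*` worked). Attributes like `id` are not affected by a default namespace, so `@id` is fine. A minimal sketch of two common fixes, assuming the same hypothetical namespaced fragment; the prefix `x` is an arbitrary choice:

```python
from lxml import etree

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical: the answer's fragment with the XHTML default namespace added.
html = """<div xmlns="http://www.w3.org/1999/xhtml" class="mc fore tabcon">
  <ul id="i-detail">
    <li title="XXXXXXXXX">AAAAAAAAAAA</li>
    <li>BBBBBBBBB</li>
  </ul>
</div>"""

root = etree.fromstring(html, etree.XMLParser(ns_clean=True, recover=True))

# Unprefixed "li" means "li in no namespace", so this finds nothing.
plain = root.xpath('//*[@id="i-detail"]/li[1]')

# Fix 1: map a prefix to the XHTML namespace and use it in the path.
prefixed = root.xpath('//*[@id="i-detail"]/x:li[1]', namespaces={"x": XHTML_NS})

# Fix 2: ignore the namespace and compare local names only.
by_local_name = root.xpath('//*[@id="i-detail"]/*[local-name()="li"][1]')

print(plain)                  # []
print(prefixed[0].text)       # AAAAAAAAAAA
print(by_local_name[0].text)  # AAAAAAAAAAA
```

The prefixed form is stricter (it only matches XHTML `li` elements); `local-name()` is handy when the namespace URI varies between pages.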
