Here is my code:

from lxml import etree

def extractContent(self, html):
    parser = etree.XMLParser(ns_clean=True, recover=True)
    # Sanity check: offset of the id attribute in the raw HTML (-1 if absent)
    print(html.find('id="detail"'))
    tree = etree.fromstring(html, parser)
    if tree is not None:
        for c in self.contents:
            m = tree.xpath(c['xpath'])
            print(m, c['xpath'])
            if len(m) >= 1:
                print(c['name'] + ' : ' + m[0].text)

I am trying to match //*[@id="i-detail"]/li[1] against the HTML source, but the XPath returns nothing.

Here is the output of the code above:

25803
[] //*[@id="i-detail"]/li[1]

Here is the HTML:

<div class="mc fore tabcon">
                    <ul id="i-detail">
                        <li title="XXXXXXXXX">**AAAAAAAAAAA**(what i want to match)</li>
                        <li>BBBBBBBBB</li>
.......

I also tried the XPath in an interactive session:

>>> root.xpath('//*[@id="i-detail"]/li')
[]
>>> root.xpath('//*[@id="i-detail"]/*')
[<Element {http://www.w3.org/1999/xhtml}li at 0x1007b7910>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b79b0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7a50>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7aa0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7af0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7b40>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7b90>]
>>> root.xpath('//*[@id="i-detail"]/*')[0]  # this expression does reach the target element

1 Answer

It seems to work for me with the snippet you posted:

>>> from lxml import etree
>>> s = """<div class="mc fore tabcon">
                    <ul id="i-detail">
                        <li title="XXXXXXXXX">**AAAAAAAAAAA**(what i want to match)</li>
                        <li>BBBBBBBBB</li>
                    </ul>
</div>"""
>>> parser = etree.XMLParser(ns_clean=True, recover=True)
>>> root = etree.fromstring(s, parser)
>>> for node in root.xpath('//*[@id="i-detail"]/li[1]'):
...     print(node, node.text)
...
<Element li at 0x12534b8> **AAAAAAAAAAA**(what i want to match)
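
One difference worth noting: in the session from the question the elements print as {http://www.w3.org/1999/xhtml}li, which suggests the full page declares the XHTML default namespace, while the snippet tested here has none. In XPath 1.0 an unprefixed name such as li matches only elements in no namespace, which would explain the empty result. The sketch below is an illustration under that assumption, not a confirmed diagnosis: the wrapping html and body elements, the XHTML string, and the prefix name "x" are made up for the example.

```python
from lxml import etree

# Hypothetical page: the list from the question wrapped in an XHTML document
# that declares a default namespace (an assumption about the real page).
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="mc fore tabcon">
      <ul id="i-detail">
        <li title="XXXXXXXXX">AAAAAAAAAAA</li>
        <li>BBBBBBBBB</li>
      </ul>
    </div>
  </body>
</html>"""

parser = etree.XMLParser(ns_clean=True, recover=True)
root = etree.fromstring(XHTML, parser)

# Unprefixed names match only elements in no namespace, so this finds nothing.
plain = root.xpath('//*[@id="i-detail"]/li[1]')

# Option 1: bind a prefix (the name "x" is arbitrary) to the XHTML namespace.
ns = {"x": "http://www.w3.org/1999/xhtml"}
prefixed = root.xpath('//*[@id="i-detail"]/x:li[1]', namespaces=ns)

# Option 2: ignore the namespace and compare local names only.
by_local_name = root.xpath('//*[@id="i-detail"]/*[local-name()="li"][1]')

print(plain)
print(prefixed[0].text)
print(by_local_name[0].text)
```

Option 1 is usually the more explicit choice when the namespace is known in advance; option 2 is handy when the same expression has to work on pages with and without the declaration.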
answered 2012-07-11T08:38:29.817