Here is my code:
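from lxml import etree

def extractContent(self, html):
    # ns_clean removes redundant namespace declarations;
    # recover lets the parser keep going past markup errors
    parser = etree.XMLParser(ns_clean=True, recover=True)
    # Debug: offset of 'id="detail"' in the raw source (-1 if absent)
    print(html.find('id="detail"'))
    tree = etree.fromstring(html, parser)
    if tree is not None:
        # self.contents holds dicts with 'name' and 'xpath' keys
        for c in self.contents:
            m = tree.xpath(c['xpath'])
            print(m, c['xpath'])
            if len(m) >= 1:
                print(c['name'] + ' : ' + m[0].text)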
I am trying to match //*[@id="i-detail"]/li[1] in the HTML source, but the XPath matches nothing.
Here is the output of the code above:
25803
[] //*[@id="i-detail"]/li[1]
Here is the HTML:
<div class="mc fore tabcon">
<ul id="i-detail">
<li title="XXXXXXXXX">AAAAAAAAAAA</li>  <!-- what I want to match -->
<li>BBBBBBBBB</li>
.......
I tried the XPath in the interactive interpreter:
>>> root.xpath('//*[@id="i-detail"]/li')
[]
>>> root.xpath('//*[@id="i-detail"]/*')
[<Element {http://www.w3.org/1999/xhtml}li at 0x1007b7910>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b79b0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7a50>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7aa0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7af0>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7b40>, <Element {http://www.w3.org/1999/xhtml}li at 0x1007b7b90>]
>>> root.xpath('//*[@id="i-detail"]/*')[0]   # this line gets the target!
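The interpreter output shows each item as {http://www.w3.org/1999/xhtml}li, so the parsed elements appear to sit in the XHTML namespace. In XPath 1.0 an unprefixed name test such as li matches only elements in no namespace, which would explain why /li returns [] while /* finds the items. The sketch below reproduces this under stated assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml (so it runs without extra packages), and the sample markup, including its xmlns declaration, is a hypothetical reconstruction rather than the real page. With lxml, the analogous call would bind a prefix the same way, e.g. tree.xpath('//*[@id="i-detail"]/x:li[1]', namespaces={'x': 'http://www.w3.org/1999/xhtml'}).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample mirroring the question's markup; the xmlns declaration
# is an assumption inferred from the {http://www.w3.org/1999/xhtml}li tags above.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="mc fore tabcon">
<ul id="i-detail">
<li title="XXXXXXXXX">AAAAAAAAAAA</li>
<li>BBBBBBBBB</li>
</ul>
</div>
</body>
</html>"""


def compare_lookups(source):
    """Return (bare, qualified) matches for the <li> items under id="i-detail"."""
    root = ET.fromstring(source)
    # Unprefixed "li" only matches elements in no namespace,
    # so this finds nothing once the XHTML default namespace applies.
    bare = root.findall(".//*[@id='i-detail']/li")
    # Binding a prefix to the XHTML namespace makes the name test match.
    qualified = root.findall(".//*[@id='i-detail']/x:li", {"x": XHTML_NS})
    return bare, qualified


bare, qualified = compare_lookups(SAMPLE)
print(len(bare))          # 0
print(len(qualified))     # 2
print(qualified[0].text)  # AAAAAAAAAAA
```

The wildcard * matches regardless of namespace, which is why the /* lookup in the interpreter session finds the items; a prefixed name test (or a predicate on local-name()) is the usual way to target li specifically.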