2

我正在尝试使用带有 lxml 和 xpath 的 python 解析值表单 html。

这是我的html数据

<table>
<tr>
<td class="u"><input class="wide" name="record[13][name]" value="exampledomain1.com"></td>
      <td class="u">
       <select name="record[13][type]">
         <option SELECTED value="A" >A</option>
         <option value="AAAA" >AAAA</option>
         <option value="CNAME" >CNAME</option>
         <option value="HINFO" >HINFO</option>
         <option value="MX" >MX</option>
         <option value="NAPTR" >NAPTR</option>
         <option value="NS" >NS</option>
         <option value="PTR" >PTR</option>
         <option value="SOA" >SOA</option>
         <option value="SPF" >SPF</option>
         <option value="SRV" >SRV</option>
         <option value="SSHFP" >SSHFP</option>
         <option value="TXT" >TXT</option>
         <option value="RP" >RP</option>
         <option value="URL" >URL</option>
         <option value="MBOXFW" >MBOXFW</option>
         <option value="CURL" >CURL</option>
       </select>
      </td>
      <td class="u"><input class="wide" name="record[13][content]" value='10.10.10.1'></td>

<td class="u"><input class="wide" name="record[14][name]" value="exampledomain2.com"></td>
      <td class="u">
       <select name="record[14][type]">
         <option SELECTED value="CNAME" >A</option>
         <option value="AAAA" >AAAA</option>
         <option value="CNAME" >CNAME</option>
         <option value="HINFO" >HINFO</option>
         <option value="MX" >MX</option>
         <option value="NAPTR" >NAPTR</option>
         <option value="NS" >NS</option>
         <option value="PTR" >PTR</option>
         <option value="SOA" >SOA</option>
         <option value="SPF" >SPF</option>
         <option value="SRV" >SRV</option>
         <option value="SSHFP" >SSHFP</option>
         <option value="TXT" >TXT</option>
         <option value="RP" >RP</option>
         <option value="URL" >URL</option>
         <option value="MBOXFW" >MBOXFW</option>
         <option value="CURL" >CURL</option>
       </select>
      </td>
      <td class="u"><input class="wide" name="record[14][content]" value='exampledomain1.com'></td>

<td class="u"><input class="wide" name="record[15][name]" value="exampledomain3.com"></td>
      <td class="u">
       <select name="record[15][type]">
         <option SELECTED value="A" >A</option>
         <option value="AAAA" >AAAA</option>
         <option value="CNAME" >CNAME</option>
         <option value="HINFO" >HINFO</option>
         <option value="MX" >MX</option>
         <option value="NAPTR" >NAPTR</option>
         <option value="NS" >NS</option>
         <option value="PTR" >PTR</option>
         <option value="SOA" >SOA</option>
         <option value="SPF" >SPF</option>
         <option value="SRV" >SRV</option>
         <option value="SSHFP" >SSHFP</option>
         <option value="TXT" >TXT</option>
         <option value="RP" >RP</option>
         <option value="URL" >URL</option>
         <option value="MBOXFW" >MBOXFW</option>
         <option value="CURL" >CURL</option>
       </select>
      </td>
      <td class="u"><input class="wide" name="record[15][content]" value='10.10.10.3'></td>
</tr>
</table>

我想要的是解析值并打印如下:

exampledomain1.com A 10.10.10.1
exampledomain2.com CNAME exampledomain1.com
exampledomain3.com A 10.10.10.3

这是我尝试过的

#!/usr/bin/python
import lxml.html
from lxml import etree

doc = lxml.html.document_fromstring("""Here whole html data""")
txt1 = doc.xpath('//*[@class="wide"]/@value')
txt2 = doc.xpath('//@SELECTED/text()')
print txt1
print txt2

但它没有按我的意愿工作。任何帮助,将不胜感激。

谢谢你们。

4

2 回答 2

3

我修复了代码以返回以下内容,这与您的要求非常接近:

(py26_default)[mpenning@Bucksnort ~]$ python parse.py
exampledomain1.com 10.10.10.1
exampledomain2.com exampledomain1.com
exampledomain3.com 10.10.10.3
(py26_default)[mpenning@Bucksnort ~]$

您无法record[13][type]使用 xpath 进行检索...还有其他方法可以遍历它,但我将把它作为 OP 的练习。请注意,我确实修复了 OP 问题中的 HTML 以包含<table><tr>标记...

import lxml.html
from lxml import etree
from lxml.etree import XMLParser

parser = XMLParser(ns_clean=True, recover=True)
doc = etree.fromstring("""Here whole html data""", parser)
elem1 = doc.xpath('//input[@name="record[13][name]"]')
# NOTE: <option SELECTED> cannot be retrieved with xpath... SELECTED must have
#   a value to do so...
#elem2 = doc.xpath('//select[@name="record[13][type]"]/option[@SELECTED]')
elem3 = doc.xpath('//input[@name="record[13][content]"]')

for idx, val in enumerate(elem1):
    print val.attrib['value'], elem3[idx].attrib['value']

<!-- The (fixed) html source I used -->
<table>
<tr>
<td class="u"><input class="wide" name="record[13][name]" value="exampledomain1.com"></td>
      <td class="u">
       <select name="record[13][type]">
         <option SELECTED value="A" >A</option>
         <option value="AAAA" >AAAA</option>
         <option value="CNAME" >CNAME</option>
         <option value="HINFO" >HINFO</option>
         <option value="MX" >MX</option>
         <option value="NAPTR" >NAPTR</option>
         <option value="NS" >NS</option>
         <option value="PTR" >PTR</option>
         <option value="SOA" >SOA</option>
         <option value="SPF" >SPF</option>
         <option value="SRV" >SRV</option>
         <option value="SSHFP" >SSHFP</option>
         <option value="TXT" >TXT</option>
         <option value="RP" >RP</option>
         <option value="URL" >URL</option>
         <option value="MBOXFW" >MBOXFW</option>
         <option value="CURL" >CURL</option>
       </select>
      </td>
      <td class="u"><input class="wide" name="record[13][content]" value='10.10.10.1'></td>

<td class="u"><input class="wide" name="record[13][name]" value="exampledomain2.com"></td>
      <td class="u">
       <select name="record[13][type]">
         <option SELECTED value="CNAME" >A</option>
         <option value="AAAA" >AAAA</option>
         <option value="CNAME" >CNAME</option>
         <option value="HINFO" >HINFO</option>
         <option value="MX" >MX</option>
         <option value="NAPTR" >NAPTR</option>
         <option value="NS" >NS</option>
         <option value="PTR" >PTR</option>
         <option value="SOA" >SOA</option>
         <option value="SPF" >SPF</option>
         <option value="SRV" >SRV</option>
         <option value="SSHFP" >SSHFP</option>
         <option value="TXT" >TXT</option>
         <option value="RP" >RP</option>
         <option value="URL" >URL</option>
         <option value="MBOXFW" >MBOXFW</option>
         <option value="CURL" >CURL</option>
       </select>
      </td>
      <td class="u"><input class="wide" name="record[13][content]" value='exampledomain1.com'></td>

<td class="u"><input class="wide" name="record[13][name]" value="exampledomain3.com"></td>
      <td class="u">
       <select name="record[13][type]">
         <option SELECTED value="A" >A</option>
         <option value="AAAA" >AAAA</option>
         <option value="CNAME" >CNAME</option>
         <option value="HINFO" >HINFO</option>
         <option value="MX" >MX</option>
         <option value="NAPTR" >NAPTR</option>
         <option value="NS" >NS</option>
         <option value="PTR" >PTR</option>
         <option value="SOA" >SOA</option>
         <option value="SPF" >SPF</option>
         <option value="SRV" >SRV</option>
         <option value="SSHFP" >SSHFP</option>
         <option value="TXT" >TXT</option>
         <option value="RP" >RP</option>
         <option value="URL" >URL</option>
         <option value="MBOXFW" >MBOXFW</option>
         <option value="CURL" >CURL</option>
       </select>
      </td>
      <td class="u"><input class="wide" name="record[13][content]" value='10.10.10.3'></td>
</tr>
</table>
于 2012-07-31T19:49:27.613 回答
0
record_13_name = tree.xpath("//select[@name='record[13][name]']/text()")
record_13_type = tree.xpath("//select[@name='record[13][type]']/option/text()")
record_13_content = tree.xpath("//input[@name='record[13][content]']/text()")


record_14_name = tree.xpath("//select[@name='record[14][name]']/text()")
record_14_type = tree.xpath("//select[@name='record[14][type]']/option/text()")
record_14_content = tree.xpath("//input[@name='record[14][content]']/text()")


record_15_name = tree.xpath("//select[@name='record[15][name]']/text()")
record_15_type = tree.xpath("//select[@name='record[15][type]']/option/text()")
record_15_content = tree.xpath("//input[@name='record[15][content]']/text()") 
于 2015-01-14T05:21:01.450 回答