python - How to extract data associated with an XML file attribute using Python 3.2

Question

I have XML in this format:

<event timestamp="0.447463" bustype="LIN" channel="LIN 1">  
 <col name="Time"/>  
 <col name="Start of Frame">0.440708</col>  
 <col name="Channel">LIN 1</col>  
 <col name="Dir">Tx</col>  
 <col name="Event Type">LIN Frame (Diagnostic Request)</col>  
 <col name="Frame Name">MasterReq_DB</col>  
 <col name="Id">3C</col>  
 <col name="Data">81 06 04 04 FF FF 50 4C</col>  
 <col name="Publisher">TestMaster (simulated)</col>  
 <col name="Checksum">D3 &quot;Classic&quot;</col>  
 <col name="Header Duration">2.090 ms (40.1 bits)</col>  
 <col name="Resp. Duration">4.688 ms (90.0 bits)</col>  
 <col name="Time difference">0.049987</col>  
 <empty/>  
</event>

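In the XML above, I need to extract the data associated with the "name" attribute.
I can get all of the names, but I cannot get the text of the field itself, for example >MasterReq_DB<.
Please help me.
Thanks in advance.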

My Python code is:

import sys 
import array
import string
from xml.dom.minidom import parse,parseString
from xml.dom import minidom                                              
input_file = open("test_input.txt",'r')                                                
alines = input_file.read()
word_lst = alines.split("'")
filename = word_lst[1]
pathname=word_lst[3]                                               
f = open(pathname,'r')
doc = minidom.parse(f)
node = doc.documentElement
events = doc.getElementsByTagName('event')
for event in events:
    #print (event)
    columns =  event.getElementsByTagName('col')
    for column in columns:
        #print (column)
        head = column.getAttribute('name')
        if (head == ('Frame Name')):
           print (head)
           request = head.firstChild.wholeText
           print (request)
print ("DOne")

score 1 · Accepted Answer

If you are willing to use lxml, here is a short primer to get you started:

In [1]: x = '''<event timestamp="0.447463" bustype="LIN" channel="LIN 1">  
   ...:  <col name="Time"/>  
   ...:  <col name="Start of Frame">0.440708</col>  
   ...:  <col name="Channel">LIN 1</col>  
   ...:  <col name="Dir">Tx</col>  
   ...:  <col name="Event Type">LIN Frame (Diagnostic Request)</col>  
   ...:  <col name="Frame Name">MasterReq_DB</col>  
   ...:  <col name="Id">3C</col>  
   ...:  <col name="Data">81 06 04 04 FF FF 50 4C</col>  
   ...:  <col name="Publisher">TestMaster (simulated)</col>  
   ...:  <col name="Checksum">D3 &quot;Classic&quot;</col>  
   ...:  <col name="Header Duration">2.090 ms (40.1 bits)</col>  
   ...:  <col name="Resp. Duration">4.688 ms (90.0 bits)</col>  
   ...:  <col name="Time difference">0.049987</col>  
   ...:  <empty/>  
   ...: </event> '''

In [2]: from lxml import etree

In [3]: tree = etree.fromstring(x)

In [4]: [elem.text for elem in tree.xpath('//*[@name]')]
Out[4]: 
[None,
 '0.440708',
 'LIN 1',
 'Tx',
 'LIN Frame (Diagnostic Request)',
 'MasterReq_DB',
 '3C',
 '81 06 04 04 FF FF 50 4C',
 'TestMaster (simulated)',
 'D3 "Classic"',
 '2.090 ms (40.1 bits)',
 '4.688 ms (90.0 bits)',
 '0.049987']

In [5]: [name for name in tree.xpath('//@name')]
Out[5]: 
['Time',
 'Start of Frame',
 'Channel',
 'Dir',
 'Event Type',
 'Frame Name',
 'Id',
 'Data',
 'Publisher',
 'Checksum',
 'Header Duration',
 'Resp. Duration',
 'Time difference']
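
Since the question asks specifically for the `Frame Name` value, the same idea can be narrowed with an attribute predicate. The following is only a sketch under a few assumptions, not part of the original session: it uses `find` with the path subset that both lxml and the standard-library `xml.etree.ElementTree` accept, falls back to the latter when lxml is not installed, and works on a shortened copy of the XML. The names `xml_text`, `event`, `frame_col`, `frame_name` and `columns` are illustrative.

```python
try:
    from lxml import etree  # preferred, as in the session above
except ImportError:
    # Assumption: the stdlib module is an acceptable fallback; it offers
    # the same fromstring()/find()/findall() calls used below.
    import xml.etree.ElementTree as etree

# Shortened copy of the <event> element from the question.
xml_text = '''<event timestamp="0.447463" bustype="LIN" channel="LIN 1">
 <col name="Time"/>
 <col name="Frame Name">MasterReq_DB</col>
 <col name="Id">3C</col>
</event>'''

event = etree.fromstring(xml_text)

# The predicate [@name='Frame Name'] keeps only the matching <col> element.
frame_col = event.find(".//col[@name='Frame Name']")
frame_name = frame_col.text if frame_col is not None else None
print(frame_name)

# Pairing each name with its text gives a lookup table for one event.
columns = {col.get("name"): col.text for col in event.findall("col")}
print(columns["Id"])
```

With lxml itself, the XPath call `tree.xpath('//col[@name="Frame Name"]/text()')` should give the same value wrapped in a list.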

To read from a file instead of a string, use the lxml.etree.parse function.
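
A minimal sketch of the file-based variant might look like the following. Several things here are assumptions for illustration only: the helper name `frame_names`, the `<log>` wrapper element, the second event with its `SlaveResp_DB` value, and the temporary file standing in for the real log. The import falls back to the standard-library `xml.etree.ElementTree`, whose `parse` function is called the same way.

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib fallback; parse(), getroot(), iter() and findall()
    # are available with the same call shape.
    import xml.etree.ElementTree as etree


def frame_names(path):
    """Return the text of every <col name="Frame Name"> in the file at path."""
    tree = etree.parse(path)
    names = []
    # iter("event") also yields the root itself when the root is an <event>.
    for event in tree.getroot().iter("event"):
        for col in event.findall("col[@name='Frame Name']"):
            names.append(col.text)
    return names


# Made-up two-event log, written to a temporary file for the demonstration.
sample = (
    '<log>'
    '<event timestamp="0.447463"><col name="Frame Name">MasterReq_DB</col></event>'
    '<event timestamp="0.500000"><col name="Frame Name">SlaveResp_DB</col></event>'
    '</log>'
)
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

result = frame_names(path)
os.remove(path)
print(result)
```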

Here is a link to the lxml tutorial. Here is a reference for XPath syntax.
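
Returning to the code in the question: for anyone who would rather stay with `xml.dom.minidom`, the loop there appears to fail because `head` is the plain string returned by `getAttribute`, and a string has no `firstChild`. The text node belongs to the `column` element instead. A hedged sketch of that correction, using a shortened inline copy of the XML rather than the asker's input files, with `requests` as an illustrative name:

```python
from xml.dom import minidom

# Shortened copy of the <event> element from the question.
xml_text = """<event timestamp="0.447463" bustype="LIN" channel="LIN 1">
 <col name="Time"/>
 <col name="Frame Name">MasterReq_DB</col>
 <col name="Id">3C</col>
</event>"""

doc = minidom.parseString(xml_text)

requests = []
for event in doc.getElementsByTagName("event"):
    for column in event.getElementsByTagName("col"):
        if column.getAttribute("name") == "Frame Name":
            # Read the text from the element, not from the attribute string.
            # firstChild is None for empty elements such as <col name="Time"/>.
            text_node = column.firstChild
            if text_node is not None:
                requests.append(text_node.data)

print(requests)
```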
