python - Reading an XML file in Python and getting its attribute values

Question

I have this XML file:

<domain type='kmc' id='007'>
  <name>virtual bug</name>
  <uuid>66523dfdf555dfd</uuid>
  <os>
    <type arch='xintel' machine='ubuntu'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>270336</currentMemory>
  <vcpu placement='static'>10</vcpu>

Now I want to parse it and get its attribute values. For example, I want to get the uuid field. What is the proper way to do that in Python?

score 28 · Accepted Answer

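Here is an lxml snippet that extracts both an attribute and an element's text (your question was a little ambiguous about which one you need, so I am including both):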

from lxml import etree

# filename: path to the XML file on disk
doc = etree.parse(filename)

memoryElem = doc.find('memory')
print(memoryElem.text)         # element text
print(memoryElem.get('unit'))  # attribute

You asked (in a comment on Ali Afshar's answer) whether minidom (2.x, 3.x) is a good alternative. Here is the equivalent code using minidom; judge for yourself which is nicer:

import xml.dom.minidom as minidom
doc = minidom.parse(filename)

memoryElem = doc.getElementsByTagName('memory')[0]
print(''.join(node.data for node in memoryElem.childNodes))  # element text
print(memoryElem.getAttribute('unit'))                       # attribute

lxml looks like the winner to me.
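
If installing lxml is not an option, the same lookups should also work with the standard library's xml.etree.ElementTree, which offers a similar find/get/text interface. The following is a minimal sketch, assuming the question's XML with its closing </domain> tag restored and held in a string (DOMAIN_XML is a name chosen for this example):

```python
import xml.etree.ElementTree as ET

# The question's XML, with the missing </domain> closing tag added.
DOMAIN_XML = """<domain type='kmc' id='007'>
  <name>virtual bug</name>
  <uuid>66523dfdf555dfd</uuid>
  <os>
    <type arch='xintel' machine='ubuntu'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>270336</currentMemory>
  <vcpu placement='static'>10</vcpu>
</domain>"""

root = ET.fromstring(DOMAIN_XML)  # root is the <domain> element

uuid_text = root.find('uuid').text             # element text
domain_id = root.get('id')                     # attribute on the root element
memory_unit = root.find('memory').get('unit')  # attribute on a child element
# <boot> is nested inside <os>, so the path has to include the parent.
boot_devices = [boot.get('dev') for boot in root.findall('os/boot')]

print(uuid_text)
print(domain_id, memory_unit, boot_devices)
```

To read from a file instead of a string, ET.parse(filename).getroot() returns the same kind of root element.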

score 13 · Accepted Answer

XML

<data>
    <items>
        <item name="item1">item1</item>
        <item name="item2">item2</item>
        <item name="item3">item3</item>
        <item name="item4">item4</item>
    </items>
</data>

Python:

from xml.dom import minidom
xmldoc = minidom.parse('items.xml')
itemlist = xmldoc.getElementsByTagName('item')
print("Len : ", len(itemlist))
print("Attribute Name : ", itemlist[0].attributes['name'].value)
print("Text : ", itemlist[0].firstChild.nodeValue)
for s in itemlist:
    print("Attribute Name : ", s.attributes['name'].value)
    print("Text : ", s.firstChild.nodeValue)
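
One caveat with firstChild.nodeValue: an element with no content has no child nodes, so firstChild is None and the attribute access fails. A small helper that joins all text-node children avoids this. This is only a sketch; node_text and ITEMS_XML are names invented for the example, and the second item is deliberately left empty:

```python
from xml.dom import minidom

# Sample data in the same shape as above; the second <item> is empty on purpose.
ITEMS_XML = """<data>
    <items>
        <item name="item1">item1</item>
        <item name="item2"></item>
    </items>
</data>"""


def node_text(element):
    """Concatenate the data of all direct text-node children ('' if none)."""
    return ''.join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


xmldoc = minidom.parseString(ITEMS_XML)
# Map each item's name attribute to its text content.
items = {
    item.getAttribute('name'): node_text(item)
    for item in xmldoc.getElementsByTagName('item')
}
print(items)
```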

score 2 · Accepted Answer

etree, probably with lxml:

from lxml import etree

root = etree.XML(MY_XML)  # MY_XML: the XML document as a string
uuid = root.find('uuid')
print(uuid.text)

answered 2012-09-05T21:34:50.163

score 0 · Accepted Answer

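I would use lxml and query it with the XPath //uuid (XPath names are case-sensitive, so the query has to match the lowercase tag in the document).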

answered 2012-09-05T21:35:42.047
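
To make the XPath suggestion concrete, here is a minimal sketch. It deliberately swaps lxml for the standard library's xml.etree.ElementTree, which supports only a limited XPath subset through find/findall, so that it runs without third-party packages. With lxml, a call such as root.xpath('//uuid/text()') should give a comparable result. DOMAIN_XML is a shortened copy of the question's document, with the closing tag added:

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's XML, closed with </domain>.
DOMAIN_XML = """<domain type='kmc' id='007'>
  <uuid>66523dfdf555dfd</uuid>
  <os>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
</domain>"""

root = ET.fromstring(DOMAIN_XML)

# './/uuid' matches <uuid> elements at any depth below the root.
uuids = [el.text for el in root.findall('.//uuid')]

# Element names are matched case-sensitively, so this query finds nothing.
upper = root.findall('.//UUID')

# Attribute predicate: <boot> elements whose dev attribute equals 'cdrom'.
cdrom = root.findall(".//boot[@dev='cdrom']")

print(uuids, upper, [el.get('dev') for el in cdrom])
```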

score 0 · Accepted Answer

Others can tell you how to do this with the Python standard library. I would recommend my own mini-library, which makes this completely straightforward.

>>> obj = xml2obj.xml2obj("""<domain type='kmc' id='007'>
... <name>virtual bug</name>
... <uuid>66523dfdf555dfd</uuid>
... <os>
... <type arch='xintel' machine='ubuntu'>hvm</type>
... <boot dev='hd'/>
... <boot dev='cdrom'/>
... </os>
... <memory unit='KiB'>524288</memory>
... <currentMemory unit='KiB'>270336</currentMemory>
... <vcpu placement='static'>10</vcpu>
... </domain>""")
>>> obj.uuid
u'66523dfdf555dfd'

http://code.activestate.com/recipes/534109-xml-to-python-data-structure/

score 0 · Accepted Answer

The XML above has no closing tag, so parsing it gives

etree parse error: Premature end of data in tag

The correct XML is:

<domain type='kmc' id='007'>
  <name>virtual bug</name>
  <uuid>66523dfdf555dfd</uuid>
  <os>
    <type arch='xintel' machine='ubuntu'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>270336</currentMemory>
  <vcpu placement='static'>10</vcpu>
</domain>
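
The effect of the missing closing tag can be reproduced with a short check. This sketch uses the standard library's xml.etree.ElementTree rather than lxml, so the wording of the error differs from the message quoted above; try_parse and the two sample strings are names made up for the example:

```python
import xml.etree.ElementTree as ET

# A cut-down document without its closing tag, and the repaired version.
TRUNCATED_XML = "<domain type='kmc' id='007'><uuid>66523dfdf555dfd</uuid>"
FIXED_XML = TRUNCATED_XML + "</domain>"


def try_parse(text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        print('parse failed:', exc)
        return None


broken_root = try_parse(TRUNCATED_XML)  # unclosed <domain>: parsing fails
fixed_root = try_parse(FIXED_XML)       # well-formed: parsing succeeds

if fixed_root is not None:
    print(fixed_root.find('uuid').text)
```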

score 0 · Accepted Answer

You can try parsing it with recover=True. Something like this:

from lxml import etree

# recover=True asks the parser to keep going past broken markup
parser = etree.XMLParser(recover=True)
tree = etree.parse('your xml file', parser)

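I used this recently and it worked for me, so it is worth a try. If you need to do any more complex XML data extraction, you can take a look at the code I wrote for some projects that deal with it.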
