python - Is there a formal way to "walk" XML in Python?

Question

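I have been learning how to use the xml.dom.minidom functions to extract parts of an XML document, and I can successfully return specific elements and attributes.

I have a number of large XML files to parse, and I want to push all of the results into a database. Is there a function like os.walk that I can use to extract elements from the XML in a logical way that preserves the hierarchy?

The XML is very basic and very simple: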

<InternalSignature ID="9" Specificity="Generic">
 <ByteSequence Reference="BOFoffset">
  <SubSequence Position="1" SubSeqMinOffset="0" SubSeqMaxOffset="0" MinFragLength="0">
  <Sequence>49492A00</Sequence> 
  <DefaultShift>5</DefaultShift> 
  <Shift Byte="00">1</Shift> 
  <Shift Byte="2A">2</Shift> 
  <Shift Byte="49">3</Shift> 
  </SubSequence>
 </ByteSequence>
</InternalSignature>
<InternalSignature ID="10" Specificity="Generic">
 <ByteSequence Reference="BOFoffset">
  <SubSequence Position="1" SubSeqMinOffset="0" SubSeqMaxOffset="0" MinFragLength="0">
  <Sequence>4D4D002A</Sequence> 
  <DefaultShift>5</DefaultShift> 
  <Shift Byte="2A">1</Shift> 
  <Shift Byte="00">2</Shift> 
  <Shift Byte="4D">3</Shift> 
  </SubSequence>
 </ByteSequence>
</InternalSignature>

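Is there a formal way to crawl the XML and (in this small example) extract the elements that belong to each specific InternalSignature? I can see how to use minidom.parse and the .getElementsByTagName method to pull things out as lists, but I'm not sure how to associate the elements with their place in the hierarchy.

So far I have found a tutorial that shows how to return various values: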

from xml.dom import minidom

xmldoc = minidom.parse("file.xml")

# File-level attributes on the FFSignatureFile element
Versionlist = xmldoc.getElementsByTagName('FFSignatureFile')
VersionRef = Versionlist[0]
Version = VersionRef.attributes["Version"]
DateCreated = VersionRef.attributes["DateCreated"]
print(Version.value)
print(DateCreated.value)

# Attributes of the first InternalSignature only
InternalSignatureList = xmldoc.getElementsByTagName('InternalSignature')
InternalSignatureRef = InternalSignatureList[0]
SigID = InternalSignatureRef.attributes["ID"]
SigSpecificity = InternalSignatureRef.attributes["Specificity"]
print(SigID.value)
print(SigSpecificity.value)

# Total number of InternalSignature elements in the file
print(len(InternalSignatureList))

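I can see from the last line (len) that there are 134 elements in InternalSignatureList. Essentially, I want to extract all of the elements inside each InternalSignature as a separate record and push it into a database.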

score 3 · Accepted Answer

(What have you tried?)

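from xml.etree import ElementTree

# xmlstring is assumed to hold the XML text, with an InternalSignature as its root element
e = ElementTree.fromstring(xmlstring)
# findall with a bare tag name matches direct children only
e.findall("ByteSequence")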
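
To make the idea concrete, here is a minimal sketch of an os.walk-style traversal with ElementTree, plus one way to gather each InternalSignature into its own record. Several things here are assumptions and not part of the question: the `<Signatures>` wrapper root (added only so the sample parses as a single document), the helper names `walk` and `signature_records`, and the shape of the record dict. It also assumes the tags carry no XML namespace; if the real file declares one, the tag names would need the namespace prefix.

```python
from xml.etree import ElementTree

# Sample from the question, wrapped in a hypothetical <Signatures> root
# so that it parses as one well-formed document.
SAMPLE = """\
<Signatures>
 <InternalSignature ID="9" Specificity="Generic">
  <ByteSequence Reference="BOFoffset">
   <SubSequence Position="1" SubSeqMinOffset="0" SubSeqMaxOffset="0" MinFragLength="0">
    <Sequence>49492A00</Sequence>
    <DefaultShift>5</DefaultShift>
    <Shift Byte="00">1</Shift>
    <Shift Byte="2A">2</Shift>
    <Shift Byte="49">3</Shift>
   </SubSequence>
  </ByteSequence>
 </InternalSignature>
 <InternalSignature ID="10" Specificity="Generic">
  <ByteSequence Reference="BOFoffset">
   <SubSequence Position="1" SubSeqMinOffset="0" SubSeqMaxOffset="0" MinFragLength="0">
    <Sequence>4D4D002A</Sequence>
    <DefaultShift>5</DefaultShift>
    <Shift Byte="2A">1</Shift>
    <Shift Byte="00">2</Shift>
    <Shift Byte="4D">3</Shift>
   </SubSequence>
  </ByteSequence>
 </InternalSignature>
</Signatures>
"""


def walk(element, path=()):
    """Yield (path, element) for element and every descendant, depth first."""
    path = path + (element.tag,)
    yield path, element
    for child in element:
        yield from walk(child, path)


def signature_records(root):
    """Return one dict per InternalSignature, keeping its nested values together."""
    records = []
    for sig in root.iter("InternalSignature"):
        record = dict(sig.attrib)  # ID and Specificity
        # iter() searches all descendants of this signature, not just direct children
        record["sequences"] = [s.text for s in sig.iter("Sequence")]
        record["shifts"] = {s.get("Byte"): int(s.text) for s in sig.iter("Shift")}
        records.append(record)
    return records


root = ElementTree.fromstring(SAMPLE)

# Hierarchy-preserving walk: each element is printed with its full path
for path, elem in walk(root):
    print("/".join(path), elem.attrib, (elem.text or "").strip())

# One record per signature, ready to be turned into database rows
for record in signature_records(root):
    print(record)
```

Because each record is built from a single InternalSignature element, the child values stay attached to their parent, which is the association that a flat getElementsByTagName list loses. For a file on disk, `ElementTree.parse("file.xml").getroot()` would take the place of `fromstring`.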
