python - Parsing XML in Python using ElementTree example

Question

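I'm having a hard time finding a good, basic example of how to parse XML in Python using ElementTree. From what I can find, it seems to be the easiest library to use for parsing XML. Here is a sample of the XML I'm working with: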

<timeSeriesResponse>
    <queryInfo>
        <locationParam>01474500</locationParam>
        <variableParam>99988</variableParam>
        <timeParam>
            <beginDateTime>2009-09-24T15:15:55.271</beginDateTime>
            <endDateTime>2009-11-23T15:15:55.271</endDateTime>
        </timeParam>
     </queryInfo>
     <timeSeries name="NWIS Time Series Instantaneous Values">
         <values count="2876">
            <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
            <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
            <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
            .....
         </values>
     </timeSeries>
</timeSeriesResponse>

I was able to do what I needed using a hard-coded approach, but I need my code to be more dynamic. Here is what worked:

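import xml.etree.ElementTree as ET

tree = ET.parse('sample.xml')   # the filename must be a string
doc = tree.getroot()

timeseries = doc[1]       # hard-coded child index of the root
values = timeseries[2]    # hard-coded child index inside <timeSeries>

for child in values:      # each child is a <value> element
    print(child.attrib['dateTime'], child.text)
# first line printed: 2009-09-24T15:30:00.000-04:00 550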

Here are a couple of things I tried that didn't work; both reported that they couldn't find timeSeries (or anything else I tried):

import xml.etree.ElementTree as ET

tree = ET.parse('sample.xml')
tree.find('timeSeries')

tree = ET.parse('sample.xml')
doc = tree.getroot()
doc.find('timeSeries')

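Basically, I want to load the XML file, search for the timeSeries tag, iterate over the value tags, and return the dateTime attribute and the text of each tag: everything I'm doing in the example above, but without hard-coding the part of the XML I'm interested in. Can anyone point me to some examples, or give me some advice on how to approach this?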
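One way to drop the hard-coded indices is `Element.iter()`, a standard ElementTree method (not something proposed in this thread) that visits matching elements at any depth. A minimal sketch, assuming the namespace-free sample above trimmed to three readings; the helper name `read_values` is hypothetical:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<timeSeriesResponse>
  <queryInfo>
    <locationParam>01474500</locationParam>
  </queryInfo>
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values>
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
      <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
      <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""


def read_values(root):
    """Hypothetical helper: (dateTime, text) for every <value> under any <timeSeries>."""
    results = []
    # iter() walks the whole subtree, so the position of <timeSeries> does not matter.
    for series in root.iter("timeSeries"):
        for value in series.iter("value"):
            results.append((value.get("dateTime"), value.text))
    return results


root = ET.fromstring(SAMPLE)  # for a file: ET.parse("sample.xml").getroot()
for date_time, reading in read_values(root):
    print(date_time, reading)
```

If the real file uses a default namespace, the tag names passed to `iter()` would need the `{namespace-uri}` prefix as well.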

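Thanks for all the help. Both of the suggestions below work on the sample file I provided, but they don't work on the full file. Here is the error I get from the real file when I use Ed Carrel's method: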

 (<type 'exceptions.AttributeError'>, AttributeError("'NoneType' object has no attribute 'attrib'",), <traceback object at 0x011EFB70>)

I figured there was something in the real file it didn't like, so I removed things bit by bit until it worked. Here are the lines I changed:

originally: <timeSeriesResponse xsi:schemaLocation="a URL I removed" xmlns="a URL I removed" xmlns:xsi="a URL I removed">
changed to: <timeSeriesResponse>

originally: <sourceInfo xsi:type="SiteInfoType">
changed to: <sourceInfo>

originally: <geogLocation xsi:type="LatLonPointType" srs="EPSG:4326">
changed to: <geogLocation>

Removing the attributes containing 'xsi:...' solved the problem. Is 'xsi:...' not valid XML? It's hard for me to remove these programmatically. Any suggested workarounds?

Here is the full XML file: http://www.sendspace.com/file/lofcpt

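When I originally asked this question, I didn't know about namespaces in XML. Now that I know what's going on, I don't have to remove the "xsi" attributes, which are namespace declarations; I just include them in my XPath searches. See this page for more information on namespaces in lxml.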
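In ElementTree, a default namespace on the root (the `xmlns="..."` attribute removed from `<timeSeriesResponse>` above) puts every element into that namespace, so its tag becomes `{uri}timeSeries` and a plain `find('timeSeries')` returns `None`, which is what produces the `'NoneType' object has no attribute 'attrib'` error. Declared `xsi:type` attributes are ordinary, valid XML. A minimal sketch, assuming a hypothetical namespace URI in place of the one removed from the real file; the prefix `ns` is an arbitrary name chosen for the lookup:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for the one removed from the real file.
NS_URI = "http://example.com/hypothetical-ns"

xml_text = f"""<timeSeriesResponse xmlns="{NS_URI}"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <sourceInfo xsi:type="SiteInfoType"/>
    <values>
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
      <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""

root = ET.fromstring(xml_text)

# A bare tag name does not match: the element's real tag is "{NS_URI}timeSeries".
print(root.find("timeSeries"))  # None

# Map a prefix of our choosing to the URI and use that prefix in the path.
ns = {"ns": NS_URI}
readings = [
    (v.get("dateTime"), v.text)
    for v in root.findall("ns:timeSeries/ns:values/ns:value", ns)
]
print(readings)
```

Writing the tag in full, e.g. `root.find("{%s}timeSeries" % NS_URI)`, is equivalent to using the prefix mapping.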

score 46 · Accepted Answer

So I've now installed ElementTree 1.2.6 on my machine and ran the following code against the XML chunk you posted:

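import elementtree.ElementTree as ET   # standalone package; Python 2.5+ ships it as xml.etree.ElementTree

tree = ET.parse("test.xml")
doc = tree.getroot()
thingy = doc.find('timeSeries')   # first direct child of the root named timeSeries

print(thingy.attrib)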

and got the following output:

{'name': 'NWIS Time Series Instantaneous Values'}

It appears to have found the timeSeries element without using numeric indices.

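What would help now is knowing what you mean when you say "it doesn't work". Since it works for me with the same input, it's unlikely that ElementTree is broken in some obvious way. Update your question with any error messages, tracebacks, or anything else you can provide to help us help you.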

score 22 · Accepted Answer

If I understand your question correctly:

for elem in doc.findall('timeSeries/values/value'):
    print(elem.get('dateTime'), elem.text)

Or, if you prefer (and if timeSeries/values occurs only once):

values = doc.find('timeSeries/values')
for value in values:
    print(value.get('dateTime'), value.text)

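The findall() method returns a list of all matching elements, while find() returns only the first matching element. The first example loops over all the matching elements; the second loops over the children of the values element, which gives the same result in this case.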
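To see the difference on concrete data, here is a small sketch assuming a namespace-free, trimmed version of the sample: `findall()` returns a list, `find()` returns the first match or `None`, and iterating the `<values>` element yields its direct children.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<timeSeriesResponse><timeSeries><values>"
    '<value dateTime="2009-09-24T15:30:00.000-04:00">550</value>'
    '<value dateTime="2009-09-24T16:00:00.000-04:00">419</value>'
    "</values></timeSeries></timeSeriesResponse>"
)

# findall(): every matching element, as a list.
all_values = doc.findall("timeSeries/values/value")
print(len(all_values))  # 2

# find(): only the first match.
first = doc.find("timeSeries/values/value")
print(first.get("dateTime"), first.text)  # 2009-09-24T15:30:00.000-04:00 550

# find() returns None when nothing matches, so check before touching .attrib.
missing = doc.find("timeSeries/noSuchTag")
print(missing is None)  # True

# Iterating an element yields its direct children: the same two <value> elements here.
values = doc.find("timeSeries/values")
print([child.text for child in values] == [v.text for v in all_values])  # True
```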

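However, I can't see why timeSeries wasn't found. Maybe you just forgot the getroot() call? (Note that you don't actually need it, since you can also work from the ElementTree itself: its find() and findall() evaluate paths relative to the root element, so tree.find('timeSeries/values') works, as does a descendant search such as .//timeSeries/values.)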
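A minimal sketch of working from the tree directly, assuming the namespace-free sample; `io.BytesIO` stands in for `sample.xml` so the example is self-contained. It shows that the root tag itself is not part of the path.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<timeSeriesResponse>
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values>
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
      <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""

# ET.parse() accepts a filename or a file object; BytesIO stands in for sample.xml.
tree = ET.parse(io.BytesIO(SAMPLE))

# ElementTree.find() evaluates the path relative to the root element.
values = tree.find("timeSeries/values")
print(values.tag)  # values

# ".//" searches at any depth below the root.
same_values = tree.find(".//timeSeries/values")
print(same_values is values)  # True

# Including the root tag in the path finds nothing.
print(tree.find("timeSeriesResponse/timeSeries/values"))  # None
```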
