python - XML parsing problem in Python using xml.etree.ElementTree

Question

I have the following XML, generated from an HTTP response:

<?xml version="1.0" encoding="UTF-8"?>
<Response rid="1000" status="succeeded" moreData="false">
  <Results completed="true" total="25" matched="5" processed="25">
      <Resource type="h" DisplayName="Host" name="tango">
          <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
             <PerfData attrId="cpuUsage" attrName="Usage">
                <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="36.00"/>
                <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="86.00"/>
                <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="29.00"/>
             </PerfData>
          <Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango">
              <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
              <PerfData attrId="cpuUsage" attrName="Usage">
                 <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="6.00"/>
              </PerfData>
          </Resource>
      </Resource>
  </Results>
</Response>

If you look closely, the outer Resource element contains another element with the same tag.

So the high-level XML structure is as follows:

<Resource>
    <Resource>
    </Resource>
</Resource>

With Python's ElementTree I can only get at the outer element. Here is my code:

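import re
import xml.etree.ElementTree as xml

# data holds the raw HTTP response text.
# re.DOTALL lets "." span newlines, so a multi-line <Response> still matches.
pattern = re.compile(r'(<Response.*?</Response>)',
                     re.VERBOSE | re.MULTILINE | re.DOTALL)

for match in pattern.finditer(data):
    contents = match.group(1)
    responses = xml.fromstring(contents)

    # children of <Response>, i.e. <Results>
    for results in responses:
        result = results.tag

        # children of <Results>, i.e. only the outer <Resource>
        for resources in results:
            resource = resources.tag
            temp = resources.attrib
            print(temp)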

This prints the following output (temp):

{'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}

How do I get the attributes of the inner Resource?

score 2 · Accepted Answer

Don't parse XML with regular expressions! That won't work. Use an XML parsing library instead, for example lxml:

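Edit: following the OP's request in the comments, the code example now fetches only the top-level resources, loops over them, and tries to fetch their "sub-resources".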

from lxml import etree

content = '''
YOUR XML HERE
'''

root = etree.fromstring(content)

# search for all "top level" resources
resources = root.xpath("//Resource[not(ancestor::Resource)]")
for resource in resources:
    # copy resource attributes in a dict
    mashup = dict(resource.attrib)
    # find child resource elements
    subresources = resource.xpath("./Resource")
    # if we find only one resource, add it to the mashup
    if len(subresources) == 1:
        mashup['resource'] = dict(subresources[0].attrib)
    # else... no idea what the OP wants...

    print(mashup)

This will output:

{'resource': {'DisplayName': 'VM', 'type': 'vm', 'name': 'charlie', 'baseHost': 'tango'}, 'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}
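
Since the question asks about xml.etree.ElementTree specifically, here is a minimal sketch of the same idea using only the standard library. It is not part of the answer above. The helper name `resource_tree` and the `"resources"` key are made up for illustration, and the XML is a trimmed copy of the question's document with the closing tag written as `</Results>`. It assumes each Resource may contain any number of nested Resource elements.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (assumption: closing tag is </Results>).
content = """<Response rid="1000" status="succeeded" moreData="false">
  <Results completed="true" total="25" matched="5" processed="25">
    <Resource type="h" DisplayName="Host" name="tango">
      <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
      <Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango">
        <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
      </Resource>
    </Resource>
  </Results>
</Response>
"""


def resource_tree(element):
    """Hypothetical helper: attributes of a Resource plus its nested Resources."""
    info = dict(element.attrib)
    # findall("Resource") matches direct children only, so recurse for deeper levels
    children = [resource_tree(child) for child in element.findall("Resource")]
    if children:
        info["resources"] = children  # key name chosen for this sketch
    return info


root = ET.fromstring(content)

# Top-level resources are the Resource elements directly under Results
top_level = [resource_tree(r) for r in root.findall("./Results/Resource")]
print(top_level)

# iter() walks every Resource at any depth, in document order
names = [r.get("name") for r in root.iter("Resource")]
print(names)
```

If only a flat list of attributes is needed, the `root.iter("Resource")` loop alone should be enough; the recursive helper is useful when the parent/child relationship has to be preserved.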
