I have a Python XML parsing problem that I can't seem to figure out.

I have the following XML:

<data>
  <data_in base="base64">
  </data_in>
  <log_sense_data>
    <ds base="bool">1</ds>
    <spf base="bool">0</spf>
    <page_code base="hex">15</page_code>
    <background_scan_results_log_page>
      <parameter>
        <parameter_code base="hex">0000</parameter_code>
        <du base="bool">0</du>
        <tsd base="bool">0</tsd>
        <etc base="bool">0</etc>
        <tmc base="hex">00</tmc>
        <format_linking base="hex">03</format_linking>
        <parameter_length base="dec">12</parameter_length>
        <description base="string">background scanning status parameter</description>
        <accumulated_power_on_minutes base="dec">579578</accumulated_power_on_minutes>
        <background_scanning_status base="hex">01</background_scanning_status>
        <number_of_background_scans_performed base="dec">112</number_of_background_scans_performed>
        <background_scan_progress base="hex">00000036</background_scan_progress>
        <number_of_background_medium_scans_performed base="dec">112</number_of_background_medium_scans_performed>
      </parameter>
      <parameter>
        <parameter_code base="hex">0001</parameter_code>
        <du base="bool">0</du>
        <tsd base="bool">0</tsd>
        <etc base="bool">0</etc>
        <tmc base="hex">00</tmc>
        <format_linking base="hex">03</format_linking>
        <parameter_length base="dec">20</parameter_length>
        <description base="string">background medium scan parameter</description>
        <accumulated_power_on_minutes base="dec">82932</accumulated_power_on_minutes>
        <reassign_status base="hex">05</reassign_status>
        <sense_key base="hex">01</sense_key>
        <additional_sense_code base="hex">17</additional_sense_code>
        <additional_sense_code_qualifier base="hex">01</additional_sense_code_qualifier>
        <vendor_specific base="hex">20e2570187</vendor_specific>
        <logical_block_address base="hex">00000000478994d8</logical_block_address>
      </parameter>
      <parameter>
        <parameter_code base="hex">0002</parameter_code>
        <du base="bool">0</du>
        <tsd base="bool">0</tsd>
        <etc base="bool">0</etc>
        <tmc base="hex">00</tmc>
        <format_linking base="hex">03</format_linking>
        <parameter_length base="dec">20</parameter_length>
        <description base="string">background medium scan parameter</description>
        <accumulated_power_on_minutes base="dec">104467</accumulated_power_on_minutes>
        <reassign_status base="hex">05</reassign_status>
        <sense_key base="hex">01</sense_key>
        <additional_sense_code base="hex">18</additional_sense_code>
        <additional_sense_code_qualifier base="hex">07</additional_sense_code_qualifier>
        <vendor_specific base="hex">203ab846ea</vendor_specific>
        <logical_block_address base="hex">00000000133d5046</logical_block_address>
      </parameter>
    </background_scan_results_log_page>
  </log_sense_data>
</data>

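parameter_code 0000 will always be present, and it may be followed by any number of further parameter_code entries. Essentially, I want to extract 2 values from parameter_code 0000 (accumulated power-on minutes and the number of background scans performed), plus most of the values from parameter_code 0001 and higher, so I can put them into a database later. The code I have so far is: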

import xml.etree.ElementTree as et

log_page_tree = et.fromstring(results['Data']['RawData'])
if log_page_tree.find('log_sense_data') == None:
    continue
else:
    for element in log_page_tree.find('log_sense_data'):
        for pagecode in element.iter('page_code'):
            if pagecode.text == '15':
                for param in log_page_tree.find('log_sense_data').find('background_scan_results_log_page'):
                    for derp in param.iter():
                        print derp.tag, derp.text
                    #for totalpoweron in param.iter('accumulated_power_on_minutes'):
                        #print totalpoweron.text

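I want to keep the 2 values from parameter_code 0000 while iterating over the remaining parameter_code entries that will go into the database. Can anyone push me in the right direction here? If I specify param.iter('somevalue') to grab each value, the code doesn't seem to iterate.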

1 Answer

OK, while there are ways to simplify or improve your code, it sounds like you're happy with it up to this point:

for param in log_page_tree.find('log_sense_data').find('background_scan_results_log_page'):

This does iterate over each parameter element.
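
As one possible simplification, here is a minimal sketch that uses a trimmed-down copy of the XML from the question (the `xml_text` variable and the shortened document are assumptions for illustration). `findall` accepts a path, so the nested `find` calls can collapse into a single expression, and it returns an empty list instead of failing when `log_sense_data` is missing:

```python
import xml.etree.ElementTree as et

# Trimmed-down stand-in for the XML in the question (illustrative only).
xml_text = """
<data>
  <log_sense_data>
    <page_code base="hex">15</page_code>
    <background_scan_results_log_page>
      <parameter>
        <parameter_code base="hex">0000</parameter_code>
      </parameter>
      <parameter>
        <parameter_code base="hex">0001</parameter_code>
      </parameter>
    </background_scan_results_log_page>
  </log_sense_data>
</data>
"""

log_page_tree = et.fromstring(xml_text)

# One path walks the whole nesting; findall returns [] if any level is absent.
params = log_page_tree.findall(
    'log_sense_data/background_scan_results_log_page/parameter')

# findtext returns the text of the first matching child element.
codes = [param.findtext('parameter_code') for param in params]
print(codes)
```

With this approach the explicit `None` check and the separate `page_code` loop may no longer be needed, since the path only matches when a `background_scan_results_log_page` element is present.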

But now you want to switch on whether parameter_code is 0000 and do something different in each case. So:

converters = {
    'hex': lambda s: int(s, 16),
    'dec': int,
    # bool('0') is True (any non-empty string is truthy), so compare the text
    'bool': lambda s: s == '1'
}

if param.find('parameter_code').text == '0000':
    accumulated_power_on_minutes = int(param.find('accumulated_power_on_minutes').text)
    number_of_background_scans_performed = int(param.find('number_of_background_scans_performed').text)
else:
    obj = {}
    for elem in param:  # iterating an element yields its direct children
        name = elem.tag
        base = elem.attrib['base']
        # fall back to the raw text for bases without a converter (e.g. 'string')
        converter = converters.get(base, lambda x: x)
        value = converter(elem.text)
        obj[name] = value
    # do something with obj
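
Putting the pieces together, the following is a rough, self-contained sketch rather than a definitive implementation. The function name `parse_parameters`, the `summary`/`rows` return shape, and the shortened `sample` document are all assumptions made for illustration; the values in `sample` are copied from the XML in the question.

```python
import xml.etree.ElementTree as et

# Converters keyed on the "base" attribute; unknown bases are left as text.
CONVERTERS = {
    'hex': lambda s: int(s, 16),
    'dec': int,
    'bool': lambda s: s == '1',
}

PARAM_PATH = 'log_sense_data/background_scan_results_log_page/parameter'


def parse_parameters(xml_text):
    """Hypothetical helper: return (summary, rows).

    summary holds the two values kept from parameter_code 0000;
    rows holds one dict of converted values per remaining parameter.
    """
    root = et.fromstring(xml_text)
    summary, rows = {}, []
    for param in root.findall(PARAM_PATH):
        if param.findtext('parameter_code') == '0000':
            for name in ('accumulated_power_on_minutes',
                         'number_of_background_scans_performed'):
                summary[name] = int(param.findtext(name))
        else:
            row = {}
            for elem in param:
                convert = CONVERTERS.get(elem.get('base'), lambda x: x)
                row[elem.tag] = convert(elem.text)
            rows.append(row)
    return summary, rows


# Shortened sample using values from the question's XML.
sample = """
<data>
  <log_sense_data>
    <background_scan_results_log_page>
      <parameter>
        <parameter_code base="hex">0000</parameter_code>
        <accumulated_power_on_minutes base="dec">579578</accumulated_power_on_minutes>
        <number_of_background_scans_performed base="dec">112</number_of_background_scans_performed>
      </parameter>
      <parameter>
        <parameter_code base="hex">0001</parameter_code>
        <description base="string">background medium scan parameter</description>
        <accumulated_power_on_minutes base="dec">82932</accumulated_power_on_minutes>
        <logical_block_address base="hex">00000000478994d8</logical_block_address>
      </parameter>
    </background_scan_results_log_page>
  </log_sense_data>
</data>
"""

summary, rows = parse_parameters(sample)
print(summary)
for row in rows:
    print(row)
```

Note that in this sketch `parameter_code` itself goes through the hex converter, so each row stores it as an integer; if the database should hold the original text such as `0001`, that field would need to be handled separately.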
answered 2013-05-30T00:53:31.747