I am trying to parse this nested XML file into a pandas DataFrame. Here is a sample of the XML:
<?xml version='1.0' encoding='UTF-8'?>
<response xmlns="http://www...">
<sensor-time timezone="America/New_York">2020-08-10T12:19:26-04:00</sensor-time>
<status>
<code>OK</code>
</status>
<content>
<elements>
<element>
<element-id>0</element-id>
<element-name>Line 0</element-name>
<sensor-type>SINGLE_SENSOR</sensor-type>
<data-type>LINE</data-type>
<from>2020-08-10T10:00:00-04:00</from>
<to>2020-08-10T12:00:00-04:00</to>
<resolution>FIVE_MINUTES</resolution>
<measurements>
<measurement>
<from>2020-08-10T10:00:00-04:00</from>
<to>2020-08-10T10:05:00-04:00</to>
<values>
<value label="fw">0</value>
<value label="bw">0</value>
</values>
</measurement>
<measurement>
<from>2020-08-10T10:05:00-04:00</from>
<to>2020-08-10T10:10:00-04:00</to>
<values>
<value label="fw">0</value>
<value label="bw">0</value>
</values>
</measurement>
<measurement>
<from>2020-08-10T10:10:00-04:00</from>
<to>2020-08-10T10:15:00-04:00</to>
<values>
<value label="fw">0</value>
<value label="bw">0</value>
</values>
</measurement>
<!-- further measurement entries omitted -->
</measurements>
</element>
<element>
<element-id>1</element-id>
<element-name>GP Test CL.01</element-name>
<sensor-type>SINGLE_SENSOR</sensor-type>
<data-type>LINE</data-type>
<from>2020-08-10T10:00:00-04:00</from>
<to>2020-08-10T12:00:00-04:00</to>
<resolution>FIVE_MINUTES</resolution>
<measurements>
<measurement>
<from>2020-08-10T10:00:00-04:00</from>
<to>2020-08-10T10:05:00-04:00</to>
<values>
<value label="fw">0</value>
<value label="bw">0</value>
</values>
</measurement>
<measurement>
<from>2020-08-10T10:05:00-04:00</from>
<to>2020-08-10T10:10:00-04:00</to>
<values>
<value label="fw">0</value>
<value label="bw">0</value>
</values>
</measurement>
<measurement>
<from>2020-08-10T10:10:00-04:00</from>
<to>2020-08-10T10:15:00-04:00</to>
<values>
<value label="fw">0</value>
<value label="bw">0</value>
</values>
</measurement>
<!-- further measurement entries omitted -->
</measurements>
</element>
</elements>
</content>
<sensor-info>
<serial-number>D1:82:34:5Z:3Q:3D</serial-number>
<ip-address>000.000.00.0</ip-address>
<name>Demo</name>
<group>Test Devices</group>
<device-type>PC2</device-type>
</sensor-info>
</response>
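I tried the xmltodict library and can extract a single element, but because the XML is nested and contains multiple elements and measurements, I cannot get it to work when I try to loop over it. Here is my code so far: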
import pandas as pd
import glob
import xmltodict

# Look for all xml files in directory
result = []
for file in glob.glob('*.xml'):
    with open(file) as fd:
        # Load each xml file and append it
        doc = xmltodict.parse(fd.read())
        for element in doc['response']['content']['elements']['element']:
            for m in element['measurements']:
                data = {}
                for val in m['value']:
                    data['SERIAL_NUMBER'] = doc['response']['sensor-info']['serial-number']
                    data['IP'] = doc['response']['sensor-info']['ip-address']
                    data['name'] = doc['response']['sensor-info']['name']
                    data['Group'] = doc['response']['sensor-info']['group']
                    data['Device Type'] = doc['response']['sensor-info']['device-type']
                    data['element-id'] = element['element-id']
                    data['Line name'] = element['element-name']
                    data['From time'] = m['from']
                    data['to time'] = m['to']
                    data[val['label']] = val['value']
                result.append(data)
df = pd.DataFrame(result)
The error I get is "TypeError: string indices must be integers", raised at the start of the for m in element['measurements'] loop.
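A possible explanation, sketched under an assumption: xmltodict appears to turn each container element into a dict keyed by its child tag name, with attributes under "@"-prefixed keys and text under "#text". If that holds, looping over element['measurements'] walks the dict's keys, so m is the string 'measurement' rather than a measurement record. The dict below is hand-written to mimic that shape; it is not real xmltodict output.

```python
# Hand-written stand-in for what xmltodict is assumed to return for one <element>.
element = {
    "element-id": "0",
    "measurements": {
        "measurement": [
            {
                "from": "2020-08-10T10:00:00-04:00",
                "to": "2020-08-10T10:05:00-04:00",
                "values": {
                    "value": [
                        {"@label": "fw", "#text": "0"},
                        {"@label": "bw", "#text": "0"},
                    ]
                },
            }
        ]
    },
}

# Iterating over a dict yields its keys, so each item is a plain string.
keys_seen = [m for m in element["measurements"]]

# Indexing that string with another string reproduces the TypeError.
error_message = None
try:
    keys_seen[0]["value"]
except TypeError as exc:
    error_message = str(exc)

# Going one level deeper reaches the measurement dicts and their values.
labels = [
    val["@label"]
    for m in element["measurements"]["measurement"]
    for val in m["values"]["value"]
]
print(keys_seen, error_message, labels)
```

If the assumption is right, the loop would need to descend through ['measurements']['measurement'] and ['values']['value'], and read val['@label'] and val['#text'] instead of val['label'] and val['value'].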
Ultimately, I would like to end up with output in this format:
Any idea how to get this working?
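For comparison, here is a rough sketch of the same flattening done with the standard library's xml.etree.ElementTree instead of xmltodict. This is a swapped-in technique, not a fix to the code above. The function name parse_rows, the SAMPLE string, and its placeholder namespace URI are all made up for illustration. The layout of one row per measurement is assumed from the column names in my code.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the sample; the namespace URI is a placeholder.
SAMPLE = """<response xmlns="http://example.invalid/ns">
<content><elements>
<element>
<element-id>0</element-id><element-name>Line 0</element-name>
<measurements>
<measurement><from>2020-08-10T10:00:00-04:00</from><to>2020-08-10T10:05:00-04:00</to>
<values><value label="fw">0</value><value label="bw">0</value></values></measurement>
<measurement><from>2020-08-10T10:05:00-04:00</from><to>2020-08-10T10:10:00-04:00</to>
<values><value label="fw">0</value><value label="bw">0</value></values></measurement>
</measurements>
</element>
<element>
<element-id>1</element-id><element-name>GP Test CL.01</element-name>
<measurements>
<measurement><from>2020-08-10T10:00:00-04:00</from><to>2020-08-10T10:05:00-04:00</to>
<values><value label="fw">0</value><value label="bw">0</value></values></measurement>
</measurements>
</element>
</elements></content>
<sensor-info>
<serial-number>D1:82:34:5Z:3Q:3D</serial-number><ip-address>000.000.00.0</ip-address>
<name>Demo</name><group>Test Devices</group><device-type>PC2</device-type>
</sensor-info>
</response>"""


def parse_rows(xml_text):
    """Return one dict per <measurement>, carrying sensor and element fields along."""
    root = ET.fromstring(xml_text)
    # Drop the "{namespace}" prefix so lookups can use bare tag names.
    for node in root.iter():
        node.tag = node.tag.split("}", 1)[-1]

    info = root.find("sensor-info")
    shared = {
        "SERIAL_NUMBER": info.findtext("serial-number"),
        "IP": info.findtext("ip-address"),
        "name": info.findtext("name"),
        "Group": info.findtext("group"),
        "Device Type": info.findtext("device-type"),
    }

    rows = []
    for element in root.iterfind("content/elements/element"):
        for m in element.iterfind("measurements/measurement"):
            row = dict(shared)
            row["element-id"] = element.findtext("element-id")
            row["Line name"] = element.findtext("element-name")
            row["From time"] = m.findtext("from")
            row["to time"] = m.findtext("to")
            # Each <value label="..."> becomes its own column, e.g. "fw" and "bw".
            for val in m.iterfind("values/value"):
                row[val.get("label")] = val.text
            rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
print(len(rows), rows[0])
```

The resulting list of dicts can be passed to pd.DataFrame(rows). For several files, the rows from each file could be collected into one list before building the frame.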