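I'm trying to extract some data from XML. I'm using xmltodict to load the data into a dictionary, and then list comprehensions to pull the various parts out into separate lists. Later I'll plot them with matplotlib.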
XML:
<?xml version="1.0" ?>
<MYDATA>
<SESSION ID="1234">
<INFO>
<BEGIN LOAD="23"/>
</INFO>
<TRANSACTION ID="2103645570">
<ANSWER>Hello</ANSWER>
</TRANSACTION>
<TRANSACTION ID="4315547431">
<ANSWER>This is an answer</ANSWER>
</TRANSACTION>
</SESSION>
<SESSION ID="5678">
<INFO>
<BEGIN LOAD="28"/>
</INFO>
<TRANSACTION ID="4099381642">
<ANSWER>Hello</ANSWER>
</TRANSACTION>
<TRANSACTION ID="1220404184">
<ANSWER>A Different answer</ANSWER>
</TRANSACTION>
<TRANSACTION ID="201506542">
<ANSWER>Yet another one</ANSWER>
</TRANSACTION>
</SESSION>
</MYDATA>
My code:
from collections import OrderedDict
# doc contains the xml exactly as loaded by xmltodict
doc = OrderedDict([(u'MYDATA', OrderedDict([(u'SESSION', [OrderedDict([(u'@ID', u'1234'), (u'INFO', OrderedDict([(u'BEGIN', OrderedDict([(u'@LOAD', u'23')]))])), (u'TRANSACTION', [OrderedDict([(u'@ID', u'2103645570'), (u'ANSWER', u'Hello')]), OrderedDict([(u'@ID', u'4315547431'), (u'ANSWER', u'This is an answer')])])]), OrderedDict([(u'@ID', u'5678'), (u'INFO', OrderedDict([(u'BEGIN', OrderedDict([(u'@LOAD', u'28')]))])), (u'TRANSACTION', [OrderedDict([(u'@ID', u'4099381642'), (u'ANSWER', u'Hello')]), OrderedDict([(u'@ID', u'1220404184'), (u'ANSWER', u'A Different answer')]), OrderedDict([(u'@ID', u'201506542'), (u'ANSWER', u'Yet another one')])])])])]))])
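# ID attribute of each SESSION
sess_ids = [i['@ID'] for i in doc['MYDATA']['SESSION']]
print 'sess_ids:', sess_ids
# LOAD attribute of each SESSION's INFO/BEGIN element
sess_loads = [i['INFO']['BEGIN']['@LOAD'] for i in doc['MYDATA']['SESSION']]
print 'sess_loads:', sess_loads
# nested comprehension: one inner list of TRANSACTION IDs per SESSION
trans_ids = [[j['@ID'] for j in i['TRANSACTION']] for i in doc['MYDATA']['SESSION']]
print 'trans_ids:', trans_ids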
Output:
sess_ids: [u'1234', u'5678']
sess_loads: [u'23', u'28']
trans_ids: [[u'2103645570', u'4315547431'], [u'4099381642', u'1220404184', u'201506542']]
As you can see, I can access the ID attribute of the SESSION elements and the LOAD attribute of the BEGIN elements.
I need the ID attributes of the TRANSACTION elements as a single list. At the moment the variable trans_ids holds a list of lists.
How can I get a simple, flat list of values?
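If the list of lists is already built, one way to flatten it is itertools.chain.from_iterable, which walks each inner list in turn. A minimal sketch, assuming trans_ids has the shape printed above (shown here as plain Python 3 strings rather than the Python 2 u'' reprs):

```python
from itertools import chain

# trans_ids as printed above: one inner list of transaction IDs per SESSION
trans_ids = [['2103645570', '4315547431'],
             ['4099381642', '1220404184', '201506542']]

# chain.from_iterable yields the items of each inner list in order,
# so list() turns them into one flat list
flat_trans_ids = list(chain.from_iterable(trans_ids))
print(flat_trans_ids)
```

This keeps the per-session grouping available in trans_ids while giving a flat list for plotting.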
I tried:
[j['@ID'] for j in i['TRANSACTION'] for i in doc['MYDATA']['SESSION']]
But this only returns the transaction IDs of the second session, each one repeated twice:
[u'4099381642',
u'4099381642',
u'1220404184',
u'1220404184',
u'201506542',
u'201506542']
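A likely explanation: in a comprehension with several for clauses, the clauses nest left to right exactly like nested for statements, so the loop over sessions has to come first. In the attempt above, i['TRANSACTION'] is evaluated before any "for i" clause runs. Under Python 2 it picks up the i that leaked out of the earlier trans_ids comprehension (the last SESSION), and the trailing "for i" clause then repeats every transaction once per session, which produces the duplicates. (Python 3 comprehensions do not leak their loop variables, so there the same line would typically raise a NameError instead.) A minimal sketch with the clauses swapped, assuming a hypothetical stand-in for doc built from plain dicts and lists with the same shape xmltodict produced:

```python
# Hypothetical stand-in for doc: plain dicts/lists with the same structure
doc = {'MYDATA': {'SESSION': [
    {'@ID': '1234',
     'INFO': {'BEGIN': {'@LOAD': '23'}},
     'TRANSACTION': [{'@ID': '2103645570', 'ANSWER': 'Hello'},
                     {'@ID': '4315547431', 'ANSWER': 'This is an answer'}]},
    {'@ID': '5678',
     'INFO': {'BEGIN': {'@LOAD': '28'}},
     'TRANSACTION': [{'@ID': '4099381642', 'ANSWER': 'Hello'},
                     {'@ID': '1220404184', 'ANSWER': 'A Different answer'},
                     {'@ID': '201506542', 'ANSWER': 'Yet another one'}]},
]}}

# Outer loop (sessions) first, inner loop (transactions) second,
# mirroring:  for i in sessions:  for j in i['TRANSACTION']:  append(j['@ID'])
trans_ids = [j['@ID']
             for i in doc['MYDATA']['SESSION']
             for j in i['TRANSACTION']]
print(trans_ids)
```

Each transaction ID appears exactly once, in document order, and no inner lists are created.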