
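I'm trying to extract some data from XML. I'm using xmltodict to load the data into a dictionary, then using list comprehensions to pull the individual parts out into separate lists. I'll be plotting these later with matplotlib.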

XML:

<?xml version="1.0" ?>
<MYDATA>
<SESSION ID="1234">
    <INFO>
        <BEGIN LOAD="23"/>
    </INFO>
    <TRANSACTION ID="2103645570">
        <ANSWER>Hello</ANSWER>
    </TRANSACTION>
    <TRANSACTION ID="4315547431">
        <ANSWER>This is an answer</ANSWER>
    </TRANSACTION>
</SESSION>
<SESSION ID="5678">
    <INFO>
        <BEGIN LOAD="28"/>
    </INFO>
    <TRANSACTION ID="4099381642">
        <ANSWER>Hello</ANSWER>
    </TRANSACTION>
    <TRANSACTION ID="1220404184">
        <ANSWER>A Different answer</ANSWER>
    </TRANSACTION>
    <TRANSACTION ID="201506542">
        <ANSWER>Yet another one</ANSWER>
    </TRANSACTION>
</SESSION>
</MYDATA>

My code:

from collections import OrderedDict

# doc contains the xml exactly as loaded by xmltodict
doc = OrderedDict([(u'MYDATA', OrderedDict([(u'SESSION', [OrderedDict([(u'@ID', u'1234'), (u'INFO', OrderedDict([(u'BEGIN', OrderedDict([(u'@LOAD', u'23')]))])), (u'TRANSACTION', [OrderedDict([(u'@ID', u'2103645570'), (u'ANSWER', u'Hello')]), OrderedDict([(u'@ID', u'4315547431'), (u'ANSWER', u'This is an answer')])])]), OrderedDict([(u'@ID', u'5678'), (u'INFO', OrderedDict([(u'BEGIN', OrderedDict([(u'@LOAD', u'28')]))])), (u'TRANSACTION', [OrderedDict([(u'@ID', u'4099381642'), (u'ANSWER', u'Hello')]), OrderedDict([(u'@ID', u'1220404184'), (u'ANSWER', u'A Different answer')]), OrderedDict([(u'@ID', u'201506542'), (u'ANSWER', u'Yet another one')])])])])]))])

sess_ids = [i['@ID'] for i in doc['MYDATA']['SESSION']]
print sess_ids

sess_loads = [i['INFO']['BEGIN']['@LOAD'] for i in doc['MYDATA']['SESSION']]
print sess_loads

trans_ids = [[j['@ID'] for j in i['TRANSACTION']] for i in doc['MYDATA']['SESSION']]
print trans_ids

Output:

sess_ids:    [u'1234', u'5678']
sess_loads:  [u'23', u'28']
trans_ids:   [[u'2103645570', u'4315547431'], [u'4099381642', u'1220404184', u'201506542']]

As you can see, I'm able to access the ID attribute of the SESSION elements and the LOAD attribute of the BEGIN elements.

I need to get the ID attributes from the TRANSACTION elements as a single list. Currently I'm getting a list of lists in the variable trans_ids.

How can I get a simple flat list of values?

I tried:

[j['@ID'] for j in i['TRANSACTION'] for i in doc['MYDATA']['SESSION']]

But this just iterates over the second session twice, giving:

[u'4099381642',
 u'4099381642',
 u'1220404184',
 u'1220404184',
 u'201506542',
 u'201506542']

3 Answers

Is there any reason you need to go through a dictionary? This kind of thing is fairly simple with plain XML:

>>> import xml.etree.ElementTree as etree
>>> # xml_string is assumed to hold the XML above; to read a file, use etree.parse(path)
>>> txml = etree.fromstring(xml_string)
>>> txml.findall('SESSION/TRANSACTION')
[<Element TRANSACTION at 0x4064f9d8>,
 <Element TRANSACTION at 0x4064fa20>,
 <Element TRANSACTION at 0x4064f990>,
 <Element TRANSACTION at 0x4064fa68>,
 <Element TRANSACTION at 0x4064fab0>]
>>> [x.get('ID') for x in txml.findall('SESSION/TRANSACTION')]
['2103645570', '4315547431', '4099381642', '1220404184', '201506542']

At least, it seems more compact to me.
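
For completeness, here is a slightly fuller sketch of the same idea, assuming Python 3 and that the XML from the question is held in a string (the name xml_string is made up for illustration). It uses etree.fromstring, which parses a string; etree.parse expects a file name or file object instead. The same findall approach also covers the session IDs and loads.

```python
import xml.etree.ElementTree as etree

# Copy of the XML from the question; xml_string is an illustrative name.
xml_string = """<?xml version="1.0" ?>
<MYDATA>
<SESSION ID="1234">
    <INFO>
        <BEGIN LOAD="23"/>
    </INFO>
    <TRANSACTION ID="2103645570">
        <ANSWER>Hello</ANSWER>
    </TRANSACTION>
    <TRANSACTION ID="4315547431">
        <ANSWER>This is an answer</ANSWER>
    </TRANSACTION>
</SESSION>
<SESSION ID="5678">
    <INFO>
        <BEGIN LOAD="28"/>
    </INFO>
    <TRANSACTION ID="4099381642">
        <ANSWER>Hello</ANSWER>
    </TRANSACTION>
    <TRANSACTION ID="1220404184">
        <ANSWER>A Different answer</ANSWER>
    </TRANSACTION>
    <TRANSACTION ID="201506542">
        <ANSWER>Yet another one</ANSWER>
    </TRANSACTION>
</SESSION>
</MYDATA>"""

# fromstring returns the root element (MYDATA), so paths are relative to it.
root = etree.fromstring(xml_string)

sess_ids = [s.get('ID') for s in root.findall('SESSION')]
sess_loads = [b.get('LOAD') for b in root.findall('SESSION/INFO/BEGIN')]
trans_ids = [t.get('ID') for t in root.findall('SESSION/TRANSACTION')]

print(sess_ids)
print(sess_loads)
print(trans_ids)
```

The attribute values come back as strings, so they would likely need converting (for example with int) before plotting.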

answered 2013-09-30T18:44:56.023

I tried:

[j['@ID'] for j in i['TRANSACTION'] for i in doc['MYDATA']['SESSION']]

You almost had it. Just swap the order of the two for..in clauses:

>>> [j['@ID'] for i in doc['MYDATA']['SESSION'] for j in i['TRANSACTION']]
[u'2103645570', u'4315547431', u'4099381642', u'1220404184', u'201506542']

To understand this, look at this example:

>>> a = [[1, 2, 3], [4, 5, 6]]
>>> [j for j in i for i in a]
[4, 4, 5, 5, 6, 6]
>>> [j for i in a for j in i]
[1, 2, 3, 4, 5, 6]

When a list comprehension has multiple for..in clauses, they are evaluated from left to right. So if your loop looks like this:

for i in a:
    for j in i:
        j

then you have to write the clauses in the same order, not from the inside out:

[j for i in a for j in i]
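
To tie this back to the question's data, here is a minimal sketch, assuming Python 3 and a cut-down plain-dict stand-in for doc (only the keys needed here, not the full xmltodict result). It writes the loops out explicitly and then as a comprehension with the for clauses in the same order.

```python
# Cut-down stand-in for the xmltodict result in the question (illustrative only).
doc = {'MYDATA': {'SESSION': [
    {'@ID': '1234',
     'TRANSACTION': [{'@ID': '2103645570'}, {'@ID': '4315547431'}]},
    {'@ID': '5678',
     'TRANSACTION': [{'@ID': '4099381642'}, {'@ID': '1220404184'},
                     {'@ID': '201506542'}]},
]}}

# Explicit nested loops: outer loop over sessions, inner loop over transactions.
flat_loop = []
for session in doc['MYDATA']['SESSION']:
    for transaction in session['TRANSACTION']:
        flat_loop.append(transaction['@ID'])

# The same loops as a comprehension: the for clauses keep the same order.
flat_comp = [transaction['@ID']
             for session in doc['MYDATA']['SESSION']
             for transaction in session['TRANSACTION']]

print(flat_comp)
```

The reversed order only appeared to run in the question because, in Python 2, i was still bound to the last session from an earlier list comprehension. In Python 3, comprehension variables do not leak, so the reversed order raises a NameError in a fresh session.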
answered 2013-09-30T16:58:02.627
from itertools import chain
list(chain(*trans_ids))
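
A self-contained version of the same approach, assuming Python 3 and the trans_ids list of lists from the question's output. chain.from_iterable is an alternative spelling that takes the outer list directly instead of unpacking it into arguments.

```python
from itertools import chain

# The nested list produced by the question's trans_ids comprehension.
trans_ids = [['2103645570', '4315547431'],
             ['4099381642', '1220404184', '201506542']]

# Unpack the inner lists as separate arguments to chain...
flat_star = list(chain(*trans_ids))

# ...or hand the outer list to chain.from_iterable.
flat_iter = list(chain.from_iterable(trans_ids))

print(flat_iter)
```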
answered 2013-09-30T16:49:23.373