python - Organizing XML data into a dictionary

Question

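I am trying to organize my data from XML into a dictionary format. It will be used to run Monte Carlo simulations.

Here is an example of a couple of entries in the XML: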

<retirement>
    <item>
        <low>-0.34</low>
        <high>-0.32</high>
        <freq>0.0294117647058824</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
    <item>
        <low>-0.32</low>
        <high>-0.29</high>
        <freq>0</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
</retirement>

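My current dataset has only two variables, and the type can be one of 3, or possibly 4, discrete types. Hard-coding two variables isn't a problem, but I would like to start working with data that has many more variables and automate this process. My goal is to import this XML data into a dictionary automatically, so that I can manipulate it further later on without having to hard-code array titles and variables.

Here is what I have: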

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse('xmlfile')

# Create iterable item list
Items = tree.findall('item')

# Create Master Dictionary
masterDictionary = {}

# Assign variables to dictionary
for Item in Items:
    thisKey = Item.find('variable').text
    if thisKey in masterDictionary == False:
        masterDictionary[thisKey] = []
    else:
        pass

thisList = masterDictionary[thisKey]
newDataPoint = DataPoint(float(Item.find('low').text), float(Item.find('high').text), float(Item.find('freq').text))
thisSublist.append(newDataPoint)

I get a KeyError at thisList = masterDictionary[thisKey]

I am also trying to create a class to handle some of the other elements of the XML:

# Define a class for each data point that contains low, hi and freq attributes
class DataPoint:
 def __init__(self, low, high, freq):
  self.low = low
  self.high = high
  self.freq = freq

Could I then check a value using something like this:

masterDictionary['stock'] [0].freq

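Any and all help is appreciated.

Update

Thanks for the help, John. The indentation problems were sloppiness on my part. This is my first time posting on Stack and I just didn't copy/paste correctly. The part after the else: is in fact indented as part of the for loop, and the class is indented with four spaces in my code; it was just a bad post. I will keep the capitalization convention in mind. Your suggestion did indeed work, and now the following commands: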

print masterDictionary.keys()
print masterDictionary['stock'][0].low

yield:

['inflation', 'stock']
-0.34

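Those are indeed my two variables, and the value matches the XML listed at the top.

Update 2

Well, I thought I had this solved, but I was careless again and it turns out I haven't fully fixed the problem. The previous solution ended up writing all of the data to both of my dictionary keys, so I have two identical lists of all the data, assigned to two different dictionary keys. The idea is for each distinct set of data from the XML to be assigned to the matching dictionary key. Here is the current code: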

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse(xml file)

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}
thisList = []

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text 
    masterDictionary[thisKey] = thisList
    if thisKey not in masterDictionary:
        masterDictionary[thisKey] = []
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    thisList.append(newDataPoint)

When I enter:

print masterDictionary['stock'][5].low
print masterDictionary['inflation'][5].low
print len(masterDictionary['stock'])
print len(masterDictionary['inflation'])

the results for both keys ('stock' and 'inflation') are identical:

-.22
-.22
56
56

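The XML file has 27 items tagged stock and 29 items tagged inflation. How can I make each list assigned to a dictionary key pull only its own data in the loop?

Update 3

It seems to work with 2 loops, but I have no idea how, or why it won't work in 1 single loop. I managed this by accident: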

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse(xml file)

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text
    thisList = []
    masterDictionary[thisKey] = thisList

for item in items:
    thisKey = item.find('variable').text
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    masterDictionary[thisKey].append(newDataPoint)

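I have tried a huge number of permutations to make this happen in one loop, but no luck. I can either get all of the data listed under both keys, as identical arrays of all the data (not very helpful), or get the data sorted correctly into 2 distinct arrays for the two keys, but with only the last data entry in each (the loop overwrites itself every time, leaving only one entry in the array).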

score 2 · Accepted Answer

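You have a serious indentation problem after the (unnecessary) else: pass. Fix that and try again. Does the problem occur with your sample input data? Other data? The first time around the loop? What is the value of thisKey that is causing the problem [hint: it's reported in the KeyError error message]? What are the contents of masterDictionary just before the error happens [hint: sprinkle some print statements around your code]?

Other remarks not relevant to your problem:

Instead of if thisKey in masterDictionary == False: consider using if thisKey not in masterDictionary: ... comparisons against True or False are almost always redundant and/or a bit of a "code smell".

Python convention is to reserve names with an initial capital letter (like Item) for classes.

Using only one space per indentation level makes code almost illegible and is severely deprecated. Always use 4 (unless you have a good reason, but I've never heard of one).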

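Update: I was wrong. thisKey in masterDictionary == False is worse than I thought; because in is a relational operator, chained evaluation is used (as in a <= b < c), so what you have is (thisKey in masterDictionary) and (masterDictionary == False), which will always evaluate to False, and thus the dictionary is never updated. The fix is as I suggested: use if thisKey not in masterDictionary: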
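The chained evaluation can be checked directly. This is a small sketch; the dictionary contents and the key are made-up sample values, not the asker's data:

```python
# Hypothetical sample data, for illustration only.
master = {"stock": []}
key = "inflation"

# `in` and `==` are both comparison operators, so Python chains them:
# `a in b == c` means `(a in b) and (b == c)`.
chained = key in master == False
expanded = (key in master) and (master == False)

# The test that was actually intended.
intended = key not in master

print(chained)   # False
print(expanded)  # False
print(intended)  # True
```

A dictionary never compares equal to False, so the chained form is False whether or not the key is present.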

It also looks like thisList (initialised but not used) should be thisSublist (used but not initialised).
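For the single-loop case raised in Update 2 and Update 3, here is a minimal sketch, assuming collections.defaultdict is acceptable in place of a plain dict. In the Update 2 code every key is bound to the same thisList object, which is why both keys show all 56 entries. The inflation item in the sample XML below is invented for illustration, and the names XML_TEXT and build_master_dictionary are not from the original post:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample modelled on the question's XML. The 'inflation' item is made up
# so that the example has two distinct keys.
XML_TEXT = """
<retirement>
    <item>
        <low>-0.34</low><high>-0.32</high><freq>0.0294117647058824</freq>
        <variable>stock</variable><type>historic</type>
    </item>
    <item>
        <low>-0.32</low><high>-0.29</high><freq>0</freq>
        <variable>stock</variable><type>historic</type>
    </item>
    <item>
        <low>0.01</low><high>0.02</high><freq>0.5</freq>
        <variable>inflation</variable><type>historic</type>
    </item>
</retirement>
"""


class DataPoint(object):
    """One histogram bin: low and high bounds plus a frequency."""

    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq


def build_master_dictionary(root):
    """Group the <item> elements under root by their <variable> text."""
    master = defaultdict(list)  # a missing key gets its own new list
    for item in root.findall('item'):
        key = item.find('variable').text
        master[key].append(DataPoint(
            float(item.find('low').text),
            float(item.find('high').text),
            float(item.find('freq').text),
        ))
    return master


root = ET.fromstring(XML_TEXT.strip())
master_dictionary = build_master_dictionary(root)

print(sorted(master_dictionary))
print(master_dictionary['stock'][0].low)
```

With a file on disk, ET.parse(path).getroot() could be passed in instead of the ET.fromstring result.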

score 0 · Accepted Answer

Change:

if thisKey in masterDictionary == False:

to

if thisKey not in masterDictionary:

That seems to be why you are getting that error. Also, you need to assign something to "thisSublist" before trying to append to it. Try:

thisSublist = []
thisSublist.append(newDataPoint)
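Note that the list being appended to has to be the one stored under the key, otherwise the data point is lost or ends up shared between keys. The sketch below contrasts one list shared by every key (the Update 2 symptom) with per-key lists created through dict.setdefault. The rows are hypothetical (variable, low) pairs standing in for the parsed XML items:

```python
# Hypothetical (variable, low) pairs standing in for the parsed XML items.
rows = [("stock", -0.34), ("stock", -0.32), ("inflation", 0.01)]

# One list object bound to every key: both keys see all of the data.
shared = {}
one_list = []
for variable, low in rows:
    shared[variable] = one_list
    one_list.append(low)

# setdefault stores a new list only the first time a key is seen and
# otherwise returns the list already stored under that key.
separate = {}
for variable, low in rows:
    separate.setdefault(variable, []).append(low)

print(shared["stock"] is shared["inflation"])  # True: same list object
print(len(shared["inflation"]))                # 3
print(separate["stock"])                       # [-0.34, -0.32]
print(separate["inflation"])                   # [0.01]
```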

score -1 · Accepted Answer

You have an error in the if statement inside your for loop. Instead of

if thisKey in masterDictionary == False:

write

if (thisKey in masterDictionary) == False:

Given the rest of your original code, you will then be able to access the data like this:

masterDictionary['stock'][0].freq

John Machin makes some valid points about style and smell (and you should consider the changes he suggests), but those things come with time and experience.
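As a quick check, the parenthesised comparison and the not in form give the same answer for present and absent keys. The dictionary contents here are made up for illustration:

```python
# Hypothetical dictionary with one key already present.
master = {"stock": []}

# For each key, record (key, parenthesised test, `not in` test).
results = [
    (key, (key in master) == False, key not in master)
    for key in ("stock", "inflation")
]

for key, parenthesised, idiomatic in results:
    print(key, parenthesised, idiomatic)
```

Both forms agree, so choosing between them is a question of style rather than behaviour.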
