My XML file looks like this:
<root>
  <group from="1" to="100">
    <link target="1"/>
    ...
    <link target="100"/>
  </group>
  ...
</root>
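I have 6000 <group> elements and 5M <link> elements. I want a dictionary whose keys are (from, to) tuples and whose values are the lists of the <link> elements' target attributes, but the following code runs into a memory error: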
from lxml import etree
from gzip import open as gopen

def extractTargets(fin):
    targets = dict()
    with gopen(fin) as xml:
        # Stream the file, getting each <group> once its closing tag is parsed
        context = etree.iterparse(xml, tag="group")
        for event, elem in context:
            targets[(elem.get("from"), elem.get("to"))] = elem.xpath("link/@target")
            # Free the finished <group> and the already-processed siblings before it
            elem.clear()
            while elem.getprevious() is not None:
                del elem.getparent()[0]
        del context
    return targets
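A likely culprit, per the lxml documentation on XPath return values: by default the strings returned by `xpath()` are "smart" strings that remember where they came from (they have a `getparent()` method), so every stored target keeps a reference into the parsed tree, and that tree stays alive even after `clear()`. The sketch below assumes the gzip-compressed layout shown above and passes `smart_strings=False`, so the dictionary holds only plain `str` values. The name `extract_targets` and the tiny in-memory sample are illustrative, and this has not been measured against the full 5M-link file.

```python
import gzip
from io import BytesIO

from lxml import etree


def extract_targets(fin):
    """Map (from, to) -> list of <link> target values, streaming over <group>.

    `fin` may be a path or a binary file object containing gzip-compressed XML.
    """
    targets = {}
    with gzip.open(fin) as xml:
        for _event, elem in etree.iterparse(xml, tag="group"):
            key = (elem.get("from"), elem.get("to"))
            # smart_strings=False returns plain str values; the default smart
            # strings keep a reference back to the element that carries them.
            targets[key] = elem.xpath("link/@target", smart_strings=False)
            # Drop the finished <group> and any processed siblings before it.
            elem.clear()
            while elem.getprevious() is not None:
                del elem.getparent()[0]
    return targets


if __name__ == "__main__":
    sample = b'<root><group from="1" to="2"><link target="1"/><link target="2"/></group></root>'
    print(extract_targets(BytesIO(gzip.compress(sample))))
```

An equivalent way to get plain strings without XPath is `[link.get("target") for link in elem.iterfind("link")]`, since `get()` returns an ordinary string rather than a smart one.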