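I am trying to parse XML files larger than 1 GB, so I am using iterparse, but I cannot reach the second-level children. With the code below I can get the children of elem, but not the children of child1; that is, the child2 loop is never entered.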

Code:

import xml.etree.cElementTree as ET
xmL = 'F:\\Reports\\Logs\\Result_TG1_V16.xml'

count = 0
flag = 0
for event, elem in ET.iterparse(xmL):
    if event == 'end':
        if elem.tag == 'TasksReportNode':
            count += 1

            for child1 in elem:
                print(child1.tag, child1.text)

                for child2 in child1:
                    print(child2.tag, child2.text)


        elem.clear() # discard the element

print(count)

Sample XML (excerpt from the full file):

<TasksReportNode Name="Task15">
    <TableData NumRows="97" NumColumns="15">
        <TableRow RowCount="0">
            <TableColumn Name="Task"><![CDATA[   Task15 [GET - /PULSEV31/appView/projectFeedHidden.jsp - 200]]]></TableColumn>
            <TableColumn Name="Status"><![CDATA[Success]]></TableColumn>
            <TableColumn Name="Successful"><![CDATA[96]]></TableColumn>
            <TableColumn Name="Failed"><![CDATA[0]]></TableColumn>
            <TableColumn Name="Timedout"><![CDATA[0]]></TableColumn>
            <TableColumn Name="Total"><![CDATA[96]]></TableColumn>
            <TableColumn Name="Min(ms)"><![CDATA[15]]></TableColumn>
            <TableColumn Name="Avg(ms)"><![CDATA[24.20]]></TableColumn>
            <TableColumn Name="Avg-90%(ms)"><![CDATA[54.55]]></TableColumn>
            <TableColumn Name="90%ile(ms)"><![CDATA[89.98]]></TableColumn>
            <TableColumn Name="95%ile(ms)"><![CDATA[95.24]]></TableColumn>
            <TableColumn Name="99%ile(ms)"><![CDATA[99.45]]></TableColumn>
            <TableColumn Name="Max(ms)"><![CDATA[94]]></TableColumn>
            <TableColumn Name="Std. Dev."><![CDATA[15.74]]></TableColumn>
            <TableColumn Name="Bytes Recd(KB)"><![CDATA[192]]></TableColumn>
        </TableRow>
    </TableData>
    <TableData NumRows="1" NumColumns="2">
        <TableRow RowCount="0">
            <TableColumn Name="Response Time Interval (ms)"><![CDATA[0 - 99]]></TableColumn>
            <TableColumn Name="Frequency"><![CDATA[96]]></TableColumn>
        </TableRow>
    </TableData>
</TasksReportNode>
<TasksReportNode Name="Task16">
    <TableData NumRows="97" NumColumns="15">
        <TableRow RowCount="0">
            <TableColumn Name="Task"><![CDATA[   Task16 [GET - /PULSEV31/appView/projectCommentHidden.jsp - 200]]]></TableColumn>
            <TableColumn Name="Status"><![CDATA[Success]]></TableColumn>
            <TableColumn Name="Successful"><![CDATA[96]]></TableColumn>
            <TableColumn Name="Failed"><![CDATA[0]]></TableColumn>
            <TableColumn Name="Timedout"><![CDATA[0]]></TableColumn>
            <TableColumn Name="Total"><![CDATA[96]]></TableColumn>
            <TableColumn Name="Min(ms)"><![CDATA[15]]></TableColumn>
            <TableColumn Name="Avg(ms)"><![CDATA[22.73]]></TableColumn>
            <TableColumn Name="Avg-90%(ms)"><![CDATA[54.55]]></TableColumn>
            <TableColumn Name="90%ile(ms)"><![CDATA[90.93]]></TableColumn>
            <TableColumn Name="95%ile(ms)"><![CDATA[96.25]]></TableColumn>
            <TableColumn Name="99%ile(ms)"><![CDATA[100.50]]></TableColumn>
            <TableColumn Name="Max(ms)"><![CDATA[109]]></TableColumn>
            <TableColumn Name="Std. Dev."><![CDATA[14.76]]></TableColumn>
            <TableColumn Name="Bytes Recd(KB)"><![CDATA[192]]></TableColumn>
        </TableRow>
    </TableData>
</TasksReportNode>
1 Answer

Here is what I tried: I used lxml instead of cElementTree.

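from lxml import etree

xmL = 'F:\\Reports\\Logs\\Result_TG1_V16.xml'

# Listen for "end" events only: an element's children are guaranteed
# to be complete only once its closing tag has been parsed.
context = etree.iterparse(xmL, events=("end",))
for event, element in context:
    if element.tag == 'TasksReportNode':
        for child1 in element:              # TableData
            for child2 in child1:           # TableRow
                if child2.get("RowCount") == "0":
                    for child3 in child2:   # TableColumn
                        print(child3.tag, child3.text)
        # Discard the subtree only after the whole TasksReportNode has been read
        element.clear()
del context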

With this I was able to get all the child tags and their data.
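
For reference, the inner loop in the question most likely comes up empty because `elem.clear()` runs on every `end` event. By the time `TasksReportNode` closes, each `TableData` child has already been cleared and has lost its own children and text. Below is a minimal sketch of the same traversal with the standard-library `xml.etree.ElementTree`, clearing only after a full `TasksReportNode` has been read. The in-memory `SAMPLE`, its `<Report>` wrapper element, and the helper name `read_task_nodes` are assumptions made for illustration; the real file's root element is not shown in the question.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory sample with the same nesting as the question's file;
# the <Report> wrapper is an assumption, since the real root element is not shown.
SAMPLE = b"""<Report>
  <TasksReportNode Name="Task15">
    <TableData NumRows="1" NumColumns="2">
      <TableRow RowCount="0">
        <TableColumn Name="Status"><![CDATA[Success]]></TableColumn>
        <TableColumn Name="Total"><![CDATA[96]]></TableColumn>
      </TableRow>
    </TableData>
  </TasksReportNode>
  <TasksReportNode Name="Task16">
    <TableData NumRows="1" NumColumns="2">
      <TableRow RowCount="0">
        <TableColumn Name="Status"><![CDATA[Success]]></TableColumn>
        <TableColumn Name="Total"><![CDATA[96]]></TableColumn>
      </TableRow>
    </TableData>
  </TasksReportNode>
</Report>"""


def read_task_nodes(source):
    """Yield (task name, [(column name, text), ...]) for each TasksReportNode."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "TasksReportNode":
            continue  # leave inner elements intact until their parent is complete
        columns = [
            (col.get("Name"), col.text)
            for table in elem   # TableData
            for row in table    # TableRow
            for col in row      # TableColumn
        ]
        yield elem.get("Name"), columns
        elem.clear()  # free the subtree only after it has been read


for name, columns in read_task_nodes(io.BytesIO(SAMPLE)):
    print(name, columns)
```

For the real file, the path from the question can be passed in place of the `io.BytesIO` object. Cleared `TasksReportNode` elements stay attached to the root as empty shells, so on very large inputs it may also be worth clearing the root element periodically.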

Answered 2016-03-17T14:04:32.720