I am trying to reproduce the example from this tutorial, but using iterparse together with elem.clear().
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<scenario>
  <world>
    <region name="USA">
      <AgSupplySector name="Corn" nocreate="1">
        <AgSupplySubsector name="Corn_NelsonR" nocreate="1">
          <AgProductionTechnology name="Corn_NelsonR" nocreate="1">
            <period year="1975">
              <Non-CO2 name="SO2_1_AWB">
                <input-emissions>3.98749e-05</input-emissions>
                <output-driver/>
                <gdp-control name="GDP_control">
                  <max-reduction>60</max-reduction>
                  <steepness>3.5</steepness>
                </gdp-control>
              </Non-CO2>
              <Non-CO2 name="NOx_AWB">
                <input-emissions>0.000285263</input-emissions>
                <output-driver/>
                <gdp-control name="GDP_control">
                  <max-reduction>60</max-reduction>
                  <steepness>3.5</steepness>
                </gdp-control>
              </Non-CO2>
            </period>
          </AgProductionTechnology>
        </AgSupplySubsector>
      </AgSupplySector>
    </region>
  </world>
</scenario>

import os
import xml.etree.cElementTree as etree
import codecs
import csv

PATH = r'D:\Book1'
FILENAME_BIO = 'Test.csv'
FILENAME_XML = 'all_aglu_emissions.xml'
ENCODING = "utf-8"

pathBIO = os.path.join(PATH, FILENAME_BIO)
pathXML = os.path.join(PATH, FILENAME_XML)

with codecs.open(pathBIO, "w", ENCODING) as bioFH:
    bioWriter = csv.writer(bioFH, quoting=csv.QUOTE_MINIMAL)
    bioWriter.writerow(['Year', 'Gas', 'Value', 'Technology', 'Crop', 'Country'])
    for event, elem in etree.iterparse(pathXML, events=('start', 'end')):
        # Remember the attributes of each enclosing element as it opens.
        if event == 'start' and elem.tag == 'region':
            str1 = elem.attrib['name']
        elif event == 'start' and elem.tag == 'AgSupplySector':
            str2 = elem.attrib['name']
        elif event == 'start' and elem.tag == 'AgProductionTechnology':
            str3 = elem.attrib['name']
        elif event == 'start' and elem.tag == 'period':
            str4 = elem.attrib['year']
        elif event == 'start' and elem.tag == 'Non-CO2':
            str5 = elem.attrib['name']
        # Write one CSV row when an emission value has been read.
        elif event == 'end' and elem.tag == 'input-emissions':
            for em in elem.iter('input-emissions'):
                str6 = em.text
                bioWriter.writerow([str4, str5, str6, str3, str2, str1])
        # Runs for every event, 'start' as well as 'end'.
        elem.clear()
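
One thing that may be worth checking: if elem.clear() runs on every event, it also runs on 'start' events. The ElementTree documentation only guarantees that attributes are defined at 'start'; the element's text and children may or may not be there yet, so clearing at that point can discard text that the parser has already read ahead. Below is a minimal sketch, written for Python 3, that clears only on 'end' events. It assumes a trimmed in-memory copy of the sample XML above, and the names SAMPLE_XML, CONTEXT_ATTRS, extract_rows and rows_to_csv are hypothetical helpers introduced for illustration, not part of the original script.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed in-memory copy of the sample document (assumption: same structure as the real file).
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<scenario><world><region name="USA">
<AgSupplySector name="Corn" nocreate="1">
<AgSupplySubsector name="Corn_NelsonR" nocreate="1">
<AgProductionTechnology name="Corn_NelsonR" nocreate="1">
<period year="1975">
<Non-CO2 name="SO2_1_AWB"><input-emissions>3.98749e-05</input-emissions></Non-CO2>
<Non-CO2 name="NOx_AWB"><input-emissions>0.000285263</input-emissions></Non-CO2>
</period>
</AgProductionTechnology></AgSupplySubsector></AgSupplySector>
</region></world></scenario>"""

# Tag -> attribute to remember when that element opens.
CONTEXT_ATTRS = {
    "region": "name",
    "AgSupplySector": "name",
    "AgProductionTechnology": "name",
    "period": "year",
    "Non-CO2": "name",
}


def extract_rows(source):
    """Yield [year, gas, value, technology, crop, country] for each input-emissions element."""
    context = {}
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Attributes are defined at 'start'; text and children are not guaranteed yet.
            if elem.tag in CONTEXT_ATTRS:
                context[elem.tag] = elem.get(CONTEXT_ATTRS[elem.tag])
        elif elem.tag == "input-emissions":
            # At 'end' the element is complete, so its text is safe to read.
            yield [context["period"], context["Non-CO2"], elem.text,
                   context["AgProductionTechnology"], context["AgSupplySector"],
                   context["region"]]
        elif elem.tag == "Non-CO2":
            # Free memory only after the whole Non-CO2 subtree has been processed.
            elem.clear()


def rows_to_csv(rows):
    """Render the rows as CSV text with the same header as the original script."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["Year", "Gas", "Value", "Technology", "Crop", "Country"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = list(extract_rows(io.BytesIO(SAMPLE_XML)))
print(rows_to_csv(rows))
```

With the real file, extract_rows(pathXML) could be passed to a csv.writer opened on pathBIO in the same way. Clearing at the end of Non-CO2 rather than at input-emissions is a design choice: it releases the whole block in one step once nothing inside it is needed any more.
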
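
My problem is that I get extra rows in which the str6 field is empty. I probably have a nesting problem here. Please help. Example of the faulty output (rows with empty fields appear):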