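I am trying to parse an XML document that contains repeating child elements using Python. When I run the script, it creates an empty file. If I comment out the code for the repeating child elements (see the highlighted HistoricReturns loops in the Python script below), the output is generated correctly. Can anyone help?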

XML:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<FRPerformance xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <FRPerformanceShareClassCurrency>
    <FundCode>00190</FundCode>
    <CurrencyID>USD</CurrencyID>
    <FundShareClassCode>A</FundShareClassCode>
    <ReportPeriodFrequency>Quarterly</ReportPeriodFrequency>
    <ReportPeriodEndDate>06/30/2012</ReportPeriodEndDate>
    <Net>
      <Annualized>
        <Year1>-4.909000000</Year1>
        <Year3>10.140000000</Year3>
        <Year5>-22.250000000</Year5>
        <Year10>-7.570000000</Year10>
        <Year15>-4.730000000</Year15>
        <Year20>-0.900000000</Year20>
        <SI>1.900000000</SI>
      </Annualized>
    </Net>
    <Gross>
      <Annualized>
        <Month3>1.279000000</Month3>
        <YTD>7.294000000</YTD>
        <Year1>-0.167000000</Year1>
        <Year3>11.940000000</Year3>
        <Year5>-21.490000000</Year5>
        <Year10>-7.120000000</Year10>
        <Year15>-4.420000000</Year15>
        <Year20>-0.660000000</Year20>
        <SI>2.110000000</SI>
      </Annualized>
      <Cumulative>
        <Month1Back>2.288000000</Month1Back>
        <Month2Back>-1.587000000</Month2Back>
        <Month3Back>0.610000000</Month3Back>
        <CurrentYear>7.294000000</CurrentYear>
        <Year1Back>-2.409000000</Year1Back>
        <Year2Back>13.804000000</Year2Back>
        <Year3Back>20.287000000</Year3Back>
        <Year4Back>-78.528000000</Year4Back>
        <Year5Back>-0.101000000</Year5Back>
        <Year6Back>9.193000000</Year6Back>
        <Year7Back>2.659000000</Year7Back>
        <Year8Back>9.208000000</Year8Back>
        <Year9Back>25.916000000</Year9Back>
        <Year10Back>-3.612000000</Year10Back>
      </Cumulative>
      <HistoricReturns>
        <HistoricReturns_Item>
          <Date>Fri, 28 Feb 1997 00:00:00 -0600</Date>
          <Return>32058.090000000</Return>
        </HistoricReturns_Item>
        <HistoricReturns_Item>
          <Date>Fri, 28 Feb 2003 00:00:00 -0600</Date>
          <Return>36415.110000000</Return>
        </HistoricReturns_Item>
        <HistoricReturns_Item>
          <Date>Fri, 29 Feb 2008 00:00:00 -0600</Date>
          <Return>49529.290000000</Return>
        </HistoricReturns_Item>
        <HistoricReturns_Item>
          <Date>Fri, 30 Apr 1993 00:00:00 -0600</Date>
          <Return>21621.500000000</Return>
        </HistoricReturns_Item>
      </HistoricReturns>

Python script:

import sys
import xml.etree.ElementTree as ET

## Read the XML file path and the tag name from the command line
xmlFile = sys.argv[1]
tagName = sys.argv[2]


tree = ET.parse(xmlFile)
root = tree.getroot()

## Setup the file for output
saveout = sys.stdout
output_file =  open('parsedXML.csv', 'w')
sys.stdout = output_file

## Parse XML

for node in root.findall(tagName):
    fundCode = node.find('FundCode').text
    curr = node.find('CurrencyID').text
    shareClass = node.find('FundShareClassCode').text
    for node2 in node.findall('./Net/Annualized'):
        year1 = node2.findtext('Year1')
        year3 = node2.findtext('Year3')
        year5 = node2.findtext('Year5')
        year10 = node2.findtext('Year10')
        year15 = node2.findtext('Year15')
        year20 = node2.findtext('Year20')
        SI = node2.findtext('SI')
        for node3 in node.findall('./Gross'):
            for node4 in node3.findall('./Annualized'):
                month3 = node4.findtext('Month3')
                ytd = node4.findtext('YTD')
                year1g = node4.findtext('Year1')
                year3g = node4.findtext('Year3')
                year5g = node4.findtext('Year5')
                year10g = node4.findtext('Year10')
                year15g = node4.findtext('Year15')
                year20g = node4.findtext('Year20')
                SIg = node4.findtext('SI')
            for node5 in node3.findall('./Cumulative'):
                month1b = node5.findtext('Month1Back')
                month2b = node5.findtext('Month2Back')
                month3b = node5.findtext('Month3Back')
                curYear = node5.findtext('CurrentYear')
                year1b = node5.findtext('Year1Back')
                year2b = node5.findtext('Year2Back')
                year3b = node5.findtext('Year3Back')
                year4b = node5.findtext('Year4Back')
                year5b = node5.findtext('Year5Back')
                year6b = node5.findtext('Year6Back')
                year7b = node5.findtext('Year7Back')
                year8b = node5.findtext('Year8Back')
                year9b = node5.findtext('Year9Back')
                year10b = node5.findtext('Year10Back')
        # >>> Repeating child elements: the part the question is about
        for node6 in node.findall('./HistoricReturns'):
            for node7 in node6.findall('./HistoricReturns_Item'):
                hDate = node7.findall('Date')
                hReturn = node7.findall('Return')
                # <<< End of the part the question is about
                print(fundCode, curr, shareClass, year1, year3, year5, year10, year15, year20, SI, month3, ytd, year1g, year3g, year5g, year10g, year15g, year20g, SIg, month1b, month2b, month3b, curYear, year1b, year2b, year3b, year4b, year5b, year6b, year7b, year8b, year9b, year10b, hDate, hReturn)

1 Answer

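The sample XML and the Python code do not match structurally. Either

  • you are missing the closing </Gross> tag in the XML (it should come before the <HistoricReturns> section starts), in which case the code is correct, or
  • the code should be for node6 in node3.findall('./HistoricReturns'): that is, node3 instead of node.

Note that the XML sample is incomplete (it is not well-formed XML) because it lacks the closing tags for Gross, FRPerformanceShareClassCurrency and FRPerformance, so the question cannot be answered unambiguously. Hope this helps.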
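To illustrate the second case, here is a minimal sketch. It assumes that <HistoricReturns> really is nested inside <Gross>; the trimmed-down SAMPLE string and the variable names direct and rows are made up for the demonstration. A path such as './HistoricReturns' only matches direct children of the element it is called on, so searching from node finds nothing. That would also explain the empty file: the print call sits inside the innermost loop, which never runs.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, well-formed version of the sample. Placing
# <HistoricReturns> inside <Gross> is an assumption.
SAMPLE = """
<FRPerformance>
  <FRPerformanceShareClassCurrency>
    <FundCode>00190</FundCode>
    <Gross>
      <HistoricReturns>
        <HistoricReturns_Item>
          <Date>Fri, 28 Feb 1997 00:00:00 -0600</Date>
          <Return>32058.090000000</Return>
        </HistoricReturns_Item>
        <HistoricReturns_Item>
          <Date>Fri, 28 Feb 2003 00:00:00 -0600</Date>
          <Return>36415.110000000</Return>
        </HistoricReturns_Item>
      </HistoricReturns>
    </Gross>
  </FRPerformanceShareClassCurrency>
</FRPerformance>
"""

root = ET.fromstring(SAMPLE)
node = root.find('FRPerformanceShareClassCurrency')

# './HistoricReturns' matches only direct children of `node`,
# so with this nesting the result is an empty list.
direct = node.findall('./HistoricReturns')

# Spelling out the full path (or searching from the <Gross> element)
# reaches the repeating items; findtext returns the text, not Element objects.
rows = [
    (item.findtext('Date'), item.findtext('Return'))
    for item in node.findall('./Gross/HistoricReturns/HistoricReturns_Item')
]

print(len(direct))
for date, ret in rows:
    print(date, ret)
```

If the real file instead has <HistoricReturns> as a sibling of <Gross>, the original node.findall('./HistoricReturns') call would be the right one.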

Answered 2013-01-03T20:52:00.673