I'm trying to parse a weblog page and extract certain data into a list. This is the XML:

http://www-01.ibm.com/software/support/lifecycle/rss/PLCWeeklyXMLDownload.xml

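There are multiple records, but from each one I need to extract the software title, version number, release number, ModLevelNumber, and end-of-service date (if there is one) and put them into a list.

I'm running the Python code below, but I'm new to XML, so any help is appreciated.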

def myDownload():
    import xml.etree.ElementTree as et
    import urllib.request
    response = urllib.request.urlopen("http://www-01.ibm.com/software/support/lifecycle/rss/PLCWeeklyXMLDownload.xml")
    tree = et.parse(response)
    root = tree.getroot()
    aList = []

    for child in root:
        for node in child.findall("SWTitle"):
            title = node.text
            aList.append(title)
        for nodes in child.findall("Versions"):
            for version in nodes.findall("Version"):
                for release in version.findall("Release_Mods"):
                    for mod in release.findall("Release_Mod"):
                        rNum = mod.find("releaseNumber")
                        rNumber = rNum.text
                        nNum = mod.find("modLevelNumber")
                        nNumber = nNum.text
                        aList.append(rNumber)
                        # was "nNumer", which raised a NameError
                        aList.append(nNumber)
    # return the collected values so the caller can use them
    return aList

Can anyone help adjust this code? It doesn't seem to work.

2 Answers

Use the lxml library to parse the XML. ElementTree does not work for more deeply nested tags.

Answered 2013-05-02T12:25:42.673

You can use the lxml library for this:

import requests
from lxml import etree

r = requests.get('http://www-01.ibm.com/software/support/lifecycle/rss/PLCWeeklyXMLDownload.xml')
xml = r.content
xml_dom = etree.fromstring(xml)

# Iterate over <SWTitleRecord>
for record_node in xml_dom:
    data = {}
    for attr_node in record_node:
        if attr_node.tag == 'SWTitle':
            data['title'] = attr_node.text
        elif attr_node.tag == 'Versions':
            # parse versions
    ...       
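
The version-parsing step left open above could be sketched roughly as follows. This is only a sketch under assumptions: the nesting (SWTitleRecord, Versions, Version, Release_Mods, Release_Mod) and the names releaseNumber and modLevelNumber are taken from the question's code, while versionNumber, EOSDate, the SWTitles root, and the sample data are hypothetical and may not match the real feed. It uses the standard library's xml.etree.ElementTree so that it runs without extra packages; fromstring, iter, findall, and findtext are also available in lxml.etree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names not shown in the question
# (SWTitles, versionNumber, EOSDate) are assumptions for illustration.
SAMPLE = """\
<SWTitles>
  <SWTitleRecord>
    <SWTitle>Example Product</SWTitle>
    <Versions>
      <Version>
        <versionNumber>1</versionNumber>
        <Release_Mods>
          <Release_Mod>
            <releaseNumber>2</releaseNumber>
            <modLevelNumber>0</modLevelNumber>
            <EOSDate>2015-04-30</EOSDate>
          </Release_Mod>
          <Release_Mod>
            <releaseNumber>3</releaseNumber>
            <modLevelNumber>1</modLevelNumber>
          </Release_Mod>
        </Release_Mods>
      </Version>
    </Versions>
  </SWTitleRecord>
</SWTitles>
"""


def extract_records(xml_text):
    """Return one dict per Release_Mod, carrying its title and version."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter("SWTitleRecord"):
        title = record.findtext("SWTitle")
        # A path expression walks two levels of nesting in one call.
        for version in record.findall("Versions/Version"):
            version_number = version.findtext("versionNumber")
            for mod in version.findall("Release_Mods/Release_Mod"):
                rows.append({
                    "title": title,
                    "version": version_number,
                    "release": mod.findtext("releaseNumber"),
                    "mod_level": mod.findtext("modLevelNumber"),
                    # findtext returns None when the element is missing
                    "eos_date": mod.findtext("EOSDate"),
                })
    return rows


rows = extract_records(SAMPLE)
for row in rows:
    print(row)
```

Using findtext rather than find(...).text avoids an AttributeError when an optional element such as the end-of-service date is absent. For the live feed, the bytes in r.content could be passed to extract_records once the element names have been checked against the actual XML.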
Answered 2013-04-16T01:58:19.930