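I want to parse the following XML: http://charts.realclearpolitics.com/charts/1044.xml. I would like to put the result in a DataFrame with 3 columns: Date, Approve, Disapprove. The XML file is dynamic, since a new date is added every day, so the code should account for that. I have implemented a static solution, in which I have to loop over hard-coded row numbers of the value tags. I would like to learn how to do this dynamically.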
import numpy as np
import pandas as pd
import requests
from pattern import web
xml = requests.get('http://charts.realclearpolitics.com/charts/1044.xml').text
dom = web.Element(xml)
values = dom.by_tag('value')
date = []
approve = []
disapprove = []
# The last range number below is 1720 instead of 1727 as the last 6 values of the Approve & Disapprove tags are blank.
for i in range(0, 1720):
    date.append(pd.to_datetime(values[i].content))
# The last range number below is 3447 instead of 3454 as the last 6 values are blank. Including up to 3454 gives an error when converting to float.
for i in range(1727, 3447):
    a = float(values[i].content)
    approve.append(a)
# The last range number below is 5174 instead of 5181 as the last 6 values are blank.
for i in range(3454, 5174):
    a = float(values[i].content)
    disapprove.append(a)
finalresult = pd.DataFrame({'date': date, 'Approve': approve, 'Disapprove': disapprove})
finalresult
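
For reference, here is a minimal sketch of the kind of dynamic version I am after. It does not use the real feed. It assumes (I have not confirmed this against the actual file) that the dates sit in a `<series>` element, that each result sits in a `<graph>` element with a `title` attribute, and that every `<value>` carries an `xid` attribute linking a date to its numbers. `SAMPLE_XML` and `parse_chart` are hypothetical names made up for illustration, and it uses the standard-library `xml.etree.ElementTree` instead of `pattern`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the ASSUMED layout of the feed:
# one <series> of dates, then one <graph> per result, aligned by xid.
SAMPLE_XML = """
<chart>
  <series>
    <value xid="0">1/27/2009</value>
    <value xid="1">1/28/2009</value>
    <value xid="2">1/29/2009</value>
  </series>
  <graphs>
    <graph gid="1" title="Approve">
      <value xid="0">63.3</value>
      <value xid="1">63.5</value>
      <value xid="2"></value>
    </graph>
    <graph gid="2" title="Disapprove">
      <value xid="0">20.0</value>
      <value xid="1">19.3</value>
      <value xid="2"></value>
    </graph>
  </graphs>
</chart>
"""


def parse_chart(xml_text):
    """Return one dict per date that has a non-blank value in every graph."""
    root = ET.fromstring(xml_text.strip())

    # Map xid -> date string; no row count is hard-coded.
    dates = {v.get("xid"): v.text for v in root.find("series").findall("value")}

    # Map graph title -> {xid -> float}, skipping blank values.
    columns = {}
    for graph in root.iter("graph"):
        column = {}
        for v in graph.findall("value"):
            text = (v.text or "").strip()
            if text:
                column[v.get("xid")] = float(text)
        columns[graph.get("title")] = column

    # Keep only the dates for which every graph has a number.
    rows = []
    for xid, date in dates.items():
        if all(xid in column for column in columns.values()):
            row = {"date": date}
            for title, column in columns.items():
                row[title] = column[xid]
            rows.append(row)
    return rows


rows = parse_chart(SAMPLE_XML)
```

If the real file matches these assumptions, `pd.DataFrame(parse_chart(xml))` should give the three columns, and `pd.to_datetime` can then be applied to the `date` column. Trailing blank values are dropped by the `if text` check, so no range boundaries need updating when a new date is added.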