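I wrote a script that parses the HTML from some websites to extract specific data. There are two different sites I pull this data from, so I use an elif statement. Here is the code: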
import urllib

class city :
    def __init__(self, city_name, link) :
        self.name = city_name
        self.url = link
        self.high0 = 0
        self.high1 = 0
        self.high2 = 0
        self.high3 = 0
        self.high4 = 0
        self.high5 = 0
        self.high6 = 0
        self.low0 = 0
        self.low1 = 0
        self.low2 = 0
        self.low3 = 0
        self.low4 = 0
        self.low5 = 0

    def retrieveTemps(self) :
        filehandle = urllib.urlopen(self.url)
        # get lines from result into array
        lines = filehandle.readlines()
        # (for each) loop through each line in lines
        line_number = 0 # a counter for line number
        for line in lines:
            line_number = line_number + 1 # increment counter
            # find string, position otherwise position is -1
            position1 = line.rfind('#f2')
            if position1 > 0 :
                self.high0 = lines[line_number].split('&')[0].split('>')[1] # next line: high
                self.low0 = lines[line_number + 10].split('&')[0].split('>')[1] # next line:low
            elif position1 < 0 :
                position1 = line.rfind('>Overnight')
                if position1 > 0 :
                    self.high0 = lines[line_number + 9].split('&')[0].split(':')[1] # next line: high
                    self.low0 = lines[line_number + 15].split('&')[0].split(':')[1] # next line:low
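The script works perfectly when position1 = line.rfind('#f2') finds a match. However, when it can't find '#f2' (which appears only in the first site's HTML, not the second's), I'm trying to tell it to look for '>Overnight' and then extract the data between ':' and '&'. The "data" will always be a number. I suspect one problem may be that there is a space on either side of the number I'm trying to extract, but I don't know how to solve that. When I run the script, I get this error: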
self.high0 = lines[line_number + 9].split('&')[0].split(':')[1] # next line: high "IndexError: list index out of range"
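The error can be reproduced in isolation. This is a minimal sketch, assuming the line selected by the fixed offset is a tag-only line such as `</div>` that contains no ':' (the sample string is an assumption, not the actual line from the page):

```python
# str.split returns a one-element list when the separator is absent,
# so asking for index 1 raises IndexError.
line = '</div>'  # assumed: a line with no ':' in it

parts = line.split('&')[0].split(':')
print(parts)  # ['</div>']

try:
    value = parts[1]
except IndexError as exc:
    print('IndexError:', exc)  # IndexError: list index out of range
```

If this is what happens, the spaces around the number would not be the cause: the offsets `+ 9` and `+ 15` may simply be landing on lines that do not hold a temperature.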
For reference, this is the HTML I am parsing for the first website:
</h3><img src="/weathericons/15.gif" longdesc="#f2" alt="Rain mixed with snow" title="Rain mixed with snow" /><ul>
<li class="high" title="High">3&deg;C</li>
<li class="low"> </li>
<li class="pop"> </li>
</ul>
</div>
And from the second website (the one that gives me the error):
<p class="txt-ctr-caps">Overnight<br><br></p>
<p><img src="/images/wtf/medium/nra60.png" width="86" height="86" alt="Rain Likely Chance for Measurable Precipitation 60%" title="Rain Likely Chance for Measurable Precipitation 60%" /></p>
<p>Rain<br>Likely<br></p>
<p class="point-forecast-icons-low">Low: 3 &deg;C</p>
</div>
<div class="one-ninth-first">
<p class="txt-ctr-caps">Thursday<br><br></p>
<p><img src="/images/wtf/medium/ra70.png" width="86" height="86" alt="Rain Likely Chance for Measurable Precipitation 70%" title="Rain Likely Chance for Measurable Precipitation 70%" /></p>
<p>Rain<br>Likely<br></p>
<p class="point-forecast-icons-high">High: 9 &deg;C</p>
</div>
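One possible way to avoid depending on fixed line offsets for the second site is to match the "High:" / "Low:" labels directly. This is only a sketch under assumptions: `TEMP_RE` and `find_temps` are hypothetical names, the sample lines are abridged from the markup above, and the degree sign is written as the `&deg;` entity that splitting on '&' implies. The `\s*` in the pattern absorbs the spaces around the number.

```python
import re

# Matches "High: 9" or "Low: 3", tolerating spaces after the colon.
TEMP_RE = re.compile(r'(High|Low):\s*(-?\d+)')

def find_temps(lines):
    """Return (label, value) pairs in document order (hypothetical helper)."""
    temps = []
    for line in lines:
        match = TEMP_RE.search(line)
        if match:
            temps.append((match.group(1), int(match.group(2))))
    return temps

# Abridged sample lines, assumed to resemble the second site's HTML.
sample = [
    '<p class="txt-ctr-caps">Overnight<br><br></p>',
    '<p>Rain<br>Likely<br></p>',
    '<p class="point-forecast-icons-low">Low: 3 &deg;C</p>',
    '</div>',
    '<p class="txt-ctr-caps">Thursday<br><br></p>',
    '<p class="point-forecast-icons-high">High: 9 &deg;C</p>',
]

print(find_temps(sample))  # [('Low', 3), ('High', 9)]
```

Because the pattern requires a colon right after the label, it does not match `title="High"` in the first site's markup, so the existing '#f2' branch could stay as it is.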
Any help would be greatly appreciated, thanks!