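I have two problems with my code. First, the data is not displayed correctly under the field headings, and second, the loop only picks up part of the data from the HTML. The code tries to scrape 14 events, all of which are on a single page of the website. The HTML for each event on the page is identical (i.e. the same markup repeats over and over). The first problem concerns the resulting data and the field headings. I should get this:
Fin,Greyhound,Trap,SP,Time/Sec.,Time/Distance,Trainer,Comment
1,Bernies Toughguy,3,7/4F,3.63,23.91,(Trainer: MN Fenwick),"Comment: EP,SnLd"
2,Gentle Kewell,2,7/2,3.70,24.01 (1 1/4),(Trainer: JM Liles),"Comment: MidToRls,RanOn"
3,Tintreach Harry,5,3/1,3.72,24.17 (2),(Trainer: ACB Green),"Comment: BmpRnUp&2,Crd 1/4"
4,Colorado Teegan,4,7/1,3.74,24.33 (2),(Trainer: MN Fenwick),"Comment: Wide,EvCh"
5,Premarket Honey,6,6/1,3.68,24.51 (2 1/4),(Trainer: ACB Green),"Comment: SAw,Crd2"
6,Malbay Roxy,1,7/2,3.81,24.57 (3/4),(Trainer: MN Fenwick),"Comment: EP,SnLd"
Here every value falls under the correct field heading, i.e. finishing position, dog name and so on. But when I run the program, I get this: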
Fin,Greyhound,Trap,SP,Time/Sec.,Time/Distance, (Trainer: MN Fenwick),"Comment: EP,SnLd"
1,Bernies Toughguy,3,7/4F,3.63,23.91,(Trainer: JM Liles),"Comment: MidToRls,RanOn"
2,Gentle Kewell,2,7/2,3.70,24.01 (1 1/4),(Trainer: ACB Green),"Comment: BmpRnUp& 1/4"
3,Tintreach Harry,5,3/1,3.72,24.17 (2),(Trainer: ACB Green),"Comment: BmpRnUp&2,Crd 1/4"
4,Colorado Teegan,4,7/1,3.74,24.33 (2),(Trainer: MN Fenwick),"Comment: Wide,EvCh"
5,Premarket Honey,6,6/1,3.68,24.51 (2 1/4),(Trainer: JM Liles),"Comment: SAw,Crd2"
6,Malbay Roxy,1,7/2,3.81,24.57 (3/4),(Trainer: BD O'sullivan),"Comment: EP,SnLd"
Note that the first row, which should contain only field names, has some of the field names, but the last two are replaced by a trainer's name and a comment. This shifts the trainer and comment values in every following row, so they no longer line up with the right dog.
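A minimal sketch of what may be going on, assuming (this is a guess, not confirmed from the page) that the heading cells share CSS classes with the first six data columns while the trainer and comment cells have no heading cell of their own. The toy lists and the helper names `rows_misaligned`, `rows_aligned` and `to_csv` are hypothetical, and the syntax is Python 3:

```python
import csv
import io

# Toy columns standing in for the scraped lists (hypothetical data).
# "fin" and "greyhound" contain a heading cell; "trainer" does not.
fin = ["Fin", "1", "2"]
greyhound = ["Greyhound", "Bernies Toughguy", "Gentle Kewell"]
trainer = ["(Trainer: MN Fenwick)", "(Trainer: JM Liles)"]


def rows_misaligned(fin, greyhound, trainer):
    """Zip the raw lists: the heading row picks up the first trainer."""
    return list(zip(fin, greyhound, trainer))


def rows_aligned(fin, greyhound, trainer):
    """Filter out scraped heading cells and write one explicit header row."""
    fin_data = [x for x in fin if x != "Fin"]
    dog_data = [x for x in greyhound if x != "Greyhound"]
    header = ("Fin", "Greyhound", "Trainer")
    return [header] + list(zip(fin_data, dog_data, trainer))


def to_csv(rows):
    """Render rows as CSV text so the two results can be compared."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


print(to_csv(rows_misaligned(fin, greyhound, trainer)))
print(to_csv(rows_aligned(fin, greyhound, trainer)))
```

If that assumption holds, filtering by heading text rather than slicing off the first element would also cope with a heading row that repeats once per event.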
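The second problem may be related to the loop iteration. As I said, the HTML on the page is very uniform, but for some reason, when I run the program, the data stops at the 5th runner (Avenue Bound) in the 6th event (11.51) on the card, even though there are actually 14 events on the card. The remaining events are never written. So the loop appears to be breaking off early, but I can't see any obvious reason for it in the HTML. The code is below. I have tried many variations of it but can't seem to crack the problem. I suspect I may have to include code that determines the number of iterations in the loop, but Python loops are different from C loops and I haven't been able to find anything on this. Any help is greatly appreciated.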
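On the question of iteration counts: `zip` needs no explicit counter, but it stops as soon as its shortest input runs out, so a single short column silently drops every later row. Whether that is the cause here is only an assumption. A quick way to check is to compare the lengths of the lists before zipping. The sketch below uses made-up columns and Python 3's `itertools.zip_longest` (the Python 2 name is `itertools.izip_longest`):

```python
from itertools import zip_longest

# Made-up columns; "trainer" is deliberately one entry short.
columns = {
    "fin": ["1", "2", "3"],
    "greyhound": ["Dog A", "Dog B", "Dog C"],
    "trainer": ["(Trainer: X)", "(Trainer: Y)"],
}

# How many cells did each selector return?
lengths = {name: len(values) for name, values in columns.items()}
print(lengths)

# zip() stops at the shortest list, so the third dog is lost.
truncated = list(zip(*columns.values()))

# zip_longest() keeps every row and pads the missing cells instead.
padded = list(zip_longest(*columns.values(), fillvalue=""))

print(len(truncated), len(padded))
```

If all the lengths agree but are still smaller than expected, the cells are being lost before the loop runs. One thing worth ruling out in that case is the parser: `BeautifulSoup(html)` without an explicit parser argument uses whichever parser is installed, and different parsers can handle malformed markup differently.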
# Python 2 script (urllib.urlopen exists only in Python 2)
import csv
from urllib import urlopen
from bs4 import BeautifulSoup
# Fetch and parse the meeting results page
html = urlopen("http://www.gbgb.org.uk/resultsMeeting.aspx?id=132115")
bsObj = BeautifulSoup(html)
# Collect the <li> cells for each column by CSS class
one = bsObj.findAll("li", {"class": "first essential fin"})
two = bsObj.findAll("li", {"class": "essential greyhound"})
three = bsObj.findAll("li", {"class": "trap"})
four = bsObj.findAll("li", {"class": "sp"})
five = bsObj.findAll("li", {"class": "timeSec"})
six = bsObj.findAll("li", {"class": "timeDistance"})
seven = bsObj.findAll("li", {"class": "essential trainer"})
eight = bsObj.findAll("li", {"class": "first essential comment"})
# Extract the stripped text of every cell, one list per column
firstessentialfin = [a.getText().strip() for a in one]
essentialgreyhound = [b.getText().strip() for b in two]
trap = [c.getText().strip() for c in three]
sp = [d.getText().strip() for d in four]
timeSec = [e.getText().strip() for e in five]
timeDistance = [f.getText().strip() for f in six]
essentialtrainer = [g.getText().strip() for g in seven]
firstessentialcomment = [h.getText().strip() for h in eight]
# Write one CSV row per tuple of zipped cells ('wb' is the Python 2 mode for csv)
with open('dogfile.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    for c in zip(firstessentialfin, essentialgreyhound, trap, sp, timeSec, timeDistance, essentialtrainer, firstessentialcomment):
        writer.writerow(c)
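For anyone running this under Python 3 instead, two details of the script differ: `urlopen` lives in `urllib.request`, and the `csv` module expects a text-mode file opened with `newline=""` rather than `'wb'`. A minimal sketch of the writing step only, with a hypothetical helper `write_rows`, a temporary output path, and a few sample values from the expected output above:

```python
import csv
import os
import tempfile


def write_rows(path, header, rows):
    """Write one explicit header row followed by the data rows."""
    # Python 3: text mode with newline="" replaces the Python 2 'wb' mode.
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=",")
        writer.writerow(header)
        writer.writerows(rows)


header = ["Fin", "Greyhound", "Trap"]
rows = [("1", "Bernies Toughguy", "3"), ("2", "Gentle Kewell", "2")]

# Write to a temporary directory so the sketch leaves no file behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "dogfile.csv")
write_rows(path, header, rows)

with open(path, newline="") as f:
    print(f.read())
```

Writing the header explicitly keeps the first row independent of whatever heading cells the scrape happens to return.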