python - Iteration in BS4 fails during web scraping

Question

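I am using Beautiful Soup 4 to scrape data from complete greyhound racing meetings (UK). Here is an example URL: http://www.gbgb.org.uk/resultsMeeting.aspx?id=135549 Each meeting usually has between 9 and 14 races. The code below iterates over each race (event) on the card and prints the data to the screen (PyCharm, Python v3). The problem is that BS does not complete the iteration: it usually fails around the 7th or 8th race (event) on the card, and in some cases it breaks off partway through a race, having fetched data for only half of the runners. In some cases I get the standard message "Process finished with exit code 0". I did think this might be related to the URL being temporarily unavailable, but it seems unusual that the program always stops around the 7th or 8th race. I have searched through the source of the various pages and can see no inconsistencies in the markup (admittedly, I am not very familiar with HTML). Any suggestions are appreciated.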

# Python 3: urlopen lives in urllib.request, not in urllib itself
from urllib.request import urlopen
from bs4 import BeautifulSoup

baseURL = 'http://www.gbgb.org.uk/resultsMeeting.aspx?id=135549'
html = urlopen(baseURL)
bsObj = BeautifulSoup(html, 'lxml')

# First pass: print the header fields of every race on the card
nameList = bsObj.findAll("div", {"class": "resultsBlockHeader"})
for i in nameList:

    nameList1 = i.findAll("div", {"class": "track"})
    for j in nameList1:
        print(j.get_text())

    nameList1 = i.findAll("div", {"class": "date"})
    for j in nameList1:
        print(j.get_text())

    nameList1 = i.findAll("div", {"class": "datetime"})
    for j in nameList1:
        print(j.get_text())

    nameList1 = i.findAll("div", {"class": "grade"})
    for j in nameList1:
        print(j.get_text())

    nameList1 = i.findAll("div", {"class": "distance"})
    for j in nameList1:
        print(j.get_text())

    nameList1 = i.findAll("div", {"class": "prizes"})
    for j in nameList1:
        print(j.get_text())

# Second pass: print the runner fields of every race on the card
nameList = bsObj.findAll("div", {"class": "resultsBlock"})
for i in nameList:

    nameList2 = i.findAll("li", {"class": "trap"})
    for j in nameList2:
        print(j.get_text())

    nameList2 = i.findAll("li", {"class": "first essential fin"})
    for j in nameList2:
        print(j.get_text())

    nameList2 = i.findAll("li", {"class": "essential greyhound"})
    for j in nameList2:
        print(j.get_text())

    nameList2 = i.findAll("li", {"class": "sp"})
    for j in nameList2:
        print(j.get_text())

    nameList2 = i.findAll("li", {"class": "timeSec"})
    for j in nameList2:
        print(j.get_text())

    nameList2 = i.findAll("li", {"class": "timeDistance"})
    for j in nameList2:
        print(j.get_text())
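
One way to narrow this down might be to count the matching tags in the downloaded HTML instead of relying on what the console shows. The following is only a minimal sketch using the standard library: `ClassCounter`, `count_classes`, and the `sample` markup are hypothetical names and made-up data for illustration, not part of the real page or of Beautiful Soup.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Hypothetical helper: count start tags per (tag, CSS class) pair."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        class_value = dict(attrs).get("class") or ""
        # A class attribute may hold several space-separated class names
        for name in class_value.split():
            self.counts[(tag, name)] += 1


def count_classes(html_text):
    """Return a Counter of (tag, class) pairs found in html_text."""
    parser = ClassCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


# Made-up sample standing in for the downloaded page (assumption)
sample = """
<div class="resultsBlockHeader"><div class="track">Track A</div></div>
<div class="resultsBlock"><ul><li class="trap">1</li><li class="trap">2</li></ul></div>
<div class="resultsBlockHeader"><div class="track">Track A</div></div>
<div class="resultsBlock"><ul><li class="trap">1</li></ul></div>
"""

counts = count_classes(sample)
print("headers:", counts[("div", "resultsBlockHeader")])
print("blocks:", counts[("div", "resultsBlock")])
print("trap items:", counts[("li", "trap")])
```

For the real page, the text could be obtained with something like `urlopen(baseURL).read().decode("utf-8", errors="replace")` and passed to `count_classes`. If the header and block counts then match the number of races on the card, the page is probably arriving in full, and it may be worth checking whether the output is being cut off on the display side rather than in the parsing.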
