python - Getting all tables from HTML with BeautifulSoup using a nested loop

Question

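I'm trying to get all the tables from this site using nested loops. I'm almost there, but I'm still unsure how to loop over several tables that share the same class. I get an error at line 26: for s in soup.findALL ("table", { "class" : "boxScore"})

SyntaxError: invalid syntax.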

My script:

import datetime
import urllib
from bs4 import BeautifulSoup
import urllib2


day = int(datetime.datetime.now().strftime("%d"))-1

month = datetime.datetime.now().strftime("%B")
year = datetime.datetime.now().strftime("%Y")
file_name = "/users/ripple/NHL.csv"
file = open(file_name,"w")
url = "http://www.tsn.ca/nhl/scores/?date=" + month + "/" + str(day) + "/" + year
print 'Grabbing from: ' + url + '...\n'
try:
    r = urllib2.urlopen(url)
except urllib2.URLError as e:
    r = e
if r.code in (200, 401):    
    #get the table data from the page
    data = urllib.urlopen(url).read()
    #send to beautiful soup
    soup = BeautifulSoup(data)
    print soup
    soup = soup.findALL ("table", { "class" : "boxScore"})
    for s in soup.findALL ("table", { "class" : "boxScore"})
        table = soup.find("table",{ "class" : "boxScore"})
        for tr in table.findAll('tr')[2:]:
            col = tr.findAll('td')
            team = col[0].get_text().encode('ascii','ignore').replace(" ","")
            firstp = col[1].get_text().encode('ascii','ignore').replace(" ","")
            secondp = col[2].get_text().encode('ascii','ignore').replace(" ","")
            thirdp = col[3].get_text().encode('ascii','ignore').replace(" ","")
            final = col[4].get_text().encode('ascii','ignore').replace(" ","")
            record = team + ',' + final + '\n'
            print record
            file.write(record)
else: 
    print str(i) + " NO GAMES"
file.close()

score 2 · Accepted Answer

In Python, the header of a for loop must end with a colon ':'.
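The missing colon is caught when Python compiles the file, before any line runs. That is why the error is reported as a SyntaxError and not as a problem with findALL. Name lookups only happen at run time. A minimal sketch using the built-in compile() on two snippets modeled on the failing line. The helper name header_compiles is hypothetical:

```python
def header_compiles(source):
    """Return True if the source compiles, False if it raises SyntaxError."""
    try:
        # compile() only checks syntax; it does not resolve names like `soup`.
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# Loop header without the trailing colon, as in the script.
broken = 'for s in soup.findALL("table", {"class": "boxScore"})\n    pass\n'
# Same header with the colon added. It compiles even though findALL is
# misspelled and soup is undefined, because those are run-time lookups.
fixed = 'for s in soup.findALL("table", {"class": "boxScore"}):\n    pass\n'

print(header_compiles(broken))  # False
print(header_compiles(fixed))   # True
```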

Also: the API method is findAll(), not findALL().
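Two other problems would still stop the script from reading every table even with both fixes applied. First, the line `soup = soup.findALL(...)` replaces the BeautifulSoup object with a list of results, so the next line no longer calls a BeautifulSoup method. Second, `table = soup.find(...)` inside the loop always returns the first matching table instead of the current one, `s`.

Below is a minimal sketch of the corrected nested loop, assuming Python 3 with the bs4 package installed. The original script is Python 2. It uses `find_all`, which is the bs4 spelling, and `findAll` is its older alias. SAMPLE_HTML is a hypothetical stand-in for the TSN page, not its real markup. The layout is taken from the script: two header rows per table, team name in the first cell, final score in the fifth.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: two box-score tables sharing the class "boxScore",
# each with two header rows followed by one row per team.
SAMPLE_HTML = """
<table class="boxScore">
  <tr><th colspan="5">Game 1</th></tr>
  <tr><th>Team</th><th>1</th><th>2</th><th>3</th><th>T</th></tr>
  <tr><td>Boston</td><td>1</td><td>0</td><td>2</td><td>3</td></tr>
  <tr><td>Toronto</td><td>0</td><td>1</td><td>0</td><td>1</td></tr>
</table>
<table class="boxScore">
  <tr><th colspan="5">Game 2</th></tr>
  <tr><th>Team</th><th>1</th><th>2</th><th>3</th><th>T</th></tr>
  <tr><td>Montreal</td><td>1</td><td>1</td><td>0</td><td>2</td></tr>
  <tr><td>Ottawa</td><td>2</td><td>0</td><td>2</td><td>4</td></tr>
</table>
"""


def parse_box_scores(html):
    """Return (team, final) tuples from every table with class boxScore."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # Outer loop: iterate over each matching table; keep `soup` unchanged.
    for table in soup.find_all("table", class_="boxScore"):
        # Inner loop: use the current table and skip its two header rows.
        for tr in table.find_all("tr")[2:]:
            cols = tr.find_all("td")
            if len(cols) < 5:
                continue  # skip rows without the expected five cells
            team = cols[0].get_text(strip=True).replace(" ", "")
            final = cols[4].get_text(strip=True)
            records.append((team, final))
    return records


for team, final in parse_box_scores(SAMPLE_HTML):
    print(team + "," + final)
```

Each line printed here corresponds to one `record` in the original script, so it could be written to the CSV file in the same way.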
