
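Hi, I'm working on a project for my school that involves scraping HTML.

However, when I search for the table, I don't get anything back. This is the part that's causing the problem.

If you need more information, I'm happy to provide it.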

from bs4 import BeautifulSoup
import urllib2
import datetime

#This section determines the date of the next Saturday, which will go onto the end of the URL
d = datetime.date.today() 
while d.weekday() != 5:
    d += datetime.timedelta(1)

#Temporary override for testing while the next week's webpage isn't out yet
d = "2013-06-01"

#Section that scrapes the data off the webpage
url = "http://www.sydgram.nsw.edu.au/co-curricular/sport/fixtures/" + str(d) + ".php"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)
print soup
#Section that grabs the table with stuff in it
table = soup.find('table', {"class": "excel1"})
print table
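The date logic above can also be done without the day-by-day loop. Below is a minimal Python 3 sketch, assuming the fixture pages keep the YYYY-MM-DD.php naming shown in the URL; the function names next_saturday and fixture_url are hypothetical, not part of the original script. Like the original loop, it returns today's date if today is already a Saturday.

```python
import datetime

# Base URL from the question; the page name is the date in YYYY-MM-DD form plus ".php"
BASE_URL = "http://www.sydgram.nsw.edu.au/co-curricular/sport/fixtures/"


def next_saturday(today):
    """Return `today` if it is a Saturday, otherwise the following Saturday.

    date.weekday() numbers Monday as 0, so Saturday is 5. The modulo yields
    the number of days to add (0 to 6), replacing the while loop.
    """
    return today + datetime.timedelta(days=(5 - today.weekday()) % 7)


def fixture_url(day):
    """Build the fixtures URL; date.isoformat() gives the same text as str(date)."""
    return BASE_URL + day.isoformat() + ".php"


if __name__ == "__main__":
    d = next_saturday(datetime.date.today())
    print(fixture_url(d))
```

For example, next_saturday(datetime.date(2013, 5, 27)) gives 2013-06-01, the date hard-coded in the question for testing.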

1 Answer


BeautifulSoup expects an HTML string. What you're passing it is a response object.

Get the HTML from the response:

 html = page.read()
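Note that urllib2 is Python 2 only; in Python 3 the equivalent is urllib.request, and read() returns bytes rather than str. A minimal Python 3 sketch of this step, assuming a UTF-8 page (the real page may declare another charset); response_to_html is a hypothetical helper, and io.BytesIO stands in for the network response so the example runs offline:

```python
import io


def response_to_html(response, encoding="utf-8"):
    """Read a file-like HTTP response and return its body as text.

    In Python 3, urllib.request.urlopen(...).read() returns bytes, so the
    body is decoded here. The utf-8 default is an assumption.
    """
    raw = response.read()
    return raw.decode(encoding, errors="replace")


# io.BytesIO stands in for the object urlopen() returns: both provide .read()
# The HTML below is made-up sample content, not the real fixtures page.
fake_response = io.BytesIO(b'<table class="excel1"><tr><td>Rugby</td></tr></table>')
html = response_to_html(fake_response)
print(html)
```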

Then pass html to BeautifulSoup, or to whichever parser you prefer.
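If find() still returns None, it may help to confirm whether the downloaded HTML contains a table with class "excel1" at all; for instance, a table built by JavaScript in the browser will not be in the HTML that urllib2 fetches. A minimal Python 3 sketch of that check, using the standard library's html.parser in place of BeautifulSoup; TableFinder and has_table_with_class are hypothetical names for illustration:

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Collect the class attribute of every <table> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.table_classes = []

    def handle_starttag(self, tag, attrs):
        # tag names arrive lower-cased; attrs is a list of (name, value) pairs
        if tag == "table":
            self.table_classes.append(dict(attrs).get("class"))


def has_table_with_class(html, css_class):
    """Return True if some <table> lists css_class among its (space-separated) classes."""
    finder = TableFinder()
    finder.feed(html)
    finder.close()
    return any(
        classes is not None and css_class in classes.split()
        for classes in finder.table_classes
    )


sample = '<html><body><table class="excel1"><tr><td>x</td></tr></table></body></html>'
print(has_table_with_class(sample, "excel1"))
print(has_table_with_class("<p>no tables here</p>", "excel1"))
```

Splitting the class value mirrors how BeautifulSoup treats class as a multi-valued attribute, so a table with class="excel1 other" also counts as a match.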

Also, I'd suggest reading the following two links:

urllib2 documentation

BeautifulSoup documentation

answered 2013-06-04T12:36:41.163