
I'm trying to make a POST request to this site:

http://web1.ncaa.org/stats/StatsSrv/careersearch

The form on the right-hand side has four drop-down menus. When I run the code below, the "School" is stubbornly not selected. There is a hidden input that may be causing the problem, but I haven't been able to fix it. The JavaScript on the page doesn't seem to have any effect, but I could be wrong. Any help is appreciated:

#!/usr/bin/python

import urllib
import urllib2

url = 'http://web1.ncaa.org/stats/StatsSrv/careersearch'
values = {'searchOrg': '30123', 'academicYear': '2011', 'searchSport': 'MBA', 'searchDiv': '1'}

data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()

print the_page

2 Answers


As you suspected, you are missing a hidden field: doWhat = 'teamSearch' (it is used to submit the form on the right-hand side).

Using these request values worked for me:

values = {'doWhat': 'teamSearch', 'searchOrg': '30123', 'academicYear': '2011', 'searchSport': 'MBA', 'searchDiv': '1'}
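If you are on Python 3 (where urllib2 no longer exists), the same fix can be sketched offline with urllib.parse: the snippet below only builds and inspects the encoded payload, without sending a request, so you can confirm the hidden doWhat field is included before posting it.

```python
from urllib.parse import urlencode

# Same fields as the original script, plus the hidden doWhat field
# that selects the right-hand form on the careersearch page.
values = {
    'doWhat': 'teamSearch',
    'searchOrg': '30123',
    'academicYear': '2011',
    'searchSport': 'MBA',
    'searchDiv': '1',
}

# urlencode produces the application/x-www-form-urlencoded body;
# in Python 3 it must be .encode()'d to bytes before passing it
# to urllib.request.Request(url, data).
data = urlencode(values)
print(data)
```

Note that `'doWhat=teamSearch'` appears in the payload; without it the server falls back to the other form on the page.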
Answered 2012-08-14T21:54:52.613

I used mechanize:

import mechanize
from BeautifulSoup import BeautifulSoup

mech = mechanize.Browser()
mech.set_handle_robots(False)
response = mech.open('http://web1.ncaa.org/stats/StatsSrv/careersearch')
mech.select_form(nr=2)
mech.form['searchOrg'] = ['30123']
mech.form['academicYear'] = ['2011']
mech.form['searchSport'] = ['MBA']
mech.form['searchDiv'] = ['1']
mech.submit()
soup = BeautifulSoup(mech.response().read())

I know from the mechanize site that searchOrg, academicYear, searchSport and searchDiv are required to be sequences/lists. You should definitely pay attention to robots.txt.
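Since set_handle_robots(False) bypasses robots.txt entirely, it is worth checking what the file actually allows before scraping. A minimal sketch using the standard library's urllib.robotparser, with a hypothetical robots.txt fed in as a list of lines (the real file would normally be fetched from the site):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Hypothetical rules for illustration; in practice call
# rp.set_url('http://web1.ncaa.org/robots.txt') and rp.read().
rp.parse([
    'User-agent: *',
    'Disallow: /stats/',
])

# can_fetch(useragent, url) checks the URL's path against the rules.
print(rp.can_fetch('*', 'http://web1.ncaa.org/stats/StatsSrv/careersearch'))  # → False
print(rp.can_fetch('*', 'http://web1.ncaa.org/index.html'))                   # → True
```

If can_fetch returns True for the page, leaving mechanize's default robots handling enabled (i.e. dropping the set_handle_robots(False) line) would work just as well.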

Answered 2012-08-14T22:02:11.000