您需要发布页面给出的值(EVENTVALIDATION、VIEWSTATE 等)。
这段代码可以工作(注意它使用了很棒的Requests库而不是 Mechanize)
import lxml.html
import requests
starturl = 'http://bbsmates.com/browsebbs.aspx?BBSName=&AreaCode=314'
s = requests.session() # create a session object
r1 = s.get(starturl) #get page 1
html = r1.text
root = lxml.html.fromstring(html)
#pick up the javascript values
EVENTVALIDATION = root.xpath('//input[@name="__EVENTVALIDATION"]')[0].attrib['value']
#find the __EVENTVALIDATION value
VIEWSTATE = root.xpath('//input[@name="__VIEWSTATE"]')[0].attrib['value']
#find the __VIEWSTATE value
# build a dictionary to post to the site with the values we have collected. The __EVENTARGUMENT can be changed to fetch another result page (3,4,5 etc.)
payload = {'__EVENTTARGET': 'ctl00$ContentPlaceHolder1$GridView1','__EVENTARGUMENT':'Page$25','__EVENTVALIDATION':EVENTVALIDATION,'__VIEWSTATE':VIEWSTATE,'__VIEWSTATEENCRYPTED':'','ctl00$txtUsername':'','ctl00$txtPassword':'','ctl00$ContentPlaceHolder1$txtBBSName':'','ctl00$ContentPlaceHolder1$txtSysop':'','ctl00$ContentPlaceHolder1$txtSoftware':'','ctl00$ContentPlaceHolder1$txtCity':'','ctl00$ContentPlaceHolder1$txtState':'','ctl00$ContentPlaceHolder1$txtCountry':'','ctl00$ContentPlaceHolder1$txtZipCode':'','ctl00$ContentPlaceHolder1$txtAreaCode':'314','ctl00$ContentPlaceHolder1$txtPrefix':'','ctl00$ContentPlaceHolder1$txtDescription':'','ctl00$ContentPlaceHolder1$Activity':'rdoBoth','ctl00$ContentPlaceHolder1$drpRPP':'20'}
# post it
r2 = s.post(starturl, data=payload)
# our response is now page 2
print r2.text
当您到达结果的末尾(结果第 21 页)时,您必须再次获取 VIEWSTATE 和 EVENTVALIDATION 值(并且每 20 页执行一次)。
请注意,您发布的一些值是空的,还有一些包含值。完整列表是这样的:
'ctl00$txtUsername':'','ctl00$txtPassword':'','ctl00$ContentPlaceHolder1$txtBBSName':'','ctl00$ContentPlaceHolder1$txtSysop':'','ctl00$ContentPlaceHolder1$txtSoftware':'','ctl00$ContentPlaceHolder1$txtCity':'','ctl00$ContentPlaceHolder1$txtState':'','ctl00$ContentPlaceHolder1$txtCountry':'','ctl00$ContentPlaceHolder1$txtZipCode':'','ctl00$ContentPlaceHolder1$txtAreaCode':'314','ctl00$ContentPlaceHolder1$txtPrefix':'','ctl00$ContentPlaceHolder1$txtDescription':'','ctl00$ContentPlaceHolder1$Activity':'rdoBoth','ctl00$ContentPlaceHolder1$drpRPP':'20'
以下是 Scraperwiki 邮件列表上关于类似问题的讨论:https ://groups.google.com/forum/#!topic/scraperwiki/W0Xi7AxfZp0