I am trying to scrape some data from a website. The script I am writing should fetch the contents of this page:
http://www.atpworldtour.com/Rankings/Singles.aspx
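It should simulate a user going through every option of the additional standings and date dropdowns: select an option, click Go, collect the data, and then go back to the previous page.
For now, I have been trying to select this option in the additional standings dropdown: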
<option value="101" >101-200</option>
Here is my (poor) attempt at doing this:
from mechanize import Browser
from BeautifulSoup import BeautifulSoup
import re
import urllib2
br = Browser()
br.open("http://www.atpworldtour.com/Rankings/Singles.aspx")
br.select_form(nr=0)
br["r"] = "101"
response = br.submit()
However, it fails at select_form(nr=0), which is supposed to select the first form on the page.
Here is the log from the Python interpreter:
>>> from mechanize import Browser
>>>
>>> from BeautifulSoup import BeautifulSoup
>>> import re
>>> import urllib2
>>>
>>>
>>>
>>> br = Browser();
>>> br.open("http://www.atpworldtour.com/Rankings/Singles.aspx");
<response_seek_wrapper at 0x311bb48L whose wrapped object = <closeable_response at 0x311be88L whose fp = <socket._fileobject object at 0x0000000002C94408>>>
>>> br.select_form(nr=0);
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 505, in select_form
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 546, in __getattr__
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 559, in forms
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 228, in forms
mechanize._html.ParseError
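I could not find a proper explanation of all the functions on the mechanize homepage. Could anyone point me to a good tutorial on using forms with mechanize, or help me with this particular problem?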
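To make the goal concrete, the "go through every option" step can be sketched without mechanize. This is only a rough illustration using Python 3's standard html.parser, which tends to tolerate sloppy markup, and it is not the mechanize API. The select name "r" comes from my attempt above. The sample HTML, the second select named "d", and every option except 101-200 are invented for the example, and the names SelectOptionParser and option_values are hypothetical.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect the value attributes of <option> tags inside one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.inside_target = False  # True while we are inside the wanted <select>
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self.inside_target = attrs.get("name") == self.select_name
        elif tag == "option" and self.inside_target and "value" in attrs:
            self.values.append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self.inside_target = False


def option_values(html, select_name):
    """Return the option values of the <select> called select_name, in page order."""
    parser = SelectOptionParser(select_name)
    parser.feed(html)
    parser.close()
    return parser.values


# Made-up fragment: only the 101-200 option is taken from the real page.
SAMPLE_HTML = """
<form>
<select name="r">
<option value="1" selected>1-100</option>
<option value="101" >101-200</option>
<option value="201" >201-300</option>
</select>
<select name="d">
<option value="some-date">some date</option>
</select>
</form>
"""

print(option_values(SAMPLE_HTML, "r"))
```

Each value returned this way would then be the one to submit for the "r" field, one request per value.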
Anthony