
I'm trying to write a small autoposter script. As a first step, I need to find and print all the input elements on a web page, and I'm trying to do it with the mechanize library.

I wrote this script:

import urllib  
import cookielib  
import mechanize  

url = "https://www.sito.com/page.html"  

cookie = cookielib.CookieJar()  
browser = mechanize.Browser()  

browser.set_cookiejar(cookie)  
browser.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)  

browser.open(url)  

for f in browser.forms():
    print f.name

This only prints the form names. How can I print all the input elements on a web page with mechanize, or possibly with another library?


1 Answer


Why not just use urllib2 + BeautifulSoup?

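import urllib2
from bs4 import BeautifulSoup

url = "http://sito.com/SitoContact.htm"  # change to whatever your url is

# Download the raw HTML of the page
page = urllib2.urlopen(url).read()
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(page, "html.parser")

# Print every <input> tag found anywhere in the document
for i in soup.find_all('input'):
    print i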

FYI, I couldn't access the page you provided because of an SSL error, which is why this example uses a different URL.

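Note that if you need to fill in forms or otherwise interact with the inputs, you will need mechanize or a similar tool. Either way, you can keep using BeautifulSoup to parse the HTML. Also, take a look at the Selenium project.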
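
If installing third-party packages is not an option, roughly the same listing can be sketched with nothing but the standard library. Note that this is a swapped-in technique, not the urllib2 + BeautifulSoup approach above: it assumes Python 3 and its html.parser module, and the names InputCollector, find_inputs and SAMPLE_HTML are made up for illustration. The sample markup stands in for a downloaded page so the snippet runs without network access.

```python
# Python 3 sketch using only the standard library (an assumed alternative,
# not the urllib2 + BeautifulSoup code from the answer).
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collects the attributes of every <input> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, which dict() turns into a lookup table.
        if tag == "input":
            self.inputs.append(dict(attrs))


def find_inputs(html):
    """Return one dict of attributes per <input> element in the HTML text."""
    parser = InputCollector()
    parser.feed(html)
    parser.close()
    return parser.inputs


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<form name="contact" action="/send" method="post">
  <input type="text" name="email">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" value="Send">
</form>
"""

for field in find_inputs(SAMPLE_HTML):
    # Missing attributes come back as None rather than raising KeyError.
    print(field.get("type"), field.get("name"), field.get("value"))
```

To run it against a live page, the HTML text would have to be fetched first, for example with urllib.request.urlopen(url).read() decoded to a string. As with the BeautifulSoup version, this only lists the inputs; submitting a form still calls for mechanize, Selenium, or a similar tool.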

Answered 2013-09-05T20:49:45.227