After reading this tutorial, I came up with this code:
import requests
from bs4 import BeautifulSoup
import re
import mechanize
import cookielib
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follow refresh 0, but do not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# The site we will navigate into, handling its session
br.open('http://www.cleanmetrics.net/foodcarbonscope')
br.select_form(nr=0)
br.form['ctl00$ContentPlaceHolder1$userName'] = "XXXXX"
br.form['ctl00$ContentPlaceHolder1$passWord'] = "XXXXXX"
# Login
br.submit()
I keep getting this error:
File "scrapeRecipe.py", line 30, in <module>
br.select_form(nr=0)
File "build/bdist.macosx-10.11-intel/egg/mechanize/_mechanize.py", line 619, in select_form
File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 260, in global_form
File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 267, in forms
File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 282, in _get_forms
File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 247, in root
File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 145, in content_parser
ImportError: No module named html5lib
However, I know html5lib was installed successfully, because when I run pip3 freeze I see:
html5lib==0.999999999
six==1.10.0
webencodings==0.5.1
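A quick way to check whether the interpreter that runs the script can see the same packages pip3 reports might look like the sketch below. It is only a minimal sketch: the helper name describe_module is made up for illustration, and it assumes the file is run with the same command used to run scrapeRecipe.py.

```python
import sys


def describe_module(name):
    """Return (found, location) for a module under the running interpreter.

    describe_module is a hypothetical helper, not part of any library.
    """
    try:
        module = __import__(name)
    except ImportError:
        return False, None
    # Built-in modules have no __file__, so location may be None.
    return True, getattr(module, "__file__", None)


# Which interpreter is actually executing this file?
print("interpreter: %s" % sys.executable)
print("version: %s" % ".".join(str(part) for part in sys.version_info[:3]))

# The three packages listed by pip3 freeze above.
for name in ("html5lib", "six", "webencodings"):
    found, location = describe_module(name)
    print("%s -> %s" % (name, location if found else "not importable here"))
```

If the interpreter printed here is not the one pip3 belongs to, the two would be looking at different site-packages directories.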
Update: I think my problem may be related to my easy-install.pth file. In my site-packages directory, I don't actually see html5lib. This is all I have:
BeautifulSoup-3.2.1-py2.7.egg
appdirs-1.4.3.dist-info
appdirs.py
appdirs.pyc
beautifulsoup4-4.5.3.dist-info
bs4
easy-install.pth
html2text-2016.9.19-py2.7.egg
mechanize-0.3.1-py2.7.egg
packaging
packaging-16.8.dist-info
pip-9.0.1-py2.7.egg
requests-2.13.0-py2.7.egg
When I run easy_install html5lib, I get "Adding html5lib 0.999999999 to easy-install.pth file". However, after it finished processing dependencies for html5lib, I opened my easy-install.pth file and did not see html5lib mentioned anywhere:
import sys; sys.__plen = len(sys.path)
./BeautifulSoup-3.2.1-py2.7.egg
./html2text-2016.9.19-py2.7.egg
./mechanize-0.3.1-py2.7.egg
./requests-2.13.0-py2.7.egg
./pip-9.0.1-py2.7.egg
import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; sys.__egginsert = p+len(new)
Unless html5lib is inside one of the packages above? I'm wondering whether I need to import html5lib in my Python code and spell out its root path?
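To check the .pth file without reading it by eye, something like the following sketch could list its path entries and test whether a package shows up. The function names pth_entries and mentions are hypothetical, and the sketch assumes the usual .pth layout: one path per line, with blank lines, "#" comments, and lines starting with "import" not counted as paths.

```python
import os


def pth_entries(pth_path):
    """Return the path entries of a .pth file.

    Blank lines, '#' comments and executable 'import' lines are skipped.
    """
    entries = []
    with open(pth_path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith(("import ", "import\t")):
                continue
            entries.append(line)
    return entries


def mentions(pth_path, package):
    """True if any entry's file name starts with the package name."""
    prefix = package.lower()
    return any(
        os.path.basename(entry).lower().startswith(prefix)
        for entry in pth_entries(pth_path)
    )
```

For example, mentions("easy-install.pth", "html5lib") would be called with the full path to the file in whichever site-packages directory is being inspected.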
Really not sure why this is getting downvoted? :/