After reading this tutorial, I came up with this code:

import requests
from bs4 import BeautifulSoup
import re
import mechanize
import cookielib

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follow refresh 0 but do not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# The site we will navigate into, handling its session
br.open('http://www.cleanmetrics.net/foodcarbonscope')

br.select_form(nr=0)
br.form['ctl00$ContentPlaceHolder1$userName'] = "XXXXX"
br.form['ctl00$ContentPlaceHolder1$passWord'] = "XXXXXX"

# Login
br.submit()

I keep getting this error:

  File "scrapeRecipe.py", line 30, in <module>
    br.select_form(nr=0)
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_mechanize.py", line 619, in select_form
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 260, in global_form
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 267, in forms
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 282, in _get_forms
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 247, in root
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 145, in content_parser
ImportError: No module named html5lib

However, I know I installed html5lib successfully, because when I run pip3 freeze I see:

html5lib==0.999999999
six==1.10.0
webencodings==0.5.1
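
pip3 usually belongs to a Python 3 installation, which may not be the interpreter that actually runs scrapeRecipe.py. A minimal sketch for checking this from inside the interpreter that runs the script follows; the helper name is_importable is hypothetical and not part of mechanize or html5lib.

```python
import importlib
import sys


def is_importable(module_name):
    """Return True if this interpreter can import the named module."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# Show which interpreter is running and whether it can see html5lib.
print("interpreter: %s" % sys.executable)
print("version: %s" % sys.version.split()[0])
print("html5lib importable: %s" % is_importable("html5lib"))
```

Running this with the same command used to launch the script (for example python rather than python3) shows whether the package list from pip3 freeze applies to that interpreter at all.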

Update: I think my problem may be related to my easy-install.pth file. In my site-packages directory, I don't actually see html5lib. I only have this:

BeautifulSoup-3.2.1-py2.7.egg
appdirs-1.4.3.dist-info
appdirs.py
appdirs.pyc
beautifulsoup4-4.5.3.dist-info
bs4
easy-install.pth
html2text-2016.9.19-py2.7.egg
mechanize-0.3.1-py2.7.egg
packaging
packaging-16.8.dist-info
pip-9.0.1-py2.7.egg
requests-2.13.0-py2.7.egg

When I run easy_install html5lib, I get: Adding html5lib 0.999999999 to easy-install.pth file. However, after it finished processing the dependencies for html5lib successfully, I opened my easy-install.pth file and did not see html5lib mentioned anywhere:

import sys; sys.__plen = len(sys.path)
./BeautifulSoup-3.2.1-py2.7.egg
./html2text-2016.9.19-py2.7.egg
./mechanize-0.3.1-py2.7.egg
./requests-2.13.0-py2.7.egg
./pip-9.0.1-py2.7.egg
import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; sys.__egginsert = p+len(new)

Unless html5lib is located inside one of the packages above? I wonder whether I need to import html5lib in my Python code and list its root path?
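
In a .pth file, lines that begin with import are executed and the remaining non-blank, non-comment lines are added to sys.path. To check whether a given egg is listed, the path entries can be separated from the import lines. This is a rough sketch that only approximates how the site module reads the file; the function name pth_path_entries and the shortened sample text are illustrative assumptions.

```python
def pth_path_entries(pth_text):
    """Return the path entries of a .pth file, skipping blanks, comments and import lines."""
    entries = []
    for line in pth_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are ignored
        if stripped.startswith(("import ", "import\t")):
            continue  # import lines are executed, not treated as paths
        entries.append(stripped)
    return entries


# Shortened sample modelled on the easy-install.pth shown above.
sample = """import sys; sys.__plen = len(sys.path)
./BeautifulSoup-3.2.1-py2.7.egg
./mechanize-0.3.1-py2.7.egg
import sys; new=sys.path[sys.__plen:]
"""

entries = pth_path_entries(sample)
print(entries)
print("html5lib listed: %s" % any("html5lib" in entry for entry in entries))
```

If html5lib is missing from the entries, one possibility is that easy_install wrote to the easy-install.pth of a different Python installation than the one being inspected.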

I really don't know why this was downvoted? :/

1 Answer

I am now running into a different problem, but this is what fixed html5lib for me.

pip install --ignore-installed six --user
sudo -H pip install html5lib --ignore-installed
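
A bare pip command installs into whichever Python the first pip on PATH belongs to, which is not necessarily the interpreter that runs the script. One hedged alternative, assuming pip is installed as a module for that interpreter, is to invoke pip through the interpreter itself. The helper name pip_install_command is hypothetical, and the sketch only builds and prints the command rather than running it.

```python
import sys


def pip_install_command(package):
    """Build a pip install command bound to the currently running interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


command = pip_install_command("html5lib")
print(" ".join(command))

# To actually run the installation, pass the list to subprocess, e.g.:
#   import subprocess
#   subprocess.check_call(command)
```

Running this with the same interpreter that launches scrapeRecipe.py prints a command that targets that interpreter's site-packages.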

For more information, this is a good thread: https://github.com/pypa/pip/issues/3165

Answered 2017-04-27T16:07:34.603