
Current code:

import requests
import pandas as pd
url = 'https://docs.anaconda.com/anaconda/user-guide/getting-started/'
html = requests.get(url, verify=False).content
df_list = pd.read_html(html, flavor='bs4')
df = df_list[0]

I want to use the pandas.read_html() function to extract the HTML tables from the page, with the 'flavor' arg set to 'bs4' or 'html5lib'. I get the error: ImportError: html5lib not found, please install it.

 C:\Users\...\Miniconda3\lib\site-packages\urllib3\connectionpool.py:1004: InsecureRequestWarning: Unverified HTTPS request is being made to host 'docs.anaconda.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
Traceback (most recent call last):
  File "C:\Users\...\Documents\...\data_scrape.py", line 11, in <module>
    df_list = pd.read_html(html, flavor='bs4')
  File "C:\Users\...\Miniconda3\lib\site-packages\pandas\io\html.py", line 1100, in read_html
    displayed_only=displayed_only,
  File "C:\Users\...\Miniconda3\lib\site-packages\pandas\io\html.py", line 891, in _parse
    parser = _parser_dispatch(flav)
  File "C:\Users\...\Miniconda3\lib\site-packages\pandas\io\html.py", line 840, in _parser_dispatch
    raise ImportError("html5lib not found, please install it")
ImportError: html5lib not found, please install it
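The error is raised by pandas' `_parser_dispatch`, which fails as soon as `html5lib` cannot be imported by the interpreter running the script. The pandas documentation states that the 'bs4' and 'html5lib' flavors are synonymous, so both need `bs4` and `html5lib` to be importable; that is why the message names html5lib even with `flavor='bs4'`. The sketch below only approximates that check with `importlib.util.find_spec`; it is not pandas' own code, and `REQUIRED_MODULES` and `missing_for_flavor` are hypothetical names.

```python
import importlib.util

# Hypothetical approximation of the optional modules each read_html flavor
# needs; this is not pandas' internal table.
REQUIRED_MODULES = {
    "lxml": ["lxml"],
    "bs4": ["html5lib", "bs4"],
    "html5lib": ["html5lib", "bs4"],
}


def missing_for_flavor(flavor):
    """Return the modules that *this* interpreter cannot find for a flavor."""
    if flavor not in REQUIRED_MODULES:
        raise ValueError(f"unknown flavor: {flavor!r}")
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in REQUIRED_MODULES[flavor]
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for flavor in ("bs4", "lxml"):
        missing = missing_for_flavor(flavor)
        print(flavor, "->", "OK" if not missing else f"missing {missing}")
```

If this prints `missing ['html5lib']` when run the same way as `data_scrape.py`, the packages are not visible to that interpreter, regardless of what `conda list` shows.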

But I definitely have bs4 and html5lib installed in the environment. Here is the output of the conda list command:

conda list
# packages in environment at C:\Users\...\Miniconda3\envs\web_scrape:
#
# Name                    Version                   Build  Channel
beautifulsoup4            4.9.1            py38h32f6830_0    conda-forge
bs4                       4.9.1                         0    conda-forge
ca-certificates           2020.6.20            hecda079_0    conda-forge
certifi                   2020.6.20        py38h32f6830_0    conda-forge
html5lib                  1.1                pyh9f0ad1d_0    conda-forge
intel-openmp              2020.1                      216
libblas                   3.8.0                    16_mkl    conda-forge
libcblas                  3.8.0                    16_mkl    conda-forge
libiconv                  1.15             vc14h29686d3_5  [vc14]  anaconda
liblapack                 3.8.0                    16_mkl    conda-forge
libxml2                   2.9.10               h464c3ec_1    anaconda
libxslt                   1.1.34               he774522_0    anaconda
lxml                      4.5.2            py38he3d0fc9_0    conda-forge
mkl                       2020.1                      216
numpy                     1.18.5           py38h72c728b_0    conda-forge
openssl                   1.1.1g               he774522_0    conda-forge
pandas                    1.0.5            py38he6e81aa_0    conda-forge
pip                       20.1.1                     py_1    conda-forge
python                    3.8.3           cpython_h5fd99cc_0    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.8                      1_cp38    conda-forge
pytz                      2020.1             pyh9f0ad1d_0    conda-forge
setuptools                49.2.0           py38h32f6830_0    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
soupsieve                 2.0.1            py38h32f6830_0    conda-forge
sqlite                    3.32.3               he774522_1    conda-forge
vc                        14.1                 h869be7e_1    conda-forge
vs2015_runtime            14.16.27012          h30e32a0_2    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.34.2                     py_1    conda-forge
wincertstore              0.2                   py38_1003    conda-forge
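Comparing this listing with the traceback may be informative: `conda list` reports packages in `C:\Users\...\Miniconda3\envs\web_scrape`, while every path in the traceback points to `C:\Users\...\Miniconda3\lib\site-packages`, the base installation. That would mean the script runs under a different interpreter than the one the packages were installed into. A minimal diagnostic sketch, meant to be run exactly the way `data_scrape.py` is run, prints the running interpreter and where each package would be loaded from; `describe_environment` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_environment(modules=("pandas", "bs4", "html5lib", "lxml")):
    """Map each module name to the file it would be imported from, or None."""
    origins = {}
    for name in modules:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


if __name__ == "__main__":
    # The interpreter actually executing this script; for the web_scrape env
    # this would be expected to live under ...\envs\web_scrape.
    print("executable:", sys.executable)
    print("prefix:    ", sys.prefix)
    for name, origin in describe_environment().items():
        print(f"{name:10s}", origin or "NOT FOUND")
```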

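I don't know why the pandas function doesn't recognize these packages. Several other posts deal with the same issue, but none of their solutions worked for me.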

For example, posts such as: "Python: ImportError: lxml not found, please install it" and

The answers there suggest installing the packages with pip3. When I run those commands, I get the following:

pip3 install html5lib
Requirement already satisfied: html5lib in c:\users\...\miniconda3\envs\web_scrape\lib\site-packages (1.1)
Requirement already satisfied: six>=1.9 in c:\users\...\miniconda3\envs\web_scrape\lib\site-packages (from html5lib) (1.15.0)
Requirement already satisfied: webencodings in c:\users\...\miniconda3\envs\web_scrape\lib\site-packages (from html5lib) (0.5.1)
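"Requirement already satisfied" only shows that the `pip3` found on PATH belongs to `envs\web_scrape`; it says nothing about the interpreter that runs the script. Invoking pip through the script's own interpreter (`python -m pip`) ties the two together. A small sketch; `pip_for_current_interpreter` is a hypothetical helper name.

```python
import subprocess
import sys


def pip_for_current_interpreter(*args):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # Asks pip whether *this* interpreter has html5lib;
    # pip exits with a non-zero code if the package is not found.
    cmd = pip_for_current_interpreter("show", "html5lib")
    result = subprocess.run(cmd)
    print("exit code:", result.returncode)
```

Replacing `"show"` with `"install"` would install the package into that same interpreter rather than into whichever one `pip3` happens to belong to.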

Thanks for any help or pointers to similar questions!

Thanks!


1 Answer


Try

conda install -c anaconda html5lib 
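Note that `conda install` puts the package into whichever environment is active when the command runs, so it only fixes the script if the script is run by that same environment's Python. A quick check, assuming the script is launched from a prompt where the environment was activated (conda sets the `CONDA_PREFIX` variable on activation); `interpreter_matches_active_env` is a hypothetical helper name.

```python
import os
import sys


def interpreter_matches_active_env(environ=None):
    """Return True/False if a conda env is active, or None if none is.

    Compares CONDA_PREFIX (set by `conda activate`) with sys.prefix of the
    interpreter that is running this code.
    """
    environ = os.environ if environ is None else environ
    conda_prefix = environ.get("CONDA_PREFIX")
    if not conda_prefix:
        return None
    active = os.path.normcase(os.path.abspath(conda_prefix))
    running = os.path.normcase(os.path.abspath(sys.prefix))
    return active == running


if __name__ == "__main__":
    print("active env:", os.environ.get("CONDA_PREFIX"))
    print("sys.prefix:", sys.prefix)
    print("match:     ", interpreter_matches_active_env())
```

A `False` here would mean the packages are being installed into one environment while the script runs in another.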

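I had the same problem. I don't know why this works, but it worked fine for me. I also ran into the same problem with the lxml library and applied the same fix. I just copied the answer from a post on GitHub: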

https://github.com/jupyter/notebook/issues/3623
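Whichever fix is applied, a minimal way to confirm it is to run `pd.read_html()` on a tiny inline table with `flavor='bs4'`, using the same interpreter that runs `data_scrape.py`. This avoids the network request and `verify=False`. The table contents and the helper name `smoke_test_read_html` are made up for illustration; the function returns `(False, message)` instead of raising, so the ImportError text stays visible.

```python
from io import StringIO

# Hypothetical sample table, used only to exercise the parser offline.
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>version</th></tr>
  <tr><td>html5lib</td><td>1.1</td></tr>
</table>
"""


def smoke_test_read_html(flavor="bs4"):
    """Parse SAMPLE_HTML with the given flavor.

    Returns (True, shape_of_first_table) on success, or (False, error_text)
    if pandas or one of its optional parser dependencies cannot be imported.
    """
    try:
        import pandas as pd
    except ImportError as exc:
        return False, f"pandas not importable: {exc}"
    try:
        # StringIO avoids passing literal HTML, which newer pandas deprecates.
        tables = pd.read_html(StringIO(SAMPLE_HTML), flavor=flavor)
    except ImportError as exc:
        # e.g. "html5lib not found, please install it"
        return False, str(exc)
    return True, tables[0].shape


if __name__ == "__main__":
    ok, detail = smoke_test_read_html("bs4")
    print("bs4 flavor works:" if ok else "bs4 flavor failed:", detail)
```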

answered 2021-04-30T22:02:20.707