我想在 Jupyter 笔记本中使用 huggingface 数据集库。
这应该像安装它(pip install datasets
,在 venv 中的 bash 中)并导入它(import datasets
在 Python 或笔记本中)一样简单。
当我在标准的 Python 交互式 shell 中测试它时,一切正常,但是,在 Jupyter 笔记本中尝试时,它说:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-6-652e886d387f> in <module>
----> 1 import datasets
ModuleNotFoundError: No module named 'datasets'
起初,我认为可能是笔记本内核使用了不同的虚拟环境,但我从笔记本内部验证了该软件包已安装:
!pip install datasets
Requirement already satisfied: datasets in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (1.8.0)
Requirement already satisfied: numpy>=1.17 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (1.21.0)
Requirement already satisfied: xxhash in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (2.0.2)
Requirement already satisfied: pyarrow<4.0.0,>=1.0.0 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (3.0.0)
Requirement already satisfied: pandas in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (1.2.5)
Requirement already satisfied: fsspec in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (2021.6.1)
Requirement already satisfied: packaging in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (20.9)
Requirement already satisfied: dill in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (0.3.4)
Requirement already satisfied: requests>=2.19.0 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (2.25.1)
Requirement already satisfied: tqdm<4.50.0,>=4.27 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (4.49.0)
Requirement already satisfied: multiprocess in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (0.70.12.2)
Requirement already satisfied: huggingface-hub<0.1.0 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (0.0.13)
Requirement already satisfied: pytz>=2017.3 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from pandas->datasets) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from pandas->datasets) (2.8.1)
Requirement already satisfied: pyparsing>=2.0.2 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from packaging->datasets) (2.4.7)
Requirement already satisfied: certifi>=2017.4.17 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (2021.5.30)
Requirement already satisfied: chardet<5,>=3.0.2 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (4.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (1.26.6)
Requirement already satisfied: idna<3,>=2.5 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (2.10)
Requirement already satisfied: typing-extensions in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from huggingface-hub<0.1.0->datasets) (3.10.0.0)
Requirement already satisfied: filelock in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from huggingface-hub<0.1.0->datasets) (3.0.12)
Requirement already satisfied: six>=1.5 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas->datasets) (1.16.0)
和
!pip freeze
certifi==2021.5.30
chardet==4.0.0
datasets==1.8.0
dill==0.3.4
filelock==3.0.12
fsspec==2021.6.1
huggingface-hub==0.0.13
idna==2.10
multiprocess==0.70.12.2
numpy==1.21.0
packaging==20.9
pandas==1.2.5
pyarrow==3.0.0
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2021.1
requests==2.25.1
six==1.16.0
tqdm==4.49.0
typing-extensions==3.10.0.0
urllib3==1.26.6
xxhash==2.0.2
有任何想法吗?我需要以特殊方式配置笔记本,还是数据集模块有问题?谢谢!
编辑:按照下面的答案,这会使错误消失:
datasets_dir=r"/home/yoga/venvs/text_embeddings/lib/python3.8/site-packages/datasets"
import sys
sys.path.append(datasets_dir)
import datasets
但是有没有一种方法可以在不明确设置这条路径的情况下工作?(或者有人可以解释为什么这里有必要吗?)