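I'm trying to save some disk space so that I can use the CommonVoice French dataset (19 GB) on Google Colab, because my notebook keeps crashing when it runs out of disk space. I saw in the HuggingFace documentation that a dataset can be loaded in streaming mode, which lets us iterate over it directly without having to download the entire dataset.
I tried to use that mode in Google Colab, but I can't get it to work, and I haven't found any information about this problem.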
!pip install datasets
!pip install 'datasets[streaming]'
!pip install aiohttp
from datasets import load_dataset
common_voice_train = load_dataset("common_voice", "fr", split="train", streaming=True)
Then I get the following error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-24-489f8a0ca4e4> in <module>()
----> 1 common_voice_train = load_dataset("common_voice", "fr", split="train", streaming=True)
/usr/local/lib/python3.7/dist-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, ignore_verifications, keep_in_memory, save_infos, script_version, use_auth_token, task, streaming, **config_kwargs)
811 if not config.AIOHTTP_AVAILABLE:
812 raise ImportError(
--> 813 f"To be able to use dataset streaming, you need to install dependencies like aiohttp "
814 f'using "pip install \'datasets[streaming]\'" or "pip install aiohttp" for instance'
815 )
ImportError: To be able to use dataset streaming, you need to install dependencies like aiohttp using "pip install 'datasets[streaming]'" or "pip install aiohttp" for instance
---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.
To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------
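The traceback shows that `load_dataset` raises because `config.AIOHTTP_AVAILABLE` is false. As a minimal diagnostic sketch, the check below asks the running kernel whether it can locate `aiohttp` and whether `datasets` had already been imported in this session. It rests on an assumption not confirmed by the traceback: that the flag is computed once, when `datasets` is first imported, so an import made before the `pip install` could leave it stale until the runtime is restarted. The helper names `is_importable` and `already_loaded` are hypothetical, introduced only for this illustration.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the running interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


def already_loaded(module_name: str) -> bool:
    """Return True if the module was imported earlier in this session."""
    return module_name in sys.modules


# If aiohttp is importable but datasets was loaded before the install,
# a stale availability flag is one possible explanation (assumption).
print("aiohttp importable now:", is_importable("aiohttp"))
print("datasets already imported:", already_loaded("datasets"))
```

If the first line prints `True` and the second also prints `True`, restarting the Colab runtime and re-running the `load_dataset` cell would be a reasonable thing to try.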
Is there a reason Google Colab does not allow loading datasets in streaming mode?
If not, what am I missing?