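I have a custom GCP AI Platform training job script that I run regularly; I last ran it one week ago. Today, however, the same training job and script failed with an error.
I tried to isolate the problem: my code breaks when pandas tries to read my training-set CSV from Google Cloud Storage. My AI Platform training job and the Cloud Storage bucket belong to the same project.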
import pandas as pd

tf = "gs://bucket_name/train.csv"
train_df = pd.read_csv(tf)
The replica master 0 exited with a non-zero status of 1.
Traceback (most recent call last):
[...]
File "/opt/conda/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/conda/lib/python3.7/site-packages/fsspec/__init__.py", line 42, in <module>
entry_points = entry_points()
File "/opt/conda/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 893, in entry_points
return SelectableGroups.load(eps).select(**params)
File "/opt/conda/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 331, in load
ordered = sorted(eps, key=by_group)
File "/opt/conda/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 891, in <genexpr>
dist.entry_points for dist in unique(distributions())
File "/opt/conda/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 517, in entry_points
return EntryPoints._from_text_for(self.read_text('entry_points.txt'), self)
File "/opt/conda/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 244, in _from_text_for
return cls(ep._for(dist) for ep in cls._from_text(text))
File "/opt/conda/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 244, in <genexpr>
return cls(ep._for(dist) for ep in cls._from_text(text))
File "/opt/conda/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 255, in <genexpr>
for name, value in values
ValueError: not enough values to unpack (expected 2, got 1)
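The traceback ends inside importlib_metadata while it parses an installed package's entry_points.txt, so one way to narrow this down is to look for the package whose file has an unexpected line. The following is only a diagnostic sketch: the helper names malformed_lines and scan_installed are hypothetical, and it assumes the failure comes from an entry that is not in "name = value" form, which the traceback suggests but does not prove.

```python
try:
    from importlib import metadata as md  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as md  # backport, as seen in the traceback on Python 3.7


def malformed_lines(text):
    """Return (line_number, line) pairs that are not 'name = value' entries.

    Hypothetical helper: it only checks for a missing '=' and does not
    replicate the exact parser used by importlib_metadata.
    """
    bad = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines, comments, and [section] headers.
        if not line or line.startswith(("#", ";", "[")):
            continue
        if "=" not in line:
            bad.append((lineno, line))
    return bad


def scan_installed():
    """Print each installed distribution whose entry_points.txt looks malformed."""
    for dist in md.distributions():
        text = dist.read_text("entry_points.txt")  # None when the file is absent
        if not text:
            continue
        for lineno, line in malformed_lines(text):
            name = dist.metadata["Name"]
            print(f"{name}: entry_points.txt line {lineno}: {line!r}")
```

Calling scan_installed() inside the training container, before the pd.read_csv call, should list any suspicious package so that it can be pinned or reinstalled.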
I also noticed a difference in the logs during the training job's initialization. The additional log lines are:
Using mount point: /gcs
Opening GCS connection...
Set up root directory for all accessible buckets
Mounting file system "gcsfuse"
File system has been successfully mounted.
I am not sure how these would change the way pandas read_csv works. Please help. Thanks.
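Since the new log lines say the buckets are mounted with gcsfuse under /gcs, a possible workaround to try is reading the file through that mount as a local path rather than through the gs:// URL. This is a minimal sketch under assumptions: gcs_uri_to_fuse_path is a hypothetical helper, and the layout /gcs/<bucket>/<object> is inferred from the log lines rather than confirmed.

```python
import posixpath


def gcs_uri_to_fuse_path(uri, mount_point="/gcs"):
    """Map gs://bucket/object to <mount_point>/bucket/object.

    Assumes the gcsfuse mount exposes every accessible bucket as a
    directory directly under mount_point (an assumption based on the logs).
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return posixpath.join(mount_point, uri[len(prefix):])
```

With this, the read becomes train_df = pd.read_csv(gcs_uri_to_fuse_path(tf)). Whether this avoids the failing import in this environment is untested, so it is something to try rather than a confirmed fix.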