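I am training a TensorFlow estimator on Google Cloud AI Platform, roughly following the tutorial.
I want to access a directory that contains my training and evaluation data, so I copied my data files recursively to Google Cloud Storage like this: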
gsutil cp -r data gs://name-of-my-bucket/data
This works fine, and gsutil ls gs://name-of-my-bucket/data correctly returns:
gs://name-of-my-bucket/data/test.json
gs://name-of-my-bucket/data/test
gs://name-of-my-bucket/data/train
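However, calling os.listdir(data_dir) from a Python script raises a FileNotFoundError for every value of data_dir I have tried so far, including 'data/' and 'name-of-my-bucket/data/'. Why?
I know that my Python script is being executed from the directory /root/.local/lib/python3.7/site-packages/trainer/ /user_dir.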
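Python code where the issue arises (edit)
Here is the code that precedes the line where the error arises, taken directly from the __main__ section of my Python script: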
PARSER = argparse.ArgumentParser()
PARSER.add_argument('--job-dir', ...)
PARSER.add_argument('--eval-steps', ...)
PARSER.add_argument('--export-format', ...)
ARGS = PARSER.parse_args()
tf.logging.set_verbosity('INFO')
os.environ['TF_CPP_MIN_LOG_LEVEL'] = str(tf.logging.__dict__['INFO'] / 10)
HPARAMS = hparam.HParams(**ARGS.__dict__)
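This is the line of code where the error arises (the first line of a separate function that is called immediately after the code shown above):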
mug_dirs = [f for f in os.listdir(image_dir) if not f.startswith('.')]
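A minimal sketch of one way this listing could be made to work against a bucket. It assumes the cause is that os.listdir only reads the worker's local filesystem, where the bucket is not mounted, and that the installed TensorFlow provides tf.io.gfile (older 1.x releases expose tf.gfile.ListDirectory instead). The helper names is_gcs_path and list_visible_entries are hypothetical and do not come from the original script.

```python
import os


def is_gcs_path(path):
    """Return True for Cloud Storage URIs such as gs://bucket/data/train/."""
    return path.startswith("gs://")


def list_visible_entries(image_dir):
    """List the non-hidden entries of a local directory or a gs:// prefix.

    Hypothetical helper: it mirrors the failing list comprehension, but
    routes gs:// URIs through TensorFlow's file API instead of os.listdir.
    """
    if is_gcs_path(image_dir):
        # Imported lazily so the local branch works without TensorFlow.
        import tensorflow as tf

        names = tf.io.gfile.listdir(image_dir)
    else:
        names = os.listdir(image_dir)
    return [name for name in names if not name.startswith(".")]
```

With a helper like this, the call would receive the full URI, for example list_visible_entries("gs://name-of-my-bucket/data/train/"), rather than the relative path 'data/train/'. Entries returned for a bucket prefix may carry a trailing slash for sub-directories, so any later path joins are worth checking.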
Logs (edit)
My logs for this job are a list of infos (plus 5 deprecation warnings related to TensorFlow), followed by an error from the master-replica-0 task:
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.7/site-packages/trainer/final_task.py", line 114, in <module>
    train_model(HPARAMS)
  File "/root/.local/lib/python3.7/site-packages/trainer/final_task.py", line 55, in train_model
    (train_data, train_labels) = data.create_data_with_labels("data/train/")
  File "/root/.local/lib/python3.7/site-packages/trainer/data.py", line 13, in create_data_with_labels
    mug_dirs = [f for f in os.listdir(image_dir) if not f.startswith('.')]
FileNotFoundError: [Errno 2] No such file or directory: 'data/train/'
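... followed by another error from the same task (reporting a non-zero exit status from my Python command), then two infos about clean-up, and finally an error from the service task: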
The replica master 0 exited with a non-zero status of 1.
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.7/site-packages/trainer/final_task.py", line 114, in <module>
    train_model(HPARAMS)
  File "/root/.local/lib/python3.7/site-packages/trainer/final_task.py", line 55, in train_model
    (train_data, train_labels) = data.create_data_with_labels("data/train/")
  File "/root/.local/lib/python3.7/site-packages/trainer/data.py", line 13, in create_data_with_labels
    mug_dirs = [f for f in os.listdir(image_dir) if not f.startswith('.')]
FileNotFoundError: [Errno 2] No such file or directory: 'data/train/'
To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=1047296516162&resource=ml_job%2Fjob_id%2Fml6_run_25&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22ml6_run_25%22