
I run:

gcloud beta ml jobs submit training ${JOB_NAME} --config config.yaml

After about 5 minutes, the job fails with this error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 232, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 228, in main
    run_training()
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 129, in run_training
    data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py", line 212, in read_data_sets
    with open(local_file, 'rb') as f:
IOError: [Errno 2] No such file or directory: 'gs://my-bucket/mnist/train/train-images.gz'

Strangely, as far as I can tell, the file does exist at that URL.


2 Answers


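This error usually indicates that you are using a multi-regional GCS bucket for your output. To avoid it, use a regional GCS bucket. Regional buckets provide the stronger consistency guarantees that are needed to avoid this type of error.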

For more information on setting up GCS buckets properly for Cloud ML, see the Cloud ML documentation.

Answered 2016-09-29T16:23:44.057

Ordinary Python file I/O does not know how to handle GCS gs:// paths. You need:

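from tensorflow.python.lib.io import file_io

# file_io.FileIO understands gs:// paths; the built-in open() does not.
first_data_file = args.train_files[0]
file_stream = file_io.FileIO(first_data_file, mode='r')

# run experiment
model.run_experiment(file_stream)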
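
This also explains the traceback in the question: the built-in open() treats 'gs://...' as a local relative path, so it raises IOError whether or not the object exists in the bucket. As a rough sketch, the dispatch can be wrapped in one function. The names `is_gcs_path` and `open_any` are hypothetical helpers made up for illustration; they are not part of TensorFlow or of the trainer code. TensorFlow is imported lazily, so the local branch works without it.

```python
import os
import tempfile


def is_gcs_path(path):
    """Return True when the path uses the gs:// scheme."""
    return path.startswith("gs://")


def open_any(path, mode="r"):
    """Open a gs:// path with TensorFlow's file_io, anything else with open().

    Hypothetical helper for illustration only.
    """
    if is_gcs_path(path):
        # Imported here so that local-only use does not require TensorFlow.
        from tensorflow.python.lib.io import file_io
        return file_io.FileIO(path, mode=mode)
    return open(path, mode)


def demo_local_round_trip():
    """Write and read back a local temp file through open_any()."""
    tmp_dir = tempfile.mkdtemp()
    local_path = os.path.join(tmp_dir, "example.txt")
    with open_any(local_path, "w") as f:
        f.write("hello")
    with open_any(local_path) as f:
        return f.read()
```

With a helper like this, a call such as open_any(FLAGS.train_dir + '/train-images.gz', 'rb') would go through file_io when train_dir is a gs:// URL and through open() otherwise.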

Ironically, though, you can copy a file from gs://bucket to the local directory your job runs in, where your program can actually see it:

# presentation_mplstyle_path is a string holding the gs:// URL of the style file.
with file_io.FileIO(presentation_mplstyle_path, mode='r') as input_f:
    with file_io.FileIO('presentation.mplstyle', mode='w+') as output_f:
        output_f.write(input_f.read())

mpl.pyplot.style.use(['./presentation.mplstyle'])

Finally, copy the file from the local directory back to gs://bucket:

with file_io.FileIO(report_name, mode='r') as input_f:
    with file_io.FileIO(job_dir + '/' + report_name, mode='w+') as output_f:
        output_f.write(input_f.read())
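
Both copy snippets follow the same pattern: open a source, open a destination, and write what was read. A minimal sketch of that pattern as one function follows. The name `copy_file` and its parameters are assumptions made for illustration, not an existing API. It copies in binary chunks so that a large file is not read into memory in one go, and it takes the opener as an argument so the same code runs locally with open() or on Cloud ML with file_io.FileIO.

```python
import os
import tempfile


def copy_file(src, dst, open_src=open, open_dst=open, chunk_size=1024 * 1024):
    """Copy src to dst in binary chunks; return the number of bytes copied.

    open_src and open_dst are callables taking (path, mode). They default to
    the built-in open(); pass file_io.FileIO for a gs:// source or destination.
    Hypothetical helper for illustration only.
    """
    copied = 0
    with open_src(src, "rb") as input_f:
        with open_dst(dst, "wb") as output_f:
            while True:
                chunk = input_f.read(chunk_size)
                if not chunk:
                    # An empty read means end of file.
                    break
                output_f.write(chunk)
                copied += len(chunk)
    return copied


def demo_local_copy():
    """Copy a small local temp file and return (bytes_copied, copied_bytes)."""
    tmp_dir = tempfile.mkdtemp()
    src = os.path.join(tmp_dir, "report.bin")
    dst = os.path.join(tmp_dir, "report_copy.bin")
    with open(src, "wb") as f:
        f.write(b"abc" * 1000)
    n = copy_file(src, dst, chunk_size=256)
    with open(dst, "rb") as f:
        return n, f.read()
```

Under these assumptions, the upload step above would read copy_file(report_name, job_dir + '/' + report_name, open_dst=file_io.FileIO). Depending on the TensorFlow version, file_io may also ship its own copy function, which would be worth checking before hand-rolling this.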

This should be easier, IMO.

Answered 2017-06-25T20:16:01.730