1

我正在训练 tf slim

https://github.com/tensorflow/models/tree/master/slim

从头开始训练模型。发生了某种错误

我认为这是一种 gpu 和 cpu 运行问题。

其他代码对我来说很好。

但是这个发生的错误

我运行以下代码

python train_image_classifier.py 
    --train_dir= /home/sk/workspace/slim/datasets/log
    --dataset_name=imagenet 
    --dataset_split_name=train 
    --dataset_dir=/home/sk/workspace/slim/datasets/imagenet 
    --model_name=inception_v3

错误是

Caused by op u'InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1', defined at:
  File "/home/sk/workspace/slim/train_image_classifier.py", line 573, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/sk/workspace/slim/train_image_classifier.py", line 539, in main
    global_step=global_step)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
    self._create_slots([_get_variable_for(v) for v in var_list])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/rmsprop.py", line 103, in _create_slots
    self._zeros_slot(v, "momentum", self._name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
    named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
    dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 714, in _get_single_variable
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 197, in __init__
    expected_shape=expected_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 281, in _init_from_args
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/state_ops.py", line 128, in variable_op_v2
    shared_name=shared_name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 708, in _variable_v2
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
Colocation Debug Info:
Colocation group had the following types and devices: 
ApplyRMSProp: CPU 
Const: CPU 
Assign: CPU 
IsVariableInitialized: CPU 
Identity: CPU 
VariableV2: CPU 
     [[Node: InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1 = VariableV2[_class=["loc:@InceptionV3/Logits/Conv2d_1c_1x1/biases"], container="", dtype=DT_FLOAT, shape=[3], shared_name="", _device="/device:GPU:0"]()]]


Process finished with exit code 1
4

1 回答 1

0

它试图在 GPU 上运行一些操作,但 TensorFlow 看不到 GPU 设备(因为您使用的是 TensorFlow 的 CPU 版本,或者因为 CUDA 安装问题,或者因为没有 GPU)。看起来您可以指定--clone_on_cpu=True使用 CPU。

于 2017-06-29T00:23:23.997 回答