
The code runs fine on GPU and CPU. But when I use the keras_to_tpu_model function to make the model run on a TPU, an error occurs.

Here is the full Colab output: https://colab.research.google.com/gist/WangHexie/2252beb26f16354cb6e9ba2639970e5b/tpu-error.ipynb
Change the runtime type to TPU and I think the error should be reproducible.

Code on GitHub: https://github.com/WangHexie/DHNE/blob/master/src/hypergraph_embedding.py#L60

You can test the code on a GPU by switching to the gpu branch.

Traceback:

Traceback (most recent call last):
  File "src/hypergraph_embedding.py", line 158, in <module>
    h.train(dataset)
  File "src/hypergraph_embedding.py", line 75, in train
    epochs=self.options.epochs_to_train, verbose=1)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training.py", line 2177, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training_generator.py", line 176, in fit_generator
    x, y, sample_weight=sample_weight, class_weight=class_weight)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training.py", line 1940, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1238, in __call__
    infeed_manager)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1143, in _tpu_model_ops_for_input_specs
    infeed_manager)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1053, in _specialize_model
    _model_fn, inputs=[[]] * self._tpu_assignment.num_towers)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu.py", line 687, in split_compile_and_replicate
    outputs = computation(*computation_inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 959, in _model_fn
    self.model.cpu_optimizer)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 378, in _clone_optimizer
    config = optimizer.get_config()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/optimizers.py", line 275, in get_config
    'lr': float(K.get_value(self.lr)),
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/backend.py", line 2709, in get_value
    return x.eval(session=get_session())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/backend.py", line 469, in get_session
    _initialize_variables(session)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/backend.py", line 731, in _initialize_variables
    [variables_module.is_variable_initialized(v) for v in candidate_vars])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1137, in _run
    self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 484, in __init__
    self._assert_fetchable(graph, fetch.op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 497, in _assert_fetchable
    'Operation %r has been marked as not fetchable.' % op.name)
ValueError: Operation u'tpu_140276544043536/VarIsInitializedOp' has been marked as not fetchable.

1 Answer


I had the same problem and it confused me for two days. The solution I found was simply to switch to tf.train.RMSPropOptimizer instead of RMSprop from tensorflow.keras.optimizers.

Answered 2018-10-24T10:27:06.837