tensorflow - Tensorflow Estimator API: Remember the LSTM state from the previous batch for the next batch with a dynamic batch_size

Question

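I know that similar questions have been asked several times on Stack Overflow and elsewhere, but I cannot find a solution to the following problem: I am trying to build a stateful LSTM model in TensorFlow with its Estimator API. I tried the solution from "Tensorflow, best way to save state in RNNs?", which works as long as I use a static batch_size. A dynamic batch_size causes the following problem: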

ValueError: initial_value must have a shape specified: Tensor("DropoutWrapperZeroState/MultiRNNCellZeroState/DropoutWrapperZeroState/LSTMCellZeroState/zeros:0", shape=(?, 200), dtype=float32)

Setting tf.Variable(...., validate_shape=False) only moves the problem further down the graph:

Traceback (most recent call last):
  File "model.py", line 576, in <module>
    tf.app.run(main=run_experiment)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "model.py", line 137, in run_experiment
    hparams=params  # HParams
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run
    return _execute_schedule(experiment, schedule)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule
    return task()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 495, in train_and_evaluate
    self.train(delay_secs=0)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
    hooks=self._train_monitors + extra_hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 660, in _call_train
    hooks=hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 241, in train
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 560, in _train_model
    model_fn_lib.ModeKeys.TRAIN)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 545, in _call_model_fn
    features=features, labels=labels, **kwargs)
  File "model.py", line 218, in model_fn
    output, state = get_model(features, params)
  File "model.py", line 567, in get_model
    model = lstm(inputs, params)
  File "model.py", line 377, in lstm
    output, new_states = tf.nn.dynamic_rnn(multicell, inputs=inputs, initial_state = states)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 574, in dynamic_rnn
    dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 737, in _dynamic_rnn_loop
    swap_memory=swap_memory)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2770, in while_loop
    result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2599, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2549, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 722, in _time_step
    (output, new_state) = call_cell()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 708, in <lambda>
    call_cell = lambda: cell(input_t, state)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 752, in __call__
    output, new_state = self._cell(inputs, state, scope)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 180, in __call__
    return super(RNNCell, self).__call__(inputs, state)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 441, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 916, in call
    cur_inp, new_state = cell(cur_inp, cur_state)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 752, in __call__
    output, new_state = self._cell(inputs, state, scope)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 180, in __call__
    return super(RNNCell, self).__call__(inputs, state)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 441, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 542, in call
    lstm_matrix = _linear([inputs, m_prev], 4 * self._num_units, bias=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1002, in _linear
    raise ValueError("linear is expecting 2D arguments: %s" % shapes)
ValueError: linear is expecting 2D arguments: [TensorShape([Dimension(None), Dimension(62)]), TensorShape(None)]

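According to GitHub issue 2838, using non-trainable variables is not recommended anyway (???), which is why I kept looking for other solutions.

This is what I now have in my model_fn: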

def rnn_placeholders(state):
    """Convert RNN state tensors to placeholders with the zero state as default."""
    if isinstance(state, tf.contrib.rnn.LSTMStateTuple):
        c, h = state
        c = tf.placeholder_with_default(c, c.shape, c.op.name)
        h = tf.placeholder_with_default(h, h.shape, h.op.name)
        return tf.contrib.rnn.LSTMStateTuple(c, h)
    elif isinstance(state, tf.Tensor):
        h = state
        h = tf.placeholder_with_default(h, h.shape, h.op.name)
        return h
    else:
        structure = [rnn_placeholders(x) for x in state]
        return tuple(structure)


state = rnn_placeholders(cell.zero_state(batch_size, tf.float32))

for tensor in flatten(state):
    tf.add_to_collection('rnn_state_input', tensor)

x, new_state = tf.nn.dynamic_rnn(...)

for tensor in flatten(new_state):
    tf.add_to_collection('rnn_state_output', tensor)

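Unfortunately, when using the tf.Estimator API etc., I do not know how to feed the value of new_state back into the state placeholders on each iteration. Since I am quite new to TensorFlow, I think I am lacking some conceptual knowledge here. Would it be possible to use a custom SessionRunHook?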

class UpdateHook(tf.train.SessionRunHook):

        def before_run(self, run_context):
            run_args = super(UpdateHook, self).before_run(run_context)
            run_args = tf.train.SessionRunArgs(new_state)

            #print(run_args)
            return run_args

        def after_run(self, run_context, run_values):
            # run_values gives the actual value of new_state.
            # How to update now the state placeholder??
            pass

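Does anyone have an idea how to solve this? Tips and tricks are highly appreciated. Thanks a lot!

PS: Please let me know if anything is unclear ;)

Edit: Unfortunately, I am using the new tf.data API and cannot use StateSavingRNNEstimator as Eugene suggested.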

score 1 · Accepted Answer

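There is an estimator that uses batch_sequences_with_states which you can base your code on. It is called StateSavingRNNEstimator. Unless you are using the new tf.contrib.data / tf.data API, it should be enough to get you started.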
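To illustrate the idea behind a state-saving estimator, here is a framework-free sketch in plain Python. It is not the TensorFlow API: long sequences are cut into fixed-length segments, and the state left after each segment is stored under the sequence's key so that the next segment of the same sequence resumes from it. The names split_into_segments, toy_rnn_step, and run_segments are hypothetical, and a running sum stands in for the LSTM.

```python
def split_into_segments(sequence, num_unroll):
    """Cut one long sequence into consecutive chunks of at most num_unroll steps."""
    return [sequence[i:i + num_unroll] for i in range(0, len(sequence), num_unroll)]


def toy_rnn_step(state, x):
    """Stand-in for an LSTM cell (assumption): the state is just a running sum."""
    new_state = state + x
    return new_state, new_state  # (output, new_state)


def run_segments(sequences, num_unroll):
    """Process every segment, resuming each sequence from its saved state."""
    saved_states = {}  # sequence key -> state after the last processed segment
    outputs = {key: [] for key in sequences}
    for key, sequence in sequences.items():
        for segment in split_into_segments(sequence, num_unroll):
            state = saved_states.get(key, 0)  # zero state for an unseen sequence
            for x in segment:
                out, state = toy_rnn_step(state, x)
                outputs[key].append(out)
            saved_states[key] = state  # remember it for the next segment
    return outputs, saved_states


outputs, saved_states = run_segments({"a": [1, 2, 3, 4, 5], "b": [10, 20]}, num_unroll=2)
```

Because the state is looked up by key rather than by position in the batch, this kind of bookkeeping does not depend on a fixed batch size.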

score 1 · Accepted Answer

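This answer might be late. I had a similar problem a few months ago and solved it with a custom SessionRunHook. It may not be perfect in terms of performance, but you can give it a try.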

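import numpy as np
import tensorflow as tf


class LSTMStateHook(tf.train.SessionRunHook):
    """Feeds the LSTM state produced by one step into the next step."""

    def __init__(self, params):
        self.init_states = None
        # Layout: (layers, [c, h], batch, state size); starts as the zero state.
        self.current_state = np.zeros(
            (params.rnn_layers, 2, params.batch_size, params.state_size))

    def begin(self):
        # Look up the tensor that the LSTM graph code names "init_states".
        self.init_states = tf.get_default_graph().get_tensor_by_name('LSTM/init_states:0')

    def before_run(self, run_context):
        # Fetch the new state and feed the previous one in place of the zero state.
        output_states = tf.get_default_graph().get_tensor_by_name('LSTM/output_states:0')
        return tf.train.SessionRunArgs(
            [output_states], {self.init_states: self.current_state})

    def after_run(self, run_context, run_values):
        # The indices depend on the fetches passed to SessionRunArgs.
        self.current_state = run_values[0][0]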
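The feed-and-fetch cycle this hook relies on can be seen in isolation in the following mock, written in plain Python without TensorFlow. It is only a sketch: StateCarryHook, fake_session_run, and train are hypothetical names, the real hook methods also receive a run_context, and a running sum plays the role of the LSTM.

```python
class StateCarryHook(object):
    """Mimics the before_run / after_run cycle of a session run hook."""

    def __init__(self, zero_state):
        self.current_state = zero_state

    def before_run(self):
        # Ask the "session" to feed the stored state into this step.
        return {"init_state": self.current_state}

    def after_run(self, run_values):
        # Keep the state produced by this step for the next one.
        self.current_state = run_values["output_state"]


def fake_session_run(batch, feed):
    """Stand-in for session.run (assumption): accumulates the batch onto the fed state."""
    state = feed["init_state"]
    for x in batch:
        state = state + x
    return {"output_state": state}


def train(batches, hook):
    """Calls the hook around every step, as a monitored training loop would."""
    for batch in batches:
        feed = hook.before_run()
        run_values = fake_session_run(batch, feed)
        hook.after_run(run_values)
    return hook.current_state


final_state = train([[1, 2], [3], [4, 5, 6]], StateCarryHook(zero_state=0))
```

The state crosses batch boundaries only through the hook's attribute, which is why the approach costs one extra fetch and one extra feed per step.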

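In the code where you define your LSTM graph, you need something like the following (the tensor names used by the hook imply that this code runs inside a scope named LSTM):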

if self.stateful is True:
    # Name the state tensor so that the hook can look it up by name.
    init_states = multicell.zero_state(self.batch_size, tf.float32)
    init_states = tf.identity(init_states, "init_states")

    # Rebuild the per-layer LSTMStateTuple structure that dynamic_rnn expects.
    l = tf.unstack(init_states, axis=0)
    rnn_tuple_state = tuple([tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1]) for idx in range(self.rnn_layers)])
else:
    rnn_tuple_state = multicell.zero_state(self.batch_size, tf.float32)

# Unroll RNN
output, output_states = tf.nn.dynamic_rnn(multicell, inputs=inputs, initial_state=rnn_tuple_state)

if self.stateful is True:
    # Name the final state so that the hook can fetch it after every step.
    output_states = tf.identity(output_states, "output_states")
return output
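The hook above allocates its state for a fixed params.batch_size. If the batch size changes between steps, one possible workaround (an assumption on my part, not part of the code above) is to truncate or zero-pad the stored state along the batch dimension before feeding it. The sketch below does this on nested lists laid out as [layers][2][batch][state_size]; fit_state_to_batch is a hypothetical helper, and it only covers the bookkeeping outside the graph. The graph itself would still have to accept a variable batch dimension.

```python
def fit_state_to_batch(state, batch_size, state_size):
    """Truncate or zero-pad a stored state along its batch dimension.

    `state` is a nested list shaped [layers][2][batch][state_size],
    mirroring the array layout used by the hook.
    """
    fitted = []
    for layer in state:
        new_layer = []
        for part in layer:  # part 0 is c, part 1 is h
            rows = [list(row) for row in part[:batch_size]]  # drop surplus rows
            while len(rows) < batch_size:  # extra rows start from the zero state
                rows.append([0.0] * state_size)
            new_layer.append(rows)
        fitted.append(new_layer)
    return fitted


# One layer, stored batch of 3, state size 2.
stored = [[[[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
           [[4.0, 4.0], [5.0, 5.0], [6.0, 6.0]]]]
smaller = fit_state_to_batch(stored, batch_size=2, state_size=2)
larger = fit_state_to_batch(stored, batch_size=4, state_size=2)
```

Whether truncating is meaningful depends on how sequences are assigned to batch rows: it only makes sense if row i of one batch continues row i of the previous batch.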
