python - TensorFlow 中的计划采样

Question

最新的关于 seq2seq 模型的 Tensorflow api 包含了定时采样：

https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/ScheduledEmbeddingTrainingHelper https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/ScheduledOutputTrainingHelper

预定采样的原始论文可以在这里找到： https ://arxiv.org/abs/1506.03099

我阅读了论文，但我无法理解和之间的ScheduledEmbeddingTrainingHelper区别ScheduledOutputTrainingHelper。文档只说ScheduledEmbeddingTrainingHelper是一个添加计划采样ScheduledOutputTrainingHelper的训练助手，而是一个直接将计划采样添加到输出的训练助手。

我想知道这两个助手之间有什么区别？

score 10 · Accepted Answer

我联系了这背后的工程师，他回答说：

输出采样器在该时间步发出原始 rnn 输出或原始地面实况。嵌入采样器将 rnn 输出视为分布的 logits，并在该时间步发出来自该分类分布的采样 id 的嵌入查找或原始地面实况。

score 5 · Accepted Answer

ScheduledEmbeddingTrainingHelper这是使用 TensorFlow 1.3 和一些更高级别的 tf.contrib API的基本示例。这是一个 sequence2sequence 模型，其中解码器的初始隐藏状态是编码器的最终隐藏状态。它仅显示如何在单个批次上进行训练（显然任务是“反转此序列”）。对于实际的训练任务，我建议查看 tf.contrib.learn API，例如 learn_runner、Experiment 和 tf.estimator.Estimator。

import tensorflow as tf
import numpy as np
from tensorflow.python.layers.core import Dense

vocab_size = 7
embedding_size = 5
lstm_units = 10

src_batch = np.array([[1, 2, 3], [4, 5, 6]])
trg_batch = np.array([[3, 2, 1], [6, 5, 4]])

# *_seq will have shape (2, 3), *_seq_len will have shape (2)
source_seq = tf.placeholder(shape=(None, None), dtype=tf.int32)
target_seq = tf.placeholder(shape=(None, None), dtype=tf.int32)
source_seq_len = tf.placeholder(shape=(None,), dtype=tf.int32)
target_seq_len = tf.placeholder(shape=(None,), dtype=tf.int32)

# add Start of Sequence (SOS) tokens to each sequence
batch_size, sequence_size = tf.unstack(tf.shape(target_seq))
sos_slice = tf.zeros([batch_size, 1], dtype=tf.int32) # 0 = start of sentence token
decoder_input = tf.concat([sos_slice, target_seq], axis=1)

embedding_matrix = tf.get_variable(
    name="embedding_matrix",
    shape=[vocab_size, embedding_size],
    dtype=tf.float32)
source_seq_embedded = tf.nn.embedding_lookup(embedding_matrix, source_seq) # shape=(2, 3, 5)
decoder_input_embedded = tf.nn.embedding_lookup(embedding_matrix, decoder_input) # shape=(2, 4, 5)

unused_encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    tf.contrib.rnn.LSTMCell(lstm_units),
    source_seq_embedded,
    sequence_length=source_seq_len,
    dtype=tf.float32)

# Decoder:
# At each time step t and for each sequence in the batch, we get x_t by either
#   (1) sampling from the distribution output_layer(t-1), or
#   (2) reading from decoder_input_embedded.
# We do (1) with probability sampling_probability and (2) with 1 - sampling_probability.
# Using sampling_probability=0.0 is equivalent to using TrainingHelper (no sampling).
# Using sampling_probability=1.0 is equivalent to doing inference,
# where we don't supervise the decoder at all: output at t-1 is the input at t.
sampling_prob = tf.Variable(0.0, dtype=tf.float32)
helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(
    decoder_input_embedded,
    target_seq_len,
    embedding_matrix,
    sampling_probability=sampling_prob)

output_layer = Dense(vocab_size)
decoder = tf.contrib.seq2seq.BasicDecoder(
    tf.contrib.rnn.LSTMCell(lstm_units),
    helper,
    encoder_state,
    output_layer=output_layer)

outputs, state, seq_len = tf.contrib.seq2seq.dynamic_decode(decoder)
loss = tf.contrib.seq2seq.sequence_loss(
    logits=outputs.rnn_output,
    targets=target_seq,
    weights=tf.ones(trg_batch.shape))

train_op = tf.contrib.layers.optimize_loss(
    loss=loss,
    global_step=tf.contrib.framework.get_global_step(),
    optimizer=tf.train.AdamOptimizer,
    learning_rate=0.001)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    _, _loss = session.run([train_op, loss], {
        source_seq: src_batch,
        target_seq: trg_batch,
        source_seq_len: [3, 3],
        target_seq_len: [3, 3],
        sampling_prob: 0.5
    })
    print("Loss: " + str(_loss))

对于ScheduledOutputTrainingHelper，我希望只换掉助手并使用：

helper = tf.contrib.seq2seq.ScheduledOutputTrainingHelper(
    target_seq,
    target_seq_len,
    sampling_probability=sampling_prob)

However this gives an error, since the LSTM cell expects a multidimensional input per timestep (of shape (batch_size, input_dims)). I will raise an issue in GitHub to find out if this is a bug, or there's some other way to use ScheduledOutputTrainingHelper.

score 0 · Accepted Answer

This might also help you. This is for the case where you want to do scheduled sampling at each decoding step separately.

import tensorflow as tf
import numpy as np
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import gen_array_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops.distributions import categorical
from tensorflow.python.ops.distributions import bernoulli
batch_size = 64
vocab_size = 50000
emb_dim = 128
output = tf.get_variable('output', 
initializer=tf.constant(np.random.rand(batch_size,vocab_size)))
base_next_inputs = tf.get_variable('input', 
initializer=tf.constant(np.random.rand(batch_size,emb_dim)))
embedding = tf.get_variable('embedding', 
initializer=tf.constant(np.random.rand(vocab_size,emb_dim)))
select_sampler = bernoulli.Bernoulli(probs=0.99, dtype=tf.bool)
select_sample = select_sampler.sample(sample_shape=batch_size, 
seed=123)
sample_id_sampler = categorical.Categorical(logits=output)
sample_ids = array_ops.where(
    select_sample,
    sample_id_sampler.sample(seed=123),
    gen_array_ops.fill([batch_size], -1))

where_sampling = math_ops.cast(
   array_ops.where(sample_ids > -1), tf.int32)
where_not_sampling = math_ops.cast(
   array_ops.where(sample_ids <= -1), tf.int32)
sample_ids_sampling = array_ops.gather_nd(sample_ids, where_sampling)
inputs_not_sampling = array_ops.gather_nd(base_next_inputs, 
     where_not_sampling)
sampled_next_inputs = tf.nn.embedding_lookup(embedding, 
    sample_ids_sampling)
base_shape = array_ops.shape(base_next_inputs)
result1 = array_ops.scatter_nd(indices=where_sampling, 
   updates=sampled_next_inputs, shape=base_shape)
result2 = array_ops.scatter_nd(indices=where_not_sampling, 
   updates=inputs_not_sampling, shape=base_shape)
result = result1 + result2

I used the tensorflow documentation code to make this example. https://github.com/tensorflow/tensorflow/blob/r1.5/tensorflow/contrib/seq2seq/python/ops/helper.py

python - TensorFlow 中的计划采样

3 回答 3

Related

Reference