time-series - 如何使用 LSTM 单元训练 RNN 以进行时间序列预测

Question

我目前正在尝试建立一个简单的模型来预测时间序列。目标是使用序列训练模型，以便模型能够预测未来值。

我正在使用 tensorflow 和 lstm 单元来执行此操作。该模型通过时间截断反向传播进行训练。我的问题是如何构建训练数据。

例如，假设我们想学习给定的序列：

[1,2,3,4,5,6,7,8,9,10,11,...]

我们展开网络num_steps=4.

选项1

input data               label     
1,2,3,4                  2,3,4,5
5,6,7,8                  6,7,8,9
9,10,11,12               10,11,12,13
...

选项 2

input data               label     
1,2,3,4                  2,3,4,5
2,3,4,5                  3,4,5,6
3,4,5,6                  4,5,6,7
...

选项 3

input data               label     
1,2,3,4                  5
2,3,4,5                  6
3,4,5,6                  7
...

选项 4

input data               label     
1,2,3,4                  5
5,6,7,8                  9
9,10,11,12               13
...

任何帮助，将不胜感激。

score 5 · Accepted Answer

我正要在 TensorFlow 中学习 LSTM，并尝试实现一个示例，该示例（幸运的是）试图预测一些由简单数学函数生成的时间序列/数字序列。

但是我使用了一种不同的方式来构建训练数据，其动机是使用 LSTM 对视频表示进行无监督学习：

LSTM 未来预测模型

选项 5：

input data               label     
1,2,3,4                  5,6,7,8
2,3,4,5                  6,7,8,9
3,4,5,6                  7,8,9,10
...

除了这篇论文，我（尝试）从给定的 TensorFlow RNN 示例中获取灵感。我当前的完整解决方案如下所示：

import math
import random
import numpy as np
import tensorflow as tf

LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4


def ground_truth_func(i, j, t):
    return i * math.pow(t, 2) + j


def get_batch(batch_size):
    seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
    tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)

    for b in xrange(batch_size):
        i = float(random.randint(-25, 25))
        j = float(random.randint(-100, 100))
        for t in xrange(NUM_T_STEPS):
            value = ground_truth_func(i, j, t)
            seq[b, t, 0] = value

        for t in xrange(NUM_T_STEPS):
            tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
    return seq, tgt


# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])

fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))

# ENCODER
with tf.variable_scope('ENC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
    state = initial_state
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)

learned_representation = state

# DECODER
with tf.variable_scope('DEC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    state = learned_representation
    logits_stacked = None
    loss = 0.0
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)
        # output can be used to make next number prediction
        logits = tf.matmul(output, fc1_weight) + fc1_bias

        if logits_stacked is None:
            logits_stacked = logits
        else:
            logits_stacked = tf.concat(1, [logits_stacked, logits])

        loss += tf.reduce_sum(tf.square(logits - target[:, t_step])) / BATCH_SIZE

reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))

train = tf.train.AdamOptimizer().minimize(reg_loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    total_loss = 0.0
    for step in xrange(MAX_STEPS):
        seq_batch, target_batch = get_batch(BATCH_SIZE)

        feed = {sequence: seq_batch, target: target_batch}
        _, current_loss = sess.run([train, reg_loss], feed)
        if step % 10 == 0:
            print("@{}: {}".format(step, current_loss))
        total_loss += current_loss

    print('Total loss:', total_loss)

    print('### SIMPLE EVAL: ###')
    seq_batch, target_batch = get_batch(BATCH_SIZE)
    feed = {sequence: seq_batch, target: target_batch}
    prediction = sess.run([logits_stacked], feed)
    for b in xrange(BATCH_SIZE):
        print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
        print(" `-> Prediction: {}".format(prediction[0][b]))

此示例输出如下所示：

### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
#  `-> Prediction: [model prediction]  
[  33.   53.  113.  213.] -> [  353.   533.   753.  1013.])
 `-> Prediction: [ 19.74548721  28.3149128   33.11489105  35.06603241]
[ -17.  -32.  -77. -152.] -> [-257. -392. -557. -752.])
 `-> Prediction: [-16.38951683 -24.3657589  -29.49801064 -31.58583832]
[ -7.  -4.   5.  20.] -> [  41.   68.  101.  140.])
 `-> Prediction: [ 14.14126873  22.74848557  31.29668617  36.73633194]
...

该模型是一个LSTM 自动编码器，每层有 2 层。

不幸的是，正如您在结果中看到的那样，该模型没有正确学习序列。我可能只是在某个地方犯了一个严重的错误，或者 1000-10000 个训练步骤对于 LSTM 来说只是少数。正如我所说，我也刚刚开始正确理解/使用 LSTM。但希望这能给你一些关于实施的灵感。

score 4 · Accepted Answer

在阅读了几个 LSTM 介绍博客，例如Jakob Aungiers 之后，选项 3 似乎是无状态 LSTM 的正确选择。

如果您的 LSTM 需要记住比您的更早的数据num_steps，您可以以有状态的方式进行训练 - 有关 Keras 的示例，请参阅Philippe Remy 的博客文章“Keras 中的有状态 LSTM”。但是，Philippe 没有展示批量大小大于 1 的示例。我猜在您的情况下，具有状态 LSTM 的批量大小为 4 可以与以下数据一起使用（写为input -> label）：

batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8

batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12

batch #2:
9,10,11,12 -> 13
...

这样，例如批次#0 中的第二个样本的状态被正确地重新使用以继续使用批次#1 的第二个样本进行训练。

这在某种程度上类似于您的选项 4，但是您没有使用那里的所有可用标签。

更新：

作为我的建议 where batch_sizeequals the 的扩展num_steps，Alexis Huet给出了batch_size作为除数的情况的答案num_steps，它可以用于更大的num_steps. 他在他的博客上很好地描述了它。

score 1 · Accepted Answer

我相信选项 1 最接近 /tensorflow/models/rnn/ptb/reader.py 中的参考实现

def ptb_iterator(raw_data, batch_size, num_steps):
  """Iterate on the raw PTB data.

  This generates batch_size pointers into the raw PTB data, and allows
  minibatch iteration along these pointers.

  Args:
    raw_data: one of the raw data outputs from ptb_raw_data.
    batch_size: int, the batch size.
    num_steps: int, the number of unrolls.

  Yields:
    Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
    The second element of the tuple is the same data time-shifted to the
    right by one.

  Raises:
    ValueError: if batch_size or num_steps are too high.
  """
  raw_data = np.array(raw_data, dtype=np.int32)

  data_len = len(raw_data)
  batch_len = data_len // batch_size
  data = np.zeros([batch_size, batch_len], dtype=np.int32)
  for i in range(batch_size):
    data[i] = raw_data[batch_len * i:batch_len * (i + 1)]

  epoch_size = (batch_len - 1) // num_steps

  if epoch_size == 0:
    raise ValueError("epoch_size == 0, decrease batch_size or num_steps")

  for i in range(epoch_size):
    x = data[:, i*num_steps:(i+1)*num_steps]
    y = data[:, i*num_steps+1:(i+1)*num_steps+1]
    yield (x, y)

但是，另一个选项是为每个训练序列随机选择一个指向数据数组的指针。

time-series - 如何使用 LSTM 单元训练 RNN 以进行时间序列预测

3 回答 3

Related

Reference