machine-learning - LSTM followed by mean pooling

Question

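I am using Keras 1.0. My problem is the same as this one (How to implement a mean pooling layer in Keras), but the answer there does not seem sufficient for me.

I want to implement this network:

The following code does not work: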

sequence = Input(shape=(max_sent_len,), dtype='int32')
embedded = Embedding(vocab_size, word_embedding_size)(sequence)
lstm = LSTM(hidden_state_size, activation='sigmoid', inner_activation='hard_sigmoid', return_sequences=True)(embedded)
pool = AveragePooling1D()(lstm)
output = Dense(1, activation='sigmoid')(pool)

If I don't set return_sequences=True, I get this error when calling AveragePooling1D():

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/PATH/keras/engine/topology.py", line 462, in __call__
    self.assert_input_compatibility(x)
  File "/PATH/keras/engine/topology.py", line 382, in assert_input_compatibility
    str(K.ndim(x)))
Exception: ('Input 0 is incompatible with layer averagepooling1d_6: expected ndim=3', ' found ndim=2')

Otherwise, I get this error when calling Dense():

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/PATH/keras/engine/topology.py", line 456, in __call__
    self.build(input_shapes[0])
  File "/fs/clip-arqat/mossaab/trec/liveqa/cmu/venv/lib/python2.7/site-packages/keras/layers/core.py", line 512, in build
    assert len(input_shape) == 2
AssertionError

score 9 · Accepted Answer

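I just tried to implement the same model as the original poster, and I am using Keras 2.0.3. I got mean pooling after the LSTM working by using GlobalAveragePooling1D; just make sure return_sequences=True is set in the LSTM layer. Give it a try!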
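For reference, here is a minimal sketch of that model. It assumes the tf.keras API rather than standalone Keras 2.0.3, and the sizes are hypothetical values picked only so the snippet runs; they are not from the question.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes, chosen only for illustration.
max_sent_len = 20
vocab_size = 1000
word_embedding_size = 16
hidden_state_size = 32

sequence = tf.keras.Input(shape=(max_sent_len,), dtype="int32")
embedded = tf.keras.layers.Embedding(vocab_size, word_embedding_size)(sequence)
# return_sequences=True keeps the time axis: (batch, max_sent_len, hidden_state_size)
lstm = tf.keras.layers.LSTM(hidden_state_size, return_sequences=True)(embedded)
# Average over the time axis: (batch, hidden_state_size)
pool = tf.keras.layers.GlobalAveragePooling1D()(lstm)
output = tf.keras.layers.Dense(1, activation="sigmoid")(pool)
model = tf.keras.Model(inputs=sequence, outputs=output)

# Random token ids stand in for real sentences.
batch = np.random.randint(0, vocab_size, size=(4, max_sent_len))
predictions = model.predict(batch, verbose=0)
print(predictions.shape)  # (4, 1)
```

Unlike AveragePooling1D, the global variant drops the time axis entirely, so Dense receives a 2D tensor and produces one score per sentence.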

score 4 · Accepted Answer

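I think the accepted answer is basically wrong. A solution was found at https://github.com/fchollet/keras/issues/2151, but it only works with the theano backend. I have modified the code so that it supports both theano and tensorflow.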

from keras.engine.topology import Layer, InputSpec
from keras import backend as T

class TemporalMeanPooling(Layer):
    """
    This is a custom Keras layer. This pooling layer accepts the temporal
    sequence output by a recurrent layer and performs temporal pooling,
    looking at only the non-masked portion of the sequence. The pooling
    layer converts the entire variable-length hidden vector sequence
    into a single hidden vector, and then feeds its output to the Dense
    layer.

    input shape: (nb_samples, nb_timesteps, nb_features)
    output shape: (nb_samples, nb_features)
    """

    def __init__(self, **kwargs):
        super(TemporalMeanPooling, self).__init__(**kwargs)
        self.supports_masking = True
        self.input_spec = [InputSpec(ndim=3)]

    def get_output_shape_for(self, input_shape):
        # Keras 1.x hook; Keras 2 renamed it to compute_output_shape.
        return (input_shape[0], input_shape[2])

    def call(self, x, mask=None):  # mask: (nb_samples, nb_timesteps)
        if mask is None:
            # No mask supplied: count every timestep.
            mask = T.mean(T.ones_like(x), axis=-1)
        mask = T.cast(mask, T.floatx())
        # Zero out masked timesteps so they do not contribute to the sum.
        x = x * T.expand_dims(mask, -1)
        ssum = T.sum(x, axis=-2)  # (nb_samples, nb_features)
        rcnt = T.sum(mask, axis=-1, keepdims=True)  # (nb_samples, 1)
        return ssum / rcnt

    def compute_mask(self, input, mask):
        # The time axis is pooled away, so no mask is passed downstream.
        return None
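
To check what a masked temporal mean should return, the same arithmetic can be sketched in plain NumPy. This is only an illustration, not part of the layer: the function name `masked_temporal_mean` and the sample batch are made up for the example.

```python
import numpy as np

def masked_temporal_mean(x, mask=None):
    """Mean over the time axis, counting only timesteps where mask is 1.

    x:    (nb_samples, nb_timesteps, nb_features)
    mask: (nb_samples, nb_timesteps), or None to use every timestep
    """
    if mask is None:
        mask = np.ones(x.shape[:2], dtype=x.dtype)
    mask = mask.astype(x.dtype)
    summed = (x * mask[:, :, None]).sum(axis=1)  # (nb_samples, nb_features)
    counts = mask.sum(axis=1, keepdims=True)     # (nb_samples, 1)
    return summed / counts

# Hypothetical batch: two sequences padded to 4 timesteps, 2 features each.
x = np.array([[[1., 2.], [3., 4.], [0., 0.], [0., 0.]],
              [[2., 2.], [4., 4.], [6., 6.], [0., 0.]]])
mask = np.array([[1, 1, 0, 0],
                 [1, 1, 1, 0]])

masked = masked_temporal_mean(x, mask)
unmasked = masked_temporal_mean(x)
print(masked)    # rows: [2. 3.] and [4. 4.]
print(unmasked)  # rows: [1. 1.5] and [3. 3.]
```

The two results differ because the unmasked mean divides by all 4 timesteps, padding included. A sequence whose mask is all zeros would divide by zero, in this sketch as in the layer.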

score 4 · Accepted Answer

Adding TimeDistributed(Dense(1)) helps:

sequence = Input(shape=(max_sent_len,), dtype='int32')
embedded = Embedding(vocab_size, word_embedding_size)(sequence)
lstm = LSTM(hidden_state_size, activation='sigmoid', inner_activation='hard_sigmoid', return_sequences=True)(embedded)
distributed = TimeDistributed(Dense(1))(lstm)
pool = AveragePooling1D()(distributed)
output = Dense(1, activation='sigmoid')(pool)

score 1 · Accepted Answer

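Thanks, I ran into this problem too, but I don't think the TimeDistributed layer works the way you want. You could try Luke Guye's TemporalMeanPooling layer instead; it worked for me. Here is an example: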

sequence = Input(shape=(max_sent_len,), dtype='int32')
embedded = Embedding(vocab_size, word_embedding_size)(sequence)
lstm = LSTM(hidden_state_size, return_sequences=True)(embedded)
pool = TemporalMeanPooling()(lstm)
output = Dense(1, activation='sigmoid')(pool)

score 0 · Accepted Answer

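Late to the party, but tf.keras.layers.AveragePooling1D with a suitable pool_size argument also seems to return the correct result.

Working through the example that bobchennan shared in that issue: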

import numpy as np

# create sample data
A=np.array([[1,2,3],[4,5,6],[0,0,0],[0,0,0],[0,0,0]])
B=np.array([[1,3,0],[4,0,0],[0,0,1],[0,0,0],[0,0,0]])
C=np.array([A,B]).astype("float32")
# expected answer (for temporal mean)
np.mean(C, axis=1)

The output is

array([[1. , 1.4, 1.8],
       [1. , 0.6, 0.2]], dtype=float32)

Now using AveragePooling1D,

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.AveragePooling1D(pool_size=5)
])
model.predict(C)

The output is

array([[[1. , 1.4, 1.8]],
       [[1. , 0.6, 0.2]]], dtype=float32)

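A few points to consider:

pool_size should equal the number of timesteps of the recurrent layer.
The output shape is (batch_size, downsampled_steps, features), which contains an extra downsampled_steps dimension. That dimension is always 1 if you set pool_size equal to the number of timesteps of the recurrent layer.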
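
The extra dimension can be illustrated with a NumPy stand-in for the pooling. This is a sketch that assumes non-overlapping windows (stride equal to pool_size, no padding); the helper `average_pool_1d` is hypothetical and is not the Keras implementation.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [0, 0, 0], [0, 0, 0], [0, 0, 0]])
B = np.array([[1, 3, 0], [4, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]])
C = np.array([A, B]).astype("float32")  # (batch=2, steps=5, features=3)

def average_pool_1d(x, pool_size):
    """Mean over non-overlapping windows of the time axis."""
    batch, steps, features = x.shape
    windows = steps // pool_size  # leftover steps are dropped
    trimmed = x[:, :windows * pool_size, :]
    return trimmed.reshape(batch, windows, pool_size, features).mean(axis=2)

pooled = average_pool_1d(C, pool_size=5)
print(pooled.shape)    # (2, 1, 3): one window, so downsampled_steps is 1

# Drop the length-1 steps axis to get one vector per sample.
squeezed = pooled[:, 0, :]
print(squeezed.shape)  # (2, 3)

# A smaller pool_size leaves more than one step behind.
print(average_pool_1d(C, pool_size=2).shape)  # (2, 2, 3)
```

Once the length-1 axis is dropped, the result equals np.mean(C, axis=1). Note that this mean includes the zero-padded timesteps.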
