deep-learning - Simple LSTM in PyTorch with the Sequential module

Question

In PyTorch, we can define architectures in several ways. Here, I would like to create a simple LSTM network using the Sequential module.

In Lua Torch I would usually write:

model = nn.Sequential()
model:add(nn.SplitTable(1,2))
model:add(nn.Sequencer(nn.LSTM(inputSize, hiddenSize)))
model:add(nn.SelectTable(-1)) -- last step of output sequence
model:add(nn.Linear(hiddenSize, classes_n))

However, in PyTorch I cannot find an equivalent of SelectTable to get the last output.

nn.Sequential(
  nn.LSTM(inputSize, hiddenSize, 1, batch_first=True),
  # what to put here to retrieve last output of LSTM ?,
  nn.Linear(hiddenSize, classes_n))

score 3 · Accepted Answer

First, I define a small class that extracts the last cell output, as follows:

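class extractlastcell(nn.Module):
    def forward(self, x):
        # nn.LSTM returns a tuple (output, (h_n, c_n)); keep only output
        out, _ = x
        # with batch_first=True, out is (batch, seq_len, hidden_size)
        return out[:, -1, :]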

Used in your example, it would look like this:

nn.Sequential(
    nn.LSTM(inputSize, hiddenSize, 1, batch_first=True),
    extractlastcell(),
    nn.Linear(hiddenSize, classes_n))
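
To check that the pieces fit together, here is a minimal self-contained sketch of the same idea. The sizes (8 input features, 16 hidden units, 4 classes) and the class name ExtractLastStep are made up for illustration, and it assumes batch_first=True as in the question.

```python
import torch
import torch.nn as nn


class ExtractLastStep(nn.Module):
    """Takes the (output, (h_n, c_n)) tuple from nn.LSTM and keeps the last time step."""

    def forward(self, x):
        out, _ = x              # out: (batch, seq_len, hidden_size) with batch_first=True
        return out[:, -1, :]    # (batch, hidden_size)


# Illustrative sizes, not taken from the question
input_size, hidden_size, n_classes = 8, 16, 4

model = nn.Sequential(
    nn.LSTM(input_size, hidden_size, 1, batch_first=True),
    ExtractLastStep(),
    nn.Linear(hidden_size, n_classes),
)

batch = torch.randn(5, 7, input_size)   # (batch=5, seq_len=7, features=8)
scores = model(batch)
print(scores.shape)                     # one score vector per sequence in the batch
```

Note that this always takes the final time step. If the batch contains padded sequences of different lengths, the final step may be padding for the shorter ones, so this sketch only fits equal-length inputs.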

score 2 · Accepted Answer

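According to the LSTM documentation, output has shape (seq_len, batch, hidden_size * num_directions) when batch_first is left at its default of False, so you can easily take the last element of the sequence this way: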

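import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)        # input_size=10, hidden_size=20, num_layers=2
input = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)      # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))
print(output[-1])  # last element of the sequence, shape (batch, hidden_size)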
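
For a unidirectional LSTM, the last time step of output should match the final hidden state of the top layer, hn[-1]. With batch_first=True, as in the question, the time axis moves to position 1, so the index becomes output[:, -1, :]. The sketch below checks both, reusing the sizes from the snippet above and leaving the initial states at their default of zeros.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.LSTM(10, 20, 2)                 # default layout: (seq_len, batch, features)
x = torch.randn(5, 3, 10)
output, (hn, cn) = rnn(x)                # initial states default to zeros

last = output[-1]                        # last time step: (batch, hidden_size)
same = torch.allclose(last, hn[-1])      # compare with top layer's final hidden state

rnn_bf = nn.LSTM(10, 20, 2, batch_first=True)
out_bf, _ = rnn_bf(x.transpose(0, 1))    # input is now (batch, seq_len, features)
last_bf = out_bf[:, -1, :]               # time axis is dim 1 here

print(last.shape, same, last_bf.shape)
```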

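Tensor manipulation and network design are much easier in PyTorch than in Torch, so you rarely need containers. In fact, as the PyTorch tutorial for former Torch users points out, PyTorch is built around Autograd, so you no longer need to worry about containers. However, if you want to reuse old Lua Torch code, you can have a look at the Legacy package.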

score 0 · Accepted Answer

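As far as I know, PyTorch has nothing like a SplitTable or a SelectTable. That said, you can chain any number of modules or blocks in a single architecture, and you can use this property to retrieve the output of a given layer. Let me make this clearer with a simple example.

Suppose I want to build a simple two-layer MLP and retrieve the output of each layer. I can build a custom class that inherits from nn.Module: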

import torch.nn as nn


class MyMLP(nn.Module):

    def __init__(self, in_channels, out_channels_1, out_channels_2):
        # first of all, calling base class constructor
        super().__init__()
        # now I can build my modular network
        self.block1 = nn.Linear(in_channels, out_channels_1)
        self.block2 = nn.Linear(out_channels_1, out_channels_2)

    # you MUST implement a forward(input) method whenever inheriting from nn.Module
    def forward(self, x):
        # first_out will now be your output of the first block
        first_out = self.block1(x)
        x = self.block2(first_out)
        # by returning both x and first_out, you can now access the first layer's output
        return x, first_out

In your main file, you can now declare the custom architecture and use it:

import torch
from myFile import MyMLP

in_ch = out_ch_1 = out_ch_2 = 64
# some fake input instance; nn.Linear expects a float torch tensor, not a NumPy array
x = torch.rand(in_ch)

my_mlp = MyMLP(in_ch, out_ch_1, out_ch_2)
# get your outputs
final_out, first_layer_out = my_mlp(x)

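Moreover, you can chain two MyMLPs in a more complex model definition and retrieve the output of each one in a similar way. I hope this clarifies things, but feel free to ask if you have more questions, since I may have missed something.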
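
As a rough sketch of chaining two MyMLPs, the wrapper below feeds the first block's final output into the second and returns both. The class name TwoMLPs and the choice of 64 channels throughout are assumptions for illustration; MyMLP is repeated so the snippet runs on its own.

```python
import torch
import torch.nn as nn


class MyMLP(nn.Module):
    def __init__(self, in_channels, out_channels_1, out_channels_2):
        super().__init__()
        self.block1 = nn.Linear(in_channels, out_channels_1)
        self.block2 = nn.Linear(out_channels_1, out_channels_2)

    def forward(self, x):
        first_out = self.block1(x)
        return self.block2(first_out), first_out


class TwoMLPs(nn.Module):
    """Hypothetical wrapper that chains two MyMLP blocks."""

    def __init__(self, channels):
        super().__init__()
        self.mlp_a = MyMLP(channels, channels, channels)
        self.mlp_b = MyMLP(channels, channels, channels)

    def forward(self, x):
        out_a, _ = self.mlp_a(x)       # keep only the final output of the first MLP
        out_b, _ = self.mlp_b(out_a)   # feed it into the second MLP
        return out_b, out_a            # expose both blocks' outputs


model = TwoMLPs(64)
final_out, first_mlp_out = model(torch.rand(64))
print(final_out.shape, first_mlp_out.shape)
```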
