python - What is the idea behind using nn.Identity for residual learning?

Question

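So, I've read about half of the original ResNet paper, and I'm trying to figure out how to make my own version for tabular data.

I've read a few blog posts on how it works in PyTorch, and I see heavy use of nn.Identity(). The paper also uses the term identity mapping frequently. However, there it just refers to adding the input of a stack of layers to the output of that same stack, element-wise. If the input and output dimensions differ, the paper talks about either padding the input with zeros or projecting the input to the new dimension with a matrix W_s.

Here is an abstraction of a residual block I found in a blog post: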
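For reference, the two shortcut variants described above can be written as follows. This is a restatement of the formulas from the ResNet paper, where F denotes the stacked layers with weights W_i, x the block input, and W_s the projection used only when the dimensions differ:

```latex
% Identity shortcut: input and output have the same dimensions
y = \mathcal{F}(x, \{W_i\}) + x

% Projection shortcut: W_s maps x to the output dimensions
y = \mathcal{F}(x, \{W_i\}) + W_s x
```

In both cases the "identity mapping" is just the `+ x` term, not a separate layer.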


import torch
from torch import nn


class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, activation='relu'):
        super().__init__()
        self.in_channels, self.out_channels, self.activation = in_channels, out_channels, activation
        # Both attributes are no-op placeholders until replaced with real layers.
        self.blocks = nn.Identity()
        self.shortcut = nn.Identity()

    def forward(self, x):
        residual = x
        if self.should_apply_shortcut: residual = self.shortcut(x)
        x = self.blocks(x)
        x += residual
        return x

    @property
    def should_apply_shortcut(self):
        # The shortcut is only applied when the channel counts differ.
        return self.in_channels != self.out_channels

block1 = ResidualBlock(4, 4)

And my own application of it to a dummy tensor:

import torch

x = torch.tensor([1, 1, 2, 2])
block1 = ResidualBlock(4, 4)
block2 = ResidualBlock(4, 6)
x = block1(x)
print(x)
x = block2(x)
print(x)

>>> tensor([2, 2, 4, 4])
>>> tensor([4, 4, 8, 8])

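So in the end it comes down to x = nn.Identity(x), and I'm not sure what it is for, other than mimicking the math terminology of the original paper. I'm sure that's not the case, though, and that it has some hidden use I'm just not seeing yet. What could it be?

EDIT: Here is another example of implementing residual learning, this time in Keras. It does just what I suggested above and only keeps a copy of the input to add to the output: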

def residual_block(x: Tensor, downsample: bool, filters: int, kernel_size: int = 3) -> Tensor:
    y = Conv2D(kernel_size=kernel_size,
               strides=(1 if not downsample else 2),
               filters=filters,
               padding="same")(x)
    y = relu_bn(y)
    y = Conv2D(kernel_size=kernel_size,
               strides=1,
               filters=filters,
               padding="same")(y)

    if downsample:
        x = Conv2D(kernel_size=1,
                   strides=2,
                   filters=filters,
                   padding="same")(x)
    out = Add()([x, y])
    out = relu_bn(out)
    return out

score 6 · Accepted Answer

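What is the idea behind using nn.Identity for residual learning?

There is none (almost, see the end of the post). All nn.Identity does is forward the input given to it (basically a no-op).

As shown in the PyTorch repo issue you linked in the comments, this idea was first rejected and later merged into PyTorch because of other uses (see the rationale in this PR). That rationale is not connected to the ResNet block itself; see the end of the answer.

ResNet implementation

The simplest generic version with projection I can think of would be something along these lines: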
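A quick way to check the no-op claim is to call the module directly. The snippet below is a minimal sketch; the variable names are only illustrative:

```python
import torch
from torch import nn

identity = nn.Identity()
x = torch.tensor([1.0, 2.0, 3.0])

# The forward pass hands back the very same tensor object it received.
y = identity(x)
print(y is x)

# The module has no learnable parameters.
num_params = sum(p.numel() for p in identity.parameters())
print(num_params)

# Constructor arguments are accepted and ignored, which is what makes it
# usable as a drop-in replacement for other layers.
ignored = nn.Identity(54, unused_argument=0.1)
```
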

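from typing import Optional

import torch


class Residual(torch.nn.Module):
    def __init__(self, module: torch.nn.Module, projection: Optional[torch.nn.Module] = None):
        super().__init__()
        self.module = module
        self.projection = projection

    def forward(self, inputs):
        output = self.module(inputs)
        # Transform the shortcut only if a projection module was supplied
        # (needed when input and output shapes differ).
        if self.projection is not None:
            inputs = self.projection(inputs)
        return output + inputs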

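As module you can pass things like two stacked convolutions, and add a 1x1 convolution (with padding, strides, or whatever is needed) as the projection module.

For tabular data you could use this as module (assuming your input has 50 features):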
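As a rough sketch of the convolutional case, the snippet below wraps two stacked convolutions and uses a strided 1x1 convolution as the projection. The channel counts, stride, and input size are arbitrary assumptions chosen for illustration, and the Residual wrapper is repeated so the snippet runs on its own:

```python
import torch
from torch import nn


class Residual(nn.Module):
    def __init__(self, module, projection=None):
        super().__init__()
        self.module = module
        self.projection = projection

    def forward(self, inputs):
        output = self.module(inputs)
        if self.projection is not None:
            inputs = self.projection(inputs)
        return output + inputs


# Main path: 16 -> 32 channels, spatial size halved by the first convolution.
body = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
)
# Shortcut path: a 1x1 convolution with the same stride so the shapes match.
projection = nn.Conv2d(16, 32, kernel_size=1, stride=2)

block = Residual(body, projection)
x = torch.randn(8, 16, 28, 28)
out = block(x)
print(out.shape)
```

Without the projection, the addition would fail here because the main path changes both the channel count and the spatial size.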

torch.nn.Sequential(
    torch.nn.Linear(50, 50),
    torch.nn.ReLU(),
    torch.nn.Linear(50, 50),
    torch.nn.ReLU(),
    torch.nn.Linear(50, 50),
)

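Basically, all you have to do is add the input of some module to its output, and that is it.

Rationale behind `nn.Identity`

It can make neural networks easier to construct (and to read afterwards). An example for batch norm (taken from the aforementioned PR):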
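Putting the pieces together for tabular data might look like the sketch below. The layer widths (64, 32), the single-output head, and the batch size are assumptions made for the example; a linear layer plays the role of the projection when the feature count changes:

```python
import torch
from torch import nn


class Residual(nn.Module):
    def __init__(self, module, projection=None):
        super().__init__()
        self.module = module
        self.projection = projection

    def forward(self, inputs):
        output = self.module(inputs)
        if self.projection is not None:
            inputs = self.projection(inputs)
        return output + inputs


model = nn.Sequential(
    # Same width in and out: plain addition, no projection needed.
    Residual(nn.Sequential(nn.Linear(50, 50), nn.ReLU(), nn.Linear(50, 50))),
    nn.ReLU(),
    # Width changes from 50 to 32: project the shortcut with a linear layer.
    Residual(
        nn.Sequential(nn.Linear(50, 64), nn.ReLU(), nn.Linear(64, 32)),
        projection=nn.Linear(50, 32),
    ),
    nn.ReLU(),
    nn.Linear(32, 1),
)

batch = torch.randn(4, 50)  # 4 rows, 50 features
out = model(batch)
print(out.shape)
```

Note that nn.Identity does not appear anywhere: the skip connection is just the addition inside forward.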

batch_norm = nn.BatchNorm2d
if dont_use_batch_norm:
    batch_norm = nn.Identity  # accepts and ignores any constructor arguments

Now you can easily use it with nn.Sequential:

nn.Sequential(
    ...
    batch_norm(N, momentum=0.05),
    ...
)

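And when printing the network, it always has the same number of submodules (with either BatchNorm or Identity), which also makes the whole thing a little smoother IMO.

Another use case, mentioned here, might be removing parts of an existing neural network: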
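A runnable version of this pattern could look like the following sketch. The helper name make_block, the use_batch_norm flag, and the layer sizes are hypothetical and only serve the illustration:

```python
import torch
from torch import nn


def make_block(channels: int, use_batch_norm: bool) -> nn.Sequential:
    # Pick the layer class once; nn.Identity ignores the arguments below.
    norm = nn.BatchNorm2d if use_batch_norm else nn.Identity
    return nn.Sequential(
        nn.Conv2d(3, channels, kernel_size=3, padding=1),
        norm(channels, momentum=0.05),
        nn.ReLU(),
    )


with_bn = make_block(8, use_batch_norm=True)
without_bn = make_block(8, use_batch_norm=False)

# Both variants have the same number of submodules at the same indices.
print(len(with_bn), len(without_bn))
print(without_bn)

x = torch.randn(2, 3, 16, 16)
out = without_bn(x)
```

Because the layout is the same in both variants, code that indexes into the Sequential does not have to care whether batch norm is enabled.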

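import torchvision as tv
from torch import nn

net = tv.models.alexnet(pretrained=True)
# Assume net has two parts
# features and classifier
net.classifier = nn.Identity()

Now, instead of running net.features(input), you can run net(input), which might also be easier for others to read.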
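The same idea can be tried without downloading any weights. The sketch below uses a small hypothetical network, TinyNet, as a stand-in for a pretrained model with a features part and a classifier part; its sizes are arbitrary:

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a model with `features` and `classifier`."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(10, 8), nn.ReLU())
        self.classifier = nn.Linear(8, 3)

    def forward(self, x):
        return self.classifier(self.features(x))


net = TinyNet()
net.classifier = nn.Identity()  # drop the classification head

x = torch.randn(5, 10)
with torch.no_grad():
    embeddings = net(x)
    # Calling the whole network now yields the feature extractor's output.
    same = torch.equal(embeddings, net.features(x))

print(embeddings.shape, same)
```

For a real model, forward may contain extra steps between the two parts (pooling or flattening, for instance), so net(input) and net.features(input) are not guaranteed to match exactly there.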
