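I have recently been implementing a model based on OpenPose. OpenPose uses VGG as its backbone to extract feature maps, but VGG contains max-pooling layers that shrink the output. The structure below has three MaxPool2d layers with stride 2, so the feature map ends up at 1/8 of the input width and height. Here is the OpenPose model structure: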
VGGOpenPose(
(model0): OpenPose_Feature(
(model): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace=True)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): ReLU(inplace=True)
(18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace=True)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace=True)
(23): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(24): ReLU(inplace=True)
(25): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(26): ReLU(inplace=True)
)
)
(model1_1): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(7): ReLU(inplace=True)
(8): Conv2d(512, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model2_1): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model3_1): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model4_1): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model5_1): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model6_1): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model1_2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(7): ReLU(inplace=True)
(8): Conv2d(512, 19, kernel_size=(1, 1), stride=(1, 1))
)
(model2_2): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 19, kernel_size=(1, 1), stride=(1, 1))
)
(model3_2): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 19, kernel_size=(1, 1), stride=(1, 1))
)
(model4_2): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 19, kernel_size=(1, 1), stride=(1, 1))
)
(model5_2): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 19, kernel_size=(1, 1), stride=(1, 1))
)
(model6_2): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 19, kernel_size=(1, 1), stride=(1, 1))
)
)
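To make the size reduction concrete, here is a minimal sketch that applies the MaxPool2d output-size formula (with ceil_mode=False) three times, once per pooling layer in the structure above. The 368-pixel input side is only an assumed example, and the helper names are hypothetical. The last line checks that 185, the input channel count of the later stages, is consistent with concatenating the 128 backbone channels with the 38 and 19 channels of the two branches; the forward pass is not shown above, so that reading is an inference.

```python
def pooled_size(size, kernel=2, stride=2, padding=0, dilation=1):
    """Output length of MaxPool2d along one axis with ceil_mode=False."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def backbone_output_size(size, num_pools=3):
    """Spatial size after the backbone (hypothetical helper).

    The 3x3 convolutions use stride 1 and padding 1, so they keep the size;
    only the max-pooling layers shrink it.
    """
    for _ in range(num_pools):
        size = pooled_size(size)
    return size


# 368 is an assumed example input side length, not taken from the printed model.
height, width = 368, 368
out_h = backbone_output_size(height)
out_w = backbone_output_size(width)
print(out_h, out_w)

# Channel count seen by model2_1 ... model6_2, assuming the backbone features
# are concatenated with both branch outputs of the previous stage.
stage_in_channels = 128 + 38 + 19
print(stage_in_channels)
```

With these numbers, every stage would emit maps at the reduced resolution rather than at the input resolution, since no layer in the printed structure upsamples.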
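The original paper says that the ground-truth heatmaps and PAFs have the same width and height as the input image: OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields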
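I have looked through several Python implementations of OpenPose. Most of them use an element-wise loss function to compute the loss between the output and the ground-truth label, much like the loss function given in the paper.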
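An element-wise loss only works when the prediction and the label have the same shape. The sketch below illustrates one possible way to get there, assuming the label is brought down to the output resolution before the loss is taken. Both helpers and the stride-8 subsampling are hypothetical illustrations, not a confirmed description of what the paper or any particular implementation does.

```python
def subsample(label, stride=8):
    """Keep every `stride`-th pixel of a 2D map.

    A deliberately simple, hypothetical way to shrink a full-resolution label.
    """
    return [row[::stride] for row in label[::stride]]


def elementwise_l2_loss(pred, target):
    """Sum of squared differences between two 2D maps of identical shape."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("prediction and label must have the same shape")
    return sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )


# Toy data: a 16x16 full-resolution label and a 2x2 network output (stride 8).
full_res_label = [[0.0] * 16 for _ in range(16)]
full_res_label[8][8] = 1.0  # a single peak placed where subsampling keeps it
prediction = [[0.0, 0.0], [0.0, 0.5]]

small_label = subsample(full_res_label)  # now 2x2, same shape as the output
loss = elementwise_l2_loss(prediction, small_label)
print(loss)
```

Passing the 16x16 label directly to `elementwise_l2_loss` raises a `ValueError`, which is the shape mismatch this question is about.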
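I would like to know whether the output of OpenPose differs in size from the input image and, if so, how OpenPose computes the loss between the output and the ground-truth heatmaps/PAFs.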