I am using two lmdb inputs to identify the eye, nose-tip, and mouth regions of a face. The data lmdb has dimensions N x 3 x H x W, while the label lmdb has dimensions N x 1 x H/4 x W/4. The label images are created by masking the regions with the digits 1-4 on an OpenCV Mat initialized to all zeros (so there are 5 labels in total, with 0 being the background label). I downscale each label image to 1/4 of the corresponding image's width and height because my network has 2 pooling layers. This downscaling ensures that the label image dimensions match the output of the last convolutional layer.
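For context, the label lmdb is built roughly along these lines (a simplified sketch, not my exact script: the region rectangles, image size, and dataset size below are placeholders, and nearest-neighbor interpolation is assumed for the downscaling so that the label values 0-4 are not blended):

import lmdb
import numpy as np
import cv2
import caffe

# Placeholder rectangles (x, y, w, h) for left eye, right eye, nose tip, mouth;
# in the real script these come from the annotated keypoints of each image.
regions = {1: (30, 40, 40, 25),   # left eye
           2: (90, 40, 40, 25),   # right eye
           3: (65, 75, 30, 25),   # nose tip
           4: (50, 105, 60, 30)}  # mouth

def make_label_map(h, w):
    # Mask initialized to all 0 (background), regions filled with class ids 1-4.
    label = np.zeros((h, w), dtype=np.uint8)
    for cls, (x, y, rw, rh) in regions.items():
        label[y:y + rh, x:x + rw] = cls
    # Downscale to 1/4 of the image size to match the net output after the
    # two stride-2 pooling layers; nearest-neighbor keeps the labels discrete.
    return cv2.resize(label, (w // 4, h // 4), interpolation=cv2.INTER_NEAREST)

H, W = 128, 128          # placeholder image size
num_images = 1000        # placeholder dataset size
env = lmdb.open('../train_label_lmdb', map_size=1 << 30)
with env.begin(write=True) as txn:
    for i in range(num_images):
        label_map = make_label_map(H, W)                             # (H/4, W/4), uint8
        datum = caffe.io.array_to_datum(label_map[np.newaxis, ...])  # 1 x H/4 x W/4
        txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())

The data lmdb is built the same way from the corresponding N x 3 x H x W face images.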
My train_val.prototxt:
name: "facial_keypoints"
layer {
name: "images"
type: "Data"
top: "images"
include {
phase: TRAIN
}
transform_param {
mean_file: "../mean.binaryproto"
}
data_param {
source: "../train_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "labels"
type: "Data"
top: "labels"
include {
phase: TRAIN
}
data_param {
source: "../train_label_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "images"
type: "Data"
top: "images"
include {
phase: TEST
}
transform_param {
mean_file: "../mean.binaryproto"
}
data_param {
source: "../test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "labels"
type: "Data"
top: "labels"
include {
phase: TEST
}
data_param {
source: "../test_label_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "images"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 32
pad: 2
kernel_size: 5
stride: 1
weight_filler {
type: "gaussian"
std: 0.0001
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "pool1"
top: "pool1"
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 64
pad: 2
kernel_size: 5
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: AVE
kernel_size: 3
stride: 2
}
}
layer {
name: "conv_last"
type: "Convolution"
bottom: "pool2"
top: "conv_last"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 5
pad: 2
kernel_size: 5
stride: 1
weight_filler {
#type: "xavier"
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv_last"
top: "conv_last"
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "conv_last"
bottom: "labels"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "conv_last"
bottom: "labels"
top: "loss"
}
In the last convolutional layer I set the number of outputs to 5 because I have 5 label classes. Training converges with a final loss of about 0.3 and an accuracy of 0.9 (although some sources suggest that this accuracy is not measured correctly for multiple labels). When using the trained model, the output layer correctly produces a blob of dimensions 1 x 5 x H/4 x W/4, which I managed to visualize as 5 separate single-channel images. However, while the first image correctly highlights the background pixels, the remaining 4 images look almost identical, with all 4 regions highlighted.
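For reference, the inference and visualization step is roughly the following pycaffe sketch (the deploy prototxt, the snapshot file name, and the test image path are placeholders; the blob names "images" and "conv_last" are taken from the net above, and the BGR/0-255 channel convention is assumed to match how the training lmdb was built):

import numpy as np
import cv2
import caffe
from caffe.proto import caffe_pb2
import matplotlib.pyplot as plt

caffe.set_mode_cpu()
# deploy.prototxt is assumed to be train_val.prototxt with the Data layers
# replaced by an input blob named "images" and the loss/accuracy layers removed.
net = caffe.Net('deploy.prototxt', 'snapshot.caffemodel', caffe.TEST)

# Mean image computed over the training lmdb.
blob = caffe_pb2.BlobProto()
with open('../mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean = caffe.io.blobproto_to_array(blob)[0]        # shape (3, H, W)

img = cv2.imread('face.jpg').astype(np.float32)    # BGR, 0-255, like the lmdb data
data = img.transpose(2, 0, 1) - mean               # to 3 x H x W, mean-subtracted

net.blobs['images'].reshape(1, *data.shape)
net.blobs['images'].data[...] = data
net.forward()

scores = net.blobs['conv_last'].data[0]            # shape (5, H/4, W/4)

# One heat map per class channel (0 = background, 1-4 = the facial regions).
for c in range(scores.shape[0]):
    plt.subplot(1, 5, c + 1)
    plt.imshow(scores[c], cmap='jet')
    plt.axis('off')
plt.show()

# Per-pixel argmax over the 5 channels gives the predicted class map that the
# SoftmaxWithLoss / Accuracy layers compare against the 1-channel label image.
pred = scores.argmax(axis=0)                       # values in 0..4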
Visualization of the 5 output channels (intensity increases from blue to red):
Original image (the concentric circles mark each channel's highest intensity; some are drawn larger only to distinguish them from the others. As you can see, apart from the background, the remaining channels have their strongest activation in almost the same mouth region, which should not be the case.)
Can someone help me figure out the mistake I have made?
Thank you.