matlab - Training a network with MATLAB and MatConvNet

Question

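I want to train my network using MATLAB and matconvnet-1.0-beta25. My problem is a regression, and I use the pdist loss function to obtain the MSE. The input data is 56*56*64*6000 and the target data is 56*56*64*6000. The network architecture is as follows: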

opts.networkType = 'simplenn' ;
opts = vl_argparse(opts, varargin) ;

lr = [.01 2] ;

% Define network CIFAR10-quick
net.layers = {} ;

% Block 1
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.01*randn(5,5,64,32, 'single'), zeros(1, 32, 'single')}}, ...
                           'learningRate', lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.05*randn(5,5,32,16, 'single'), zeros(1,16,'single')}}, ...
                           'learningRate', .1*lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.01*randn(5,5,16,8, 'single'), zeros(1, 8, 'single')}}, ...
                           'learningRate', lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.05*randn(5,5,8,16, 'single'), zeros(1,16,'single')}}, ...
                           'learningRate', .1*lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.01*randn(5,5,16,32, 'single'), zeros(1, 32, 'single')}}, ...
                           'learningRate', lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.05*randn(5,5,32,64, 'single'), zeros(1,64,'single')}}, ...
                           'learningRate', .1*lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
% Loss layer
net.layers{end+1} = struct('type', 'pdist') ;

% Meta parameters
net.meta.inputSize = [56 56 64] ;
net.meta.trainOpts.learningRate = [0.0005*ones(1,30) 0.0005*ones(1,10) 0.0005*ones(1,5)] ;
net.meta.trainOpts.weightDecay = 0.0001 ;
net.meta.trainOpts.batchSize = 100 ;
net.meta.trainOpts.numEpochs = numel(net.meta.trainOpts.learningRate) ;

% Fill in default values

net = vl_simplenn_tidy(net) ;

I changed the getSimpleNNBatch(imdb, batch) function in ncnn_train (my own name for the file) as follows:

function [images, labels] = getSimpleNNBatch(imdb, batch)
    images = imdb.images.data(:,:,:,batch) ;
    labels = imdb.images.labels(:,:,:,batch) ;
    % Random horizontal flip, applied to the images only (labels are left as-is)
    if rand > 0.5, images = fliplr(images) ; end

because my labels are multidimensional. I also changed errorFunction in cnn_train from multiclass to none:

opts.errorFunction = 'none' ;

and changed the error variable from:

% accumulate errors
error = sum([error, [...
  sum(double(gather(res(end).x))) ;
  reshape(params.errorFunction(params, labels, res),[],1) ; ]],2) ;

to:

% accumulate errors
error = sum([error, [...
  mean(mean(mean(double(gather(res(end).x))))) ;
  reshape(params.errorFunction(params, labels, res),[],1) ; ]],2) ;

My first question is: why is the third dimension of res(end).x in the command above 1 rather than 64? Its size is 56*56*1*100 (100 is the batch size).

Did I make a mistake?

These are the results:

train: epoch 01:   2/ 40: 10.1 (27.0) Hz objective: 21360.722
train: epoch 01:   3/ 40: 13.0 (30.0) Hz objective: 67328685.873
...
train: epoch 01:  39/ 40: 29.7 (29.6) Hz objective: 5179175.587
train: epoch 01:  40/ 40: 29.8 (30.6) Hz objective: 5049697.440
val: epoch 01:   1/ 10: 87.3 (87.3) Hz objective: 49.512
val: epoch 01:   2/ 10: 88.9 (90.5) Hz objective: 50.012
...
val: epoch 01:   9/ 10: 88.2 (88.2) Hz objective: 49.936
val: epoch 01:  10/ 10: 88.1 (87.3) Hz objective: 49.962
train: epoch 02:   1/ 40: 30.2 (30.2) Hz objective: 49.650
train: epoch 02:   2/ 40: 30.3 (30.4) Hz objective: 49.704
...
train: epoch 02:  39/ 40: 30.2 (31.6) Hz objective: 49.739
train: epoch 02:  40/ 40: 30.3 (31.0) Hz objective: 49.722
val: epoch 02:   1/ 10: 91.8 (91.8) Hz objective: 49.687
val: epoch 02:   2/ 10: 92.0 (92.2) Hz objective: 49.831
...
val: epoch 02:   9/ 10: 92.0 (88.5) Hz objective: 49.931
val: epoch 02:  10/ 10: 91.9 (91.1) Hz objective: 49.962
train: epoch 03:   1/ 40: 31.7 (31.7) Hz objective: 49.014
train: epoch 03:   2/ 40: 31.2 (30.8) Hz objective: 49.237
...

This is an image of my network architecture.

score 0 · Accepted Answer

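The two inputs of pdist have size n x m x 64 x 100, as you stated above. The output of pdist has the same height and width, but its depth equals one, because the distance is computed across the channel dimension. As for whether your error definition is correct, you should debug carefully and check the sizes and definitions at each step.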
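The collapse of the third dimension can be reproduced in plain MATLAB. The following is a minimal sketch, assuming a squared Euclidean distance (p = 2, no root); it mimics the channel-wise sum with built-in functions rather than calling MatConvNet itself, and the names x, x0, d and batchMean are illustrative, not taken from the original code.

```matlab
% Stand-in data with the shapes from the question: H x W x D x N.
x  = randn(56, 56, 64, 100, 'single') ;   % network output (random placeholder)
x0 = randn(56, 56, 64, 100, 'single') ;   % regression target (random placeholder)

% Squared distance per spatial location: the sum runs over dimension 3
% (the 64 channels), so that dimension shrinks to 1.
d = sum((x - x0).^2, 3) ;
disp(size(d))                 % size is [56 56 1 100]

% One scalar per batch for logging: the mean over all remaining elements.
batchMean = mean(d(:)) ;
```

Under this assumption, dividing d by the number of channels (64) would give a mean squared error per spatial location, which may be a useful value to compare against when checking the sizes in the error accumulation.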
