machine-learning - How should the "BatchNorm" layer be used in caffe?

Question

I am a little confused about how I should use/insert a "BatchNorm" layer in my models.
I have seen several different approaches, for instance:

ResNets: `"BatchNorm"` + `"Scale"` (no parameter sharing)

A "BatchNorm" layer is followed immediately by a "Scale" layer:

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "bn2a_branch1"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
}

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "scale2a_branch1"
    type: "Scale"
    scale_param {
        bias_term: true
    }
}

cifar10 example: only `"BatchNorm"`

In the cifar10 example provided with caffe, "BatchNorm" is used without any "Scale" layer following it:

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}

cifar10: different `batch_norm_param` for `TRAIN` and `TEST`

`batch_norm_param: use_global_stats` is changed between the `TRAIN` and `TEST` phases:

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}

So what should it be?

How should the "BatchNorm" layer be used in caffe?

score 6 · Accepted Answer

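If you follow the original paper, batch normalization should be followed by Scale and Bias layers (the bias can be included via the Scale layer, although this makes the Bias parameters inaccessible). `use_global_stats` should also change from training (False) to testing/deployment (True), which is the default behavior. Note that the first example you give is a prototxt for deployment, so it is correct for it to be set to True.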
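As a minimal sketch of the layout described above, a training-time pair might look like the following. The layer and blob names (`conv1`, `bn1`, `scale1`) are hypothetical, and `use_global_stats` is deliberately left unset so that Caffe chooses it from the phase: batch statistics in TRAIN, accumulated global statistics in TEST.

```
# Hypothetical names; "conv1" stands for whatever blob feeds the normalization.
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # No batch_norm_param here: use_global_stats then defaults to
  # false in the TRAIN phase and true in the TEST phase.
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param {
    bias_term: true  # learn a shift in addition to the scale
  }
}
```

With this layout a single "BatchNorm" definition can serve both phases, so the duplicated `TRAIN`/`TEST` layers from the question should not be needed unless you want to override the default explicitly.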

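I'm not sure about the shared parameters.

I made a pull request to improve the documentation on batch normalization, but then closed it because I wanted to modify it. And then I never got back to it.

Note that I think `lr_mult: 0` for "BatchNorm" is no longer required (perhaps not allowed?), although I can't find the corresponding PR right now.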

score 2 · Accepted Answer

After each BatchNorm, we have to add a Scale layer in Caffe. The reason is that the Caffe BatchNorm layer only subtracts the mean from the input data and divides by the standard deviation (the square root of the variance); it does not include the γ and β parameters that respectively scale and shift the normalized distribution. Conversely, the Keras BatchNormalization layer includes and applies all of the parameters mentioned above. Using a Scale layer with the parameter "bias_term" set to True in Caffe provides a safe trick to reproduce the exact behavior of the Keras version. https://www.deepvisionconsulting.com/from-keras-to-caffe/
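Written out, the split between the two layers looks roughly like this. The symbols μ and σ² (the per-channel mean and variance) and the small constant ε added for numerical stability are notation assumed here, not taken from the answer:

$$
\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \quad \text{("BatchNorm")}, \qquad
y = \gamma\,\hat{x} + \beta \quad \text{("Scale" with bias\_term: true)}
$$

Composing the two gives the full transform from the original paper, which is what a single Keras BatchNormalization layer computes.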
