machine-learning - Is my code using batch normalization layers in TensorFlow correctly?

Question

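I have two inputs, qi_pos and qi_neg, with the same shape. They should be processed by the two MLP layers, finally yielding two results that serve as scores. Here is my code: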

  self.mlp1_pos  =    nn_layers.full_connect_(qi_pos,        256, activation='relu', use_bn = None, keep_prob=self.keep_prob,  name = 'deep_mlp_1')
  self.mlp2_pos  =    nn_layers.full_connect_(self.mlp1_pos, 128,  activation='relu', use_bn = True, keep_prob=self.keep_prob,  name = 'deep_mlp_2')
  self.pos_pair_sim = nn_layers.full_connect_(self.mlp2_pos,  1,  activation=None, use_bn = True, keep_prob=self.keep_prob,  name = 'deep_mlp_3')
  tf.get_variable_scope().reuse_variables()
  self.mlp1_neg  =    nn_layers.full_connect_(qi_neg,        256, activation='relu', use_bn = None, keep_prob=self.keep_prob,  name = 'deep_mlp_1')
  self.mlp2_neg  =    nn_layers.full_connect_(self.mlp1_neg, 128,  activation='relu', use_bn = True, keep_prob=self.keep_prob,  name = 'deep_mlp_2')
  self.neg_pair_sim = nn_layers.full_connect_(self.mlp2_neg,  1,  activation=None, use_bn = True, keep_prob=self.keep_prob,  name = 'deep_mlp_3')

I use a BN layer to normalize the nodes in the hidden layers:

def full_connect_(inputs, num_units, activation=None, use_bn=None, keep_prob=1.0, name='full_connect_'):
  with tf.variable_scope(name):
    shape = [inputs.get_shape()[-1], num_units]
    weight = weight_variable(shape)
    bias = bias_variable(shape[-1])
    outputs_ = tf.matmul(inputs, weight) + bias
    if use_bn:
      outputs_ = tf.contrib.layers.batch_norm(outputs_, center=True, scale=True, is_training=True, decay=0.9, epsilon=1e-5, scope='bn')
    if activation == "relu":
      outputs = tf.nn.relu(outputs_)
    elif activation == "tanh":
      outputs = tf.tanh(outputs_)
    elif activation == "sigmoid":
      outputs = tf.nn.sigmoid(outputs_)
    else:
      outputs = outputs_
    return outputs

   with tf.name_scope('predictions'):
      self.sim_diff = self.pos_pair_sim - self.neg_pair_sim # shape = (batch_size, 1)
      self.preds = tf.sigmoid(self.sim_diff) # shape = (batch_size, 1)
      self.infers = self.pos_pair_sim

Below is the loss definition. It seems fine.

with tf.name_scope('predictions'):
  sim_diff = pos_pair_sim - neg_pair_sim
  predictions = tf.sigmoid(sim_diff)
  self.infers = pos_pair_sim
## loss and optim
with tf.name_scope('loss'):
  self.loss = nn_layers.cross_entropy_loss_with_reg(self.labels, self.preds)
  tf.summary.scalar('loss', self.loss)

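I am not sure whether I have used the BN layers in the right way. I mean that the BN parameters are derived from the hidden units of two separate parts, which take the qi_pos and qi_neg tensors as input. Anyway, could anyone help check it?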

score 0 · Accepted Answer

Your code seems fine to me; there is no problem with applying BN in different branches of a network. But I'd like to mention a few notes here:

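The BN hyperparameters are pretty standard, so I usually don't set decay, epsilon, and renorm_decay manually. That doesn't mean you mustn't change them; it is simply unnecessary in most cases.
You apply BN before the activation function; however, there is evidence that it works better when applied after the activation. See, for example, this discussion. Again, that doesn't mean it is a bug, just one more architecture to consider.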
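
To make the points above a bit more concrete, here is a small pure-Python sketch of the arithmetic only. It does not use TensorFlow and is not a reproduction of what tf.contrib.layers.batch_norm does internally. It assumes training-mode BN, where each call normalizes with the statistics of the batch it receives, while the scale and offset (gamma and beta) are shared between the two branches, as they would be with reused variables. The helper names batch_norm and relu and the toy inputs are made up for this illustration.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature column using the mean/variance of this batch."""
    n = len(batch)
    num_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(num_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(num_features)
    ]
    return [
        [
            gamma[j] * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta[j]
            for j in range(num_features)
        ]
        for row in batch
    ]


def relu(batch):
    """Element-wise ReLU on a list of rows."""
    return [[max(0.0, v) for v in row] for row in batch]


# Hypothetical shared BN parameters (one gamma/beta pair per feature),
# standing in for variables reused by both branches.
gamma = [1.0, 1.0]
beta = [0.0, 0.0]

# Toy pre-activation outputs of the same layer for the two branches.
# The "neg" batch is deliberately on a 10x larger scale.
pos_pre = [[1.0, -2.0], [3.0, 0.5], [-1.0, 4.0]]
neg_pre = [[10.0, -20.0], [30.0, 5.0], [-10.0, 40.0]]

# Order used in the question: linear -> BN -> ReLU.
pos_bn_first = relu(batch_norm(pos_pre, gamma, beta))
neg_bn_first = relu(batch_norm(neg_pre, gamma, beta))

# Alternative order: linear -> ReLU -> BN.
pos_bn_last = batch_norm(relu(pos_pre), gamma, beta)
neg_bn_last = batch_norm(relu(neg_pre), gamma, beta)

print("BN then ReLU (pos):", pos_bn_first)
print("BN then ReLU (neg):", neg_bn_first)
print("ReLU then BN (pos):", pos_bn_last)
print("ReLU then BN (neg):", neg_bn_last)
```

In this toy setup each branch is normalized with its own batch statistics, so the difference in scale between the two inputs largely disappears even though gamma and beta are shared. With BN before ReLU the layer output is non-negative; with BN after ReLU the output is zero-centered per feature (for beta = 0) and can be negative. Which order trains better is an empirical question for your model.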
