
I am trying to implement a self-written loss function. My pipeline is as follows:

x -> {constant computation} = x_feature -> machine learning training -> y_feature -> {constant computation} = y_produced

These "constant computations" are necessary to measure the difference between the desired output and the produced output.

So if I take the L2 norm of y_produced and y_original, how should I incorporate this loss into the original loss?

Note that y_produced has different dimensions than y_feature.
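
For concreteness, a minimal sketch of such a pipeline, assuming PyTorch; the shapes and the fixed matrices A and B below are hypothetical stand-ins for the "constant computations", not the asker's actual transforms:

import torch

torch.manual_seed(0)

# Hypothetical fixed ("constant") transforms; shapes are illustrative only.
A = torch.randn(16, 8)            # constant computation: x -> x_feature
B = torch.randn(4, 10)            # constant computation: y_feature -> y_produced

model = torch.nn.Linear(16, 10)   # the learnable part: x_feature -> y_feature

def forward(x):                   # x: (batch, 8)
    x_feature = x @ A.T           # (batch, 16), fixed transform
    y_feature = model(x_feature)  # (batch, 10), learnable
    y_produced = y_feature @ B.T  # (batch, 4), a different size than y_feature
    return y_produced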


1 Answer


As long as you are using differentiable operations, there is no difference between "constant transformations" and "learnable" ones. There is no such distinction; consider even the linear layer of a neural net:

f(x) = sigmoid( W * x + b )

Is it constant or learnable? W and b are trained, but "sigmoid" is not, yet the gradient flows the same way regardless of whether something is a variable. In particular, the gradient with respect to x is the same for

g(x) = sigmoid( A * x + c )

where A and c are constants.
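
A quick way to see this, assuming PyTorch (an illustrative sketch, not code from the question): leave A and c as plain tensors without requires_grad, and the gradient with respect to x is still computed as usual:

import torch

x = torch.randn(5, requires_grad=True)
A = torch.randn(3, 5)          # constant: no requires_grad
c = torch.randn(3)             # constant: no requires_grad

g = torch.sigmoid(A @ x + c)   # the "constant" layer g(x)
g.sum().backward()             # gradient still flows back to x
print(x.grad)                  # well-defined, not None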

The only problem you will encounter is using non-differentiable operations, such as argmax, sorting, indexing, or sampling. These operations do not have a well-defined gradient, so you cannot directly use first-order optimisers with them. As long as you stick to differentiable operations, the problem described does not really exist: there is no difference between "constant transformations" and any other transformations, regardless of changes of size and so on.
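
Putting it together, a hedged sketch of training against the L2 loss taken after the constant post-processing; all names, shapes, and the transforms A and B here are assumptions for illustration:

import torch

torch.manual_seed(0)
A = torch.randn(16, 8)                    # constant: x -> x_feature
B = torch.randn(4, 10)                    # constant: y_feature -> y_produced
model = torch.nn.Linear(16, 10)           # learnable middle part
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(32, 8)                    # a batch of inputs
y_original = torch.randn(32, 4)           # targets in y_produced space

for step in range(100):
    y_produced = model(x @ A.T) @ B.T     # constant -> learnable -> constant
    loss = ((y_produced - y_original) ** 2).mean()   # L2 loss on y_produced
    opt.zero_grad()
    loss.backward()                       # gradient passes through B unchanged
    opt.step()

Note that the loss is defined on y_produced even though the network's direct output is y_feature of a different size; as long as the y_feature -> y_produced map is differentiable, that size mismatch is irrelevant.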

Answered 2017-07-06T23:02:21.723