
I know that in the two-class case, binary cross-entropy is the same as categorical cross-entropy.

Further, I'm clear about what softmax is.
Therefore, I see that categorical cross-entropy only penalizes the one component (probability) that should be 1.
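The two-class equivalence mentioned above can be checked numerically. A minimal sketch (the value `p = 0.7` is a made-up example, not from the post): binary cross-entropy of a single sigmoid output `p` against target 1 equals categorical cross-entropy of the softmax-style pair `[1-p, p]` against the one-hot label `[0, 1]`, since both reduce to `-log(p)`:

```python
import numpy as np

p = 0.7  # hypothetical predicted probability of class 1

# Binary cross-entropy with target y = 1:
#   -y*log(p) - (1-y)*log(1-p)  ->  -log(p)
bce = -np.log(p)

# Categorical cross-entropy with one-hot label [0, 1]
# and two-class prediction [1-p, p]:
cce = -np.sum(np.array([0.0, 1.0]) * np.log(np.array([1 - p, p])))

assert np.isclose(bce, cce)
```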

But why can't, or shouldn't, I use binary cross-entropy on a one-hot vector?

Normal Case for 1-Label-Multiclass-Mutual-exclusivity-classification:
################
pred            = [0.1 0.3 0.2 0.4]
label (one hot) = [0   1   0   0]
costfunction: categorical crossentropy 
                            = sum(label * -log(pred)) //just consider the 1-label
                            = 0.523
Why not that?
################
pred            = [0.1 0.3 0.2 0.4]
label (one hot) = [0   1   0   0]
costfunction: binary crossentropy
                            = sum(- label * log(pred) - (1 - label) * log(1 - pred))
                            = -1*log(0.3) - log(1-0.1) - log(1-0.2) - log(1-0.4)
                            = 0.887
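The two cost values above can be reproduced directly. Note that the numbers 0.523 and 0.887 come out with base-10 logarithms; with the natural log the absolute values differ, but the comparison is the same:

```python
import numpy as np

pred  = np.array([0.1, 0.3, 0.2, 0.4])
label = np.array([0.0, 1.0, 0.0, 0.0])  # one-hot

# Categorical cross-entropy: only the 1-label term contributes.
cce = np.sum(label * -np.log10(pred))
print(round(cce, 3))  # 0.523

# Binary cross-entropy: every component contributes.
bce = np.sum(-label * np.log10(pred) - (1 - label) * np.log10(1 - pred))
print(round(bce, 3))  # 0.887
```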

I see that in binary cross-entropy, zero is a target class, and it corresponds to the following one-hot encoding:

target class zero 0 -> [1 0]
target class one  1 -> [0 1]

To summarize: Why do we only compute/sum the negative log-likelihood of the predicted class? Why don't we penalize the other classes, which should be zero / not that class?

If binary cross-entropy were used on one-hot vectors, the probabilities of the labels expected to be zero would be penalized as well.


1 Answer


See my answer on a similar question. In short, the binary cross-entropy formula doesn't make sense for a one-hot vector. Depending on the task, it's possible either to apply softmax cross-entropy for two or more classes, or to use a vector of (independent) probabilities as the label.

But why, can't or shouldn't I use binary crossentropy on a one-hot vector?

What you compute is the binary cross-entropy of 4 independent features:

pred   = [0.1 0.3 0.2 0.4]
label  = [0   1   0   0]

The model inference predicted that the first feature is on with 10% probability, the second feature is on with 30% probability, and so on. The target label is interpreted this way: all features are off, except for the second one. Note that [1, 1, 1, 1] is a perfectly valid label as well, i.e. it's not a one-hot vector, and pred=[0.5, 0.8, 0.7, 0.1] is a valid prediction, i.e. the sum doesn't have to equal one.

In other words, your computation is valid, but for a completely different problem: multi-label non-exclusive binary classification.
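A minimal sketch of that multi-label setting, using the values mentioned above: each of the 4 features is treated as an independent Bernoulli variable, so the label need not be one-hot and the predictions need not sum to one. One binary cross-entropy term is computed per feature and then summed:

```python
import numpy as np

pred  = np.array([0.5, 0.8, 0.7, 0.1])  # valid: sums to 2.1, not 1
label = np.array([1.0, 1.0, 1.0, 1.0])  # valid: not one-hot

# One binary cross-entropy term per independent feature:
per_feature = -label * np.log(pred) - (1 - label) * np.log(1 - pred)
total = per_feature.sum()

# With all labels = 1, each term reduces to -log(pred_i).
assert np.isclose(total, -np.log(pred).sum())
```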

See also the difference between the softmax and sigmoid cross-entropy loss functions in TensorFlow.

Answered 2017-11-13T13:53:51.340