c++ - Implementation of a softmax activation function for neural networks

Question

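I am using a softmax activation function in the last layer of a neural network, but I have trouble implementing it in a numerically safe way.

A naive implementation would be this one: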

Vector y = mlp(x); // output of the neural network without softmax activation function
for(int f = 0; f < y.rows(); f++)
  y(f) = exp(y(f));
y /= y.sum();

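This does not work well with more than 100 hidden nodes, because in many cases y will be NaN (if y(f) > 709, exp(y(f)) returns inf, and the normalization then computes inf / inf). I came up with this version: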

Vector y = mlp(x); // output of the neural network without softmax activation function
for(int f = 0; f < y.rows(); f++)
  y(f) = safeExp(y(f), y.rows());
y /= y.sum();

where safeExp is defined as

double safeExp(double x, int div)
{
  static const double maxX = std::log(std::numeric_limits<double>::max());
  const double max = maxX / (double) div;
  if(x > max)
    x = max;
  return std::exp(x);
}

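This function limits the input of exp. It works in most cases, but not in all of them, and I did not really manage to find out in which cases it fails. When I have 800 hidden neurons in the previous layer, it does not work at all.

However, even if this worked, I would somehow "distort" the result of the ANN. Can you think of any other way to calculate the correct solution? Are there any C++ libraries or tricks that I can use to calculate the exact output of this ANN?

Edit: The solution provided by Itamar Katz is: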

Vector y = mlp(x); // output of the neural network without softmax activation function
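double ymax = y(0); // largest component of y
for(int f = 1; f < y.rows(); f++)
  ymax = std::max(ymax, y(f)); // std::max needs <algorithm>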
for(int f = 0; f < y.rows(); f++)
  y(f) = exp(y(f) - ymax);
y /= y.sum();

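Mathematically it really is the same. In practice, however, some small values become 0 because of floating-point precision. I wonder why nobody ever writes these implementation details down in textbooks.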
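For reference, here is a minimal self-contained sketch of the max-subtraction approach. It assumes a plain std::vector<double> in place of the Vector type used above, and the name softmaxStable is made up for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax with the maximum subtracted before exponentiation.
// Every exponent is <= 0, so std::exp cannot overflow, and the largest
// entry contributes exp(0) = 1, so the sum is always >= 1.
// Hypothetical helper: std::vector<double> stands in for the Vector type.
std::vector<double> softmaxStable(const std::vector<double>& z)
{
    assert(!z.empty());
    const double zmax = *std::max_element(z.begin(), z.end());

    std::vector<double> out(z.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < z.size(); ++i) {
        out[i] = std::exp(z[i] - zmax); // may underflow to 0, never overflows
        sum += out[i];
    }
    for (double& v : out)
        v /= sum;
    return out;
}
```

With inputs such as {1000, 1000, -1000} this returns finite probabilities where the naive version yields NaN. The entry far below the maximum underflows to exactly 0, which is the effect described above.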

score 14 · Accepted Answer

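First switch to log scale, i.e. compute log(y) instead of y. The log of the numerator is trivial. To compute the log of the denominator, you can use the following "trick": http://lingpipe-blog.com/2009/06/25/log-sum-of-exponentials/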
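A minimal sketch of that trick for the denominator, assuming the pre-activation values are held in a std::vector<double> (the function name logSumExp is an illustrative choice, not taken from any library):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Computes log(sum_i exp(z[i])) as m + log(sum_i exp(z[i] - m)),
// where m = max_i z[i]. All exponents are <= 0, so exp cannot overflow.
// Hypothetical helper for illustration.
double logSumExp(const std::vector<double>& z)
{
    assert(!z.empty());
    const double m = *std::max_element(z.begin(), z.end());
    double sum = 0.0;
    for (double v : z)
        sum += std::exp(v - m);
    return m + std::log(sum);
}
```

The log of output j is then z[j] - logSumExp(z), which stays finite even when exp(z[j]) itself would overflow.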

score 8 · Accepted Answer

I know it's already answered but I'll post here a step-by-step anyway.

Take the log:

zj = wj . x + bj
oj = exp(zj)/sum_i{ exp(zi) }
log oj = zj - log sum_i{ exp(zi) }

Let m = max_i { zi } and use the log-sum-exp trick:

log oj = zj - log {sum_i { exp(zi + m - m)}}
   = zj - log {sum_i { exp(m) exp(zi - m) }}
   = zj - log {exp(m) sum_i {exp(zi - m)}}
   = zj - m - log {sum_i { exp(zi - m)}}

The term exp(zi - m) can underflow if m is much greater than the other zi, but that's OK, since it means that zi is irrelevant to the softmax output after normalization. The final result is:

oj = exp (zj - m - log{sum_i{exp(zi-m)}})
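A rough sketch of the derivation above in C++, assuming the zj are stored in a std::vector<double>. The name logSoftmax is hypothetical. It returns log oj, so the caller can exponentiate only when the probabilities themselves are needed:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns log o_j = z_j - m - log(sum_i exp(z_i - m)) for every j,
// with m = max_i z_i. Hypothetical helper for illustration.
std::vector<double> logSoftmax(const std::vector<double>& z)
{
    assert(!z.empty());
    const double m = *std::max_element(z.begin(), z.end());

    double sum = 0.0;
    for (double v : z)
        sum += std::exp(v - m); // each term lies in [0, 1]
    const double logSum = std::log(sum); // sum >= 1, so logSum >= 0

    std::vector<double> out(z.size());
    for (std::size_t j = 0; j < z.size(); ++j)
        out[j] = z[j] - m - logSum;
    return out;
}
```

For z = {0, -1000} the second log-output is about -1000 and remains finite, whereas the probability exp(-1000) underflows to 0 in double precision. Keeping results in log space avoids losing such small values.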
