python - How to implement the Softmax function in Python

Question

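From Udacity's deep learning class, the softmax of y_i is simply the exponential divided by the sum of the exponentials of the whole Y vector:

S(y_i) = e^(y_i) / sum_j e^(y_j)

where S(y_i) is the softmax function of y_i, e is the exponential, and j runs over the columns of the input vector Y.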

I tried the following:

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))

which returns:

[ 0.8360188   0.11314284  0.05083836]

But the suggested solution was:

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

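It produces the same output as the first implementation, even though the first implementation explicitly takes the difference between each column and the max and then divides by the sum.

Can someone show mathematically why? Is one correct and the other wrong?

Are the implementations similar in terms of code and time complexity? Which is more efficient?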

score 165 · Accepted Answer

They are both correct, but yours is preferred from the point of view of numerical stability.

You start with

e ^ (x - max(x)) / sum(e ^ (x - max(x)))

By using the fact that a^(b - c) = (a^b)/(a^c), we have

= e ^ x / (e ^ max(x) * sum(e ^ x / e ^ max(x)))

= e ^ x / sum(e ^ x)

which is what the other answer says. You could replace max(x) with any variable and it would cancel out.
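
The cancellation can also be checked numerically. The following is a minimal sketch, not part of the original answer: `softmax_shifted` and its parameter `c` are hypothetical names introduced here to subtract an arbitrary constant before exponentiating.

```python
import numpy as np

def softmax_shifted(x, c):
    """Softmax computed after subtracting an arbitrary constant c (hypothetical helper)."""
    e_x = np.exp(np.asarray(x, dtype=float) - c)
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
baseline = softmax_shifted(scores, 0.0)   # plain e^x / sum(e^x)

# Any shift, including max(x), should give the same probabilities.
for c in (max(scores), -2.5, 7.0):
    assert np.allclose(softmax_shifted(scores, c), baseline)

print(baseline)
```

The shift only matters numerically: choosing c = max(x) keeps every exponent at or below zero, so the exponentials cannot overflow.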

score 133 · Accepted Answer

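(Well... there is much confusion here, both in the question and in the answers...)

To start with, the two solutions (i.e. yours and the suggested one) are not equivalent; they happen to be equivalent only for the special case of 1-D score arrays. You would have discovered this if you had also tried the 2-D score array in the example provided with the Udacity quiz.

Results-wise, the only actual difference between the two solutions is the axis=0 argument. To see that this is the case, let's try your solution (your_softmax) and one where the only difference is the axis argument: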

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# correct solution:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) # only difference

As I said, for a 1-D score array the results are indeed identical:

scores = [3.0, 1.0, 0.2]
print(your_softmax(scores))
# [ 0.8360188   0.11314284  0.05083836]
print(softmax(scores))
# [ 0.8360188   0.11314284  0.05083836]
your_softmax(scores) == softmax(scores)
# array([ True,  True,  True], dtype=bool)

Nevertheless, here are the results for the 2-D score array given in the Udacity quiz as a test example:

scores2D = np.array([[1, 2, 3, 6],
                     [2, 4, 5, 6],
                     [3, 8, 7, 6]])

print(your_softmax(scores2D))
# [[  4.89907947e-04   1.33170787e-03   3.61995731e-03   7.27087861e-02]
#  [  1.33170787e-03   9.84006416e-03   2.67480676e-02   7.27087861e-02]
#  [  3.61995731e-03   5.37249300e-01   1.97642972e-01   7.27087861e-02]]

print(softmax(scores2D))
# [[ 0.09003057  0.00242826  0.01587624  0.33333333]
#  [ 0.24472847  0.01794253  0.11731043  0.33333333]
#  [ 0.66524096  0.97962921  0.86681333  0.33333333]]

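The results are different - the second one is indeed identical to the one expected in the Udacity quiz, where all columns sum to 1, which is not the case with the first (wrong) result.

So, all the fuss was actually about an implementation detail - the axis argument. According to the numpy.sum documentation:

The default, axis=None, will sum all of the elements of the input array

while here we want to sum along axis 0, that is, over the rows within each column, hence axis=0. For a 1-D array, the sum along the (only) axis and the sum of all the elements happen to be identical, hence your identical results in that case...

The axis issue aside, your implementation (i.e. your choice to subtract the max first) is actually better than the suggested solution! In fact, it is the recommended way of implementing the softmax function - see here for the justification (numerical stability, also pointed out by some other answers here).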
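
To make the effect of the axis argument concrete, here is a small sketch that reuses the quiz array from this answer. The variable names `total` and `per_column` are introduced only for illustration.

```python
import numpy as np

scores2D = np.array([[1, 2, 3, 6],
                     [2, 4, 5, 6],
                     [3, 8, 7, 6]], dtype=float)

e_x = np.exp(scores2D - np.max(scores2D))

total = e_x.sum()              # axis=None: a single scalar for the whole array
per_column = e_x.sum(axis=0)   # axis=0: one sum per column, shape (4,)

print(np.shape(total), per_column.shape)
print((e_x / total).sum(axis=0))       # column sums differ from 1
print((e_x / per_column).sum(axis=0))  # every column sums to 1
```

Dividing by the scalar normalises the whole matrix as one distribution, whereas dividing by the per-column sums normalises each column separately, which is what the quiz expects.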

score 65 · Accepted Answer

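So, this is really a comment on desertnaut's answer, but I can't comment on it yet because of my reputation. As he pointed out, your version is only correct if your input consists of a single sample. If your input consists of several samples, it is wrong. However, desertnaut's solution is also wrong. The problem is that he first takes a 1-dimensional input and then a 2-dimensional input. Let me show you.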

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# desertnaut solution (copied from his answer): 
def desertnaut_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) # only difference

# my (correct) solution:
def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis] # necessary step to do broadcasting
    e_x = np.exp(z - s)
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis] # ditto
    return e_x / div

Let's take desertnaut's example:

x1 = np.array([[1, 2, 3, 6]]) # notice that we put the data into 2 dimensions(!)

This is the output:

your_softmax(x1)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

desertnaut_softmax(x1)
array([[ 1.,  1.,  1.,  1.]])

softmax(x1)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

You can see that desertnaut's version fails in this situation. (It would not fail if the input were just one-dimensional, like np.array([1, 2, 3, 6]).)
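
The row of ones has a simple explanation, sketched below under the assumption that rows are samples and columns are classes: with a (1, 4) input, summing along axis 0 adds up a single term per column, so every entry ends up divided by itself. The names `col_sums` and `row_sums` are illustrative only.

```python
import numpy as np

x1 = np.array([[1.0, 2.0, 3.0, 6.0]])    # shape (1, 4): one sample, four classes
e_x = np.exp(x1 - np.max(x1))

col_sums = e_x.sum(axis=0)                 # shape (4,): each "sum" has a single term
row_sums = e_x.sum(axis=1, keepdims=True)  # shape (1, 1): sum over the four classes

print(e_x / col_sums)   # every entry divided by itself
print(e_x / row_sums)   # normalised across the classes of the sample
```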

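Now let's use 3 samples, since that is the reason we use a 2-dimensional input in the first place. The following x2 is not the same as the one in desertnaut's example.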

x2 = np.array([[1, 2, 3, 6],  # sample 1
               [2, 4, 5, 6],  # sample 2
               [1, 2, 3, 6]]) # sample 1 again(!)

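This input consists of a batch with 3 samples, but samples one and three are essentially the same. We now expect 3 rows of softmax activations, where the first should be the same as the third and also the same as our activation of x1!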

your_softmax(x2)
array([[ 0.00183535,  0.00498899,  0.01356148,  0.27238963],
       [ 0.00498899,  0.03686393,  0.10020655,  0.27238963],
       [ 0.00183535,  0.00498899,  0.01356148,  0.27238963]])


desertnaut_softmax(x2)
array([[ 0.21194156,  0.10650698,  0.10650698,  0.33333333],
       [ 0.57611688,  0.78698604,  0.78698604,  0.33333333],
       [ 0.21194156,  0.10650698,  0.10650698,  0.33333333]])

softmax(x2)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047],
       [ 0.01203764,  0.08894682,  0.24178252,  0.65723302],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

I hope you can see that this is only the case with my solution.

softmax(x1) == softmax(x2)[0]
array([[ True,  True,  True,  True]], dtype=bool)

softmax(x1) == softmax(x2)[2]
array([[ True,  True,  True,  True]], dtype=bool)

Additionally, here are the results of TensorFlow's softmax implementation:

import tensorflow as tf
import numpy as np
batch = np.asarray([[1,2,3,6],[2,4,5,6],[1,2,3,6]])
x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.nn.softmax(x)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(y, feed_dict={x: batch})

And the result:

array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037045],
       [ 0.01203764,  0.08894681,  0.24178252,  0.657233  ],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037045]], dtype=float32)

score 41 · Accepted Answer

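I would say that while both are mathematically correct, the first one is better implementation-wise. When computing softmax, the intermediate values may become very large. Dividing two large numbers can be numerically unstable. These notes (from Stanford) mention a normalization trick, which is essentially what you are doing.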

score 29 · Accepted Answer

sklearn also offers an implementation of softmax:

from sklearn.utils.extmath import softmax
import numpy as np

x = np.array([[ 0.50839931,  0.49767588,  0.51260159]])
softmax(x)

# output
array([[ 0.3340521 ,  0.33048906,  0.33545884]])

score 17 · Accepted Answer

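From a mathematical point of view, both sides are equal.

You can easily prove this. Let m = max(x). Your function softmax then returns a vector whose i-th coordinate is equal to

e^(x_i - m) / sum_j e^(x_j - m) = (e^(x_i) / e^m) / ((sum_j e^(x_j)) / e^m) = e^(x_i) / sum_j e^(x_j)

Notice that this works for any m, because e^m != 0 for all (even complex) numbers m.

From a computational complexity point of view, they are also equivalent: both run in O(n) time, where n is the size of the vector.
From a numerical stability point of view, the first solution is preferred, because e^x grows very fast and overflows even for moderately large values of x. Subtracting the maximum value gets rid of this overflow. To experience this in practice, try feeding x = np.array([1000, 5]) into both of your functions. One will return the correct probabilities, the other will overflow and return nan.
Your solution works only for vectors (the Udacity quiz wants you to calculate it for matrices as well). To fix it, you need to use sum(axis=0).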
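
The experiment suggested above can be run as follows. This is a sketch: `softmax_naive` and `softmax_stable` are names chosen here for the suggested solution and the max-subtracting solution, and `np.errstate` is used only to silence the expected overflow warnings.

```python
import numpy as np

def softmax_naive(x):
    """Suggested solution: exponentiate the raw scores."""
    e_x = np.exp(x)
    return e_x / e_x.sum(axis=0)

def softmax_stable(x):
    """Question's solution: subtract the max before exponentiating."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

x = np.array([1000.0, 5.0])

# exp(1000) overflows to inf in float64, and inf / inf is nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(x)
stable = softmax_stable(x)

print(naive)
print(stable)
```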

score 11 · Accepted Answer

Edit. As of version 1.2.0, scipy includes softmax as a special function:

https://scipy.github.io/devdocs/generated/scipy.special.softmax.html

I wrote a function that applies the softmax over any axis:

def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats. 
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the 
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter, 
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()

    return p

Subtracting the max is good practice, as other users have described. I wrote a detailed post about it here.

score 10 · Accepted Answer

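Here you can find out why they used - max.

From there:

"When you're writing code for computing the Softmax function in practice, the intermediate terms may be very large due to the exponentials. Dividing large numbers can be numerically unstable, so it is important to use a normalization trick."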

score 4 · Accepted Answer

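To offer an alternative solution, consider the cases where your arguments are so large in magnitude that exp(x) would underflow (in the negative case) or overflow (in the positive case). Here you want to remain in log space for as long as possible, exponentiating only at the end, where you can trust that the result will be well-behaved.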

import scipy.special as sc
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    return np.exp(x - sc.logsumexp(x))
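
As an illustration of staying in log space, the sketch below computes log-probabilities with NumPy only. It is not the answerer's code: `log_softmax` is a hypothetical helper that writes out the log-sum-exp identity by hand instead of calling scipy.

```python
import numpy as np

def log_softmax(x, axis=-1):
    """Log of the softmax along `axis` (hypothetical helper, NumPy only)."""
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x, axis=axis, keepdims=True)
    # log(sum(exp(x))) = max(x) + log(sum(exp(x - max(x))))
    log_norm = np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))
    return shifted - log_norm

x = np.array([1000.0, 5.0, -1000.0])
log_p = log_softmax(x)   # finite even though the probabilities underflow
p = np.exp(log_p)        # exponentiate only at the very end

print(log_p)
print(p)
```

The log-probabilities keep the ordering and the gaps between the small entries, information that is lost once they are exponentiated and underflow to zero.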

score 4 · Accepted Answer


A more concise version is:

def softmax(x):
    return np.exp(x) / np.exp(x).sum(axis=0)

answered 2016-09-06T20:08:40.650

score 4 · Accepted Answer

I was curious to see the performance difference between these:

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

def softmaxv2(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def softmaxv3(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / np.sum(e_x, axis=0)

def softmaxv4(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x - np.max(x)) / np.sum(np.exp(x - np.max(x)), axis=0)



x=[10,10,18,9,15,3,1,2,1,10,10,10,8,15]

Using

print("----- softmax")
%timeit  a=softmax(x)
print("----- softmaxv2")
%timeit  a=softmaxv2(x)
print("----- softmaxv3")
%timeit  a=softmaxv2(x)  # note: times softmaxv2 again, not softmaxv3
print("----- softmaxv4")
%timeit  a=softmaxv2(x)  # note: times softmaxv2 again, not softmaxv4

Increasing the values inside x (+100, +200, +500...), I consistently get better results with the original numpy version (here is just one test):

----- softmax
The slowest run took 8.07 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 17.8 µs per loop
----- softmaxv2
The slowest run took 4.30 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23 µs per loop
----- softmaxv3
The slowest run took 4.06 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23 µs per loop
----- softmaxv4
10000 loops, best of 3: 23 µs per loop

Until... the values inside x reach ~800, and then I get

----- softmax
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:4: RuntimeWarning: overflow encountered in exp
  after removing the cwd from sys.path.
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:4: RuntimeWarning: invalid value encountered in true_divide
  after removing the cwd from sys.path.
The slowest run took 18.41 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23.6 µs per loop
----- softmaxv2
The slowest run took 4.18 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 22.8 µs per loop
----- softmaxv3
The slowest run took 19.44 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23.6 µs per loop
----- softmaxv4
The slowest run took 16.82 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 22.7 µs per loop

As some have said, your version is more numerically stable "for large numbers". For small numbers it could be the other way round.
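
Outside IPython the %timeit magic is not available. A comparable measurement can be sketched with the standard-library timeit module, as below. The function names, the repeat counts, and the choice to pass x as a float array rather than a list are assumptions made for this sketch, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

def softmax_plain(x):
    """Suggested solution: no max subtraction."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

def softmax_shifted(x):
    """Max-subtracting solution."""
    e_x = np.exp(x - np.max(x))
    return e_x / np.sum(e_x, axis=0)

x = np.array([10, 10, 18, 9, 15, 3, 1, 2, 1, 10, 10, 10, 8, 15], dtype=float)

calls = 2000
timings = {}
for fn in (softmax_plain, softmax_shifted):
    # Best of 3 repeats, converted to microseconds per call.
    best = min(timeit.repeat(lambda: fn(x), number=calls, repeat=3))
    timings[fn.__name__] = best / calls * 1e6
    print(f"{fn.__name__}: {timings[fn.__name__]:.2f} us per call")
```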

score 3 · Accepted Answer

I would suggest this:

def softmax(z):
    z_norm=np.exp(z-np.max(z,axis=0,keepdims=True))
    return(np.divide(z_norm,np.sum(z_norm,axis=0,keepdims=True)))

It works for single (stochastic) samples as well as for batches.
For more details, see: https://medium.com/@ravish1729/analysis-of-softmax-function-ad058d6a564d

score 3 · Accepted Answer

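I needed something compatible with the output of a dense layer from Tensorflow.

The solution from @desertnaut does not work in this case because I have batches of data. Therefore, I came up with another solution that should work in both cases: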

def softmax(x, axis=-1):
    e_x = np.exp(x - np.max(x)) # same code
    return e_x / e_x.sum(axis=axis, keepdims=True)

Results:

logits = np.asarray([
    [-0.0052024,  -0.00770216,  0.01360943, -0.008921], # 1
    [-0.0052024,  -0.00770216,  0.01360943, -0.008921]  # 2
])

print(softmax(logits))

#[[0.2492037  0.24858153 0.25393605 0.24827873]
# [0.2492037  0.24858153 0.25393605 0.24827873]]

Ref: Tensorflow softmax
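
One caveat with subtracting a single global max from a batch, sketched below: if the rows live on very different scales, the smaller rows can underflow to all zeros and produce nan. Taking the max along the same axis as the sum avoids this. Both function names and the example batch are hypothetical, chosen to make the effect visible.

```python
import numpy as np

def softmax_global_max(x, axis=-1):
    e_x = np.exp(x - np.max(x))                              # one max for the whole array
    return e_x / e_x.sum(axis=axis, keepdims=True)

def softmax_axis_max(x, axis=-1):
    e_x = np.exp(x - np.max(x, axis=axis, keepdims=True))    # one max per slice
    return e_x / e_x.sum(axis=axis, keepdims=True)

# Hypothetical batch whose two rows differ only by a shift of 1000.
logits = np.array([[1000.0, 1001.0],
                   [0.0, 1.0]])

# In the second row exp(-1001) and exp(-1000) underflow to 0, giving 0 / 0.
with np.errstate(invalid="ignore"):
    global_result = softmax_global_max(logits)
axis_result = softmax_axis_max(logits)

print(global_result)
print(axis_result)
```

For inputs of moderate size, such as the logits in this answer, both variants agree.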

score 1 · Accepted Answer

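This has already been answered in much detail in the answers above. max is subtracted to avoid overflow. I am adding one more implementation here, in python3.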

import numpy as np
def softmax(x):
    mx = np.amax(x,axis=1,keepdims = True)
    x_exp = np.exp(x - mx)
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)
    res = x_exp / x_sum
    return res

x = np.array([[3,2,4],[4,5,6]])
print(softmax(x))

score 1 · Accepted Answer

To maintain numerical stability, max(x) should be subtracted. The following is the code for the softmax function:

def softmax(x):
    if len(x.shape) > 1:
        tmp = np.max(x, axis = 1)
        x -= tmp.reshape((x.shape[0], 1))
        x = np.exp(x)
        tmp = np.sum(x, axis = 1)
        x /= tmp.reshape((x.shape[0], 1))
    else:
        tmp = np.max(x)
        x -= tmp
        x = np.exp(x)
        tmp = np.sum(x)
        x /= tmp

    return x

score 1 · Accepted Answer

Based on all the responses and the CS231n notes, allow me to summarise:

def softmax(x, axis):
    x = x - np.max(x, axis=axis, keepdims=True)  # shift without modifying the caller's array
    e_x = np.exp(x)                              # exponentiate once and reuse
    return e_x / e_x.sum(axis=axis, keepdims=True)

Usage:

x = np.array([[1, 0, 2,-1],
              [2, 4, 6, 8], 
              [3, 2, 1, 0]])
softmax(x, axis=1).round(2)

Output:

array([[0.24, 0.09, 0.64, 0.03],
       [0.  , 0.02, 0.12, 0.86],
       [0.64, 0.24, 0.09, 0.03]])

score 1 · Accepted Answer

Everybody seems to be posting their solution, so I'll post mine:

def softmax(x):
    e_x = np.exp(x.T - np.max(x, axis = -1))
    return (e_x / e_x.sum(axis=0)).T

I get exactly the same results as with the one imported from sklearn:

from sklearn.utils.extmath import softmax

score 1 · Accepted Answer

import tensorflow as tf
import numpy as np

def softmax(x):
    return (np.exp(x).T / np.exp(x).sum(axis=-1)).T

logits = np.array([[1, 2, 3], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])

sess = tf.Session()
print(softmax(logits))
print(sess.run(tf.nn.softmax(logits)))
sess.close()

score 0 · Accepted Answer

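The softmax function is an activation function that turns numbers into probabilities that sum to one. It outputs a vector that represents the probability distribution over a list of outcomes. It is also a core element of deep learning classification tasks.

The softmax function is used when we have multiple classes.

It is useful for finding the class that has the maximum probability.

The softmax function is ideally used in the output layer, where we are actually trying to obtain the probabilities that define the class of each input.

Its outputs range from 0 to 1.

The softmax function turns the logits [2.0, 1.0, 0.1] into the probabilities [0.7, 0.2, 0.1], and the probabilities sum to 1. Logits are the raw scores output by the last layer of a neural network, before activation takes place. To understand the softmax function, we must look at the output of the (n-1)th layer.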
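
The figures quoted above are rounded. A short sketch that reproduces them with the max-subtracting formula from the question follows. Unrounded, the probabilities come out at roughly 0.659, 0.242 and 0.099.

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])

e_x = np.exp(logits - np.max(logits))
probs = e_x / e_x.sum()

print(probs)            # unrounded probabilities
print(probs.round(1))   # rounded to one decimal place
print(probs.sum())      # the probabilities sum to 1
```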

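The softmax function is, in fact, a smooth ("soft") version of the arg max function. That means it does not return the largest value from the input; instead it puts most of the probability mass at the position of the largest value.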

For example:

Before softmax

X = [13, 31, 5]

After softmax

array([1.52299795e-08, 9.99999985e-01, 5.10908895e-12])
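
The relationship to arg max can be made visible by scaling the inputs: the larger the gaps between the scores, the closer the softmax output gets to a one-hot vector at the arg max position. In the sketch below, `scale` is a hypothetical sharpness factor introduced only for illustration.

```python
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

x = np.array([1.0, 2.0, 3.0])
one_hot = np.eye(len(x))[np.argmax(x)]   # hard arg max written as a one-hot vector

for scale in (1.0, 10.0, 100.0):         # hypothetical sharpness factor
    print(scale, softmax(scale * x))

print(one_hot)
```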

Code:

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# correct solution:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)  # only difference

score 0 · Accepted Answer

Here is a generalized solution using numpy, with a comparison against tensorflow and scipy for correctness:

Data preparation:

import numpy as np

np.random.seed(2019)

batch_size = 1
n_items = 3
n_classes = 2
logits_np = np.random.rand(batch_size,n_items,n_classes).astype(np.float32)
print('logits_np.shape', logits_np.shape)
print('logits_np:')
print(logits_np)

Output:

logits_np.shape (1, 3, 2)
logits_np:
[[[0.9034822  0.3930805 ]
  [0.62397    0.6378774 ]
  [0.88049906 0.299172  ]]]

Softmax using tensorflow:

import tensorflow as tf

logits_tf = tf.convert_to_tensor(logits_np, np.float32)
scores_tf = tf.nn.softmax(logits_np, axis=-1)

print('logits_tf.shape', logits_tf.shape)
print('scores_tf.shape', scores_tf.shape)

with tf.Session() as sess:
    scores_np = sess.run(scores_tf)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np,axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

logits_tf.shape (1, 3, 2)
scores_tf.shape (1, 3, 2)
scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.4965232  0.5034768 ]
  [0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

Softmax using scipy:

from scipy.special import softmax

scores_np = softmax(logits_np, axis=-1)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.4965232  0.5034768 ]
  [0.6413727  0.35862732]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

Softmax using numpy (https://nolanbconaway.github.io/blog/2017/softmax-numpy):

def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats.
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter,
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()

    return p


scores_np = softmax(logits_np, axis=-1)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.49652317 0.5034768 ]
  [0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

score 0 · Accepted Answer

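The purpose of the softmax function is to preserve the ratio of the vector's components, as opposed to squashing the end-points with a sigmoid as the values saturate (i.e. tend to +/- 1 (tanh) or from 0 to 1 (logistic)). This is because it preserves more information about the rate of change at the end-points, and is thus more applicable to neural nets with 1-of-N output encoding (i.e. if we squashed the end-points, it would be harder to differentiate the 1-of-N output class, because we could not tell which one is the "biggest" or "smallest" once they have been squished); it also makes the total output sum to 1, and the clear winner will be closer to 1, while other numbers that are close to each other will sum to 1/p, where p is the number of output neurons with similar values.

The purpose of subtracting the max value from the vector is that when you compute the e^y exponents you may get a very high value that clips the float at its maximum value, leading to a tie, which is not the case in this example. This becomes a big problem if you subtract the max value to get a negative number: you then have a negative exponent that rapidly shrinks the values and alters the ratio, which is what occurred in the poster's question and yielded the incorrect answer.

The answer supplied by Udacity is very inefficient. The first thing we need to do is calculate e^y_j for all vector components, keep those values, then sum them up, and divide. Where Udacity messed up is that they calculate e^y_j TWICE! Here is the correct answer: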

def softmax(y):
    e_to_the_y_j = np.exp(y)
    return e_to_the_y_j / np.sum(e_to_the_y_j, axis=0)

score 0 · Accepted Answer

This generalizes and assumes you are normalizing over the trailing dimension.

def softmax(x: np.ndarray) -> np.ndarray:
    e_x = np.exp(x - np.max(x, axis=-1)[..., None])
    e_y = e_x.sum(axis=-1)[..., None]
    return e_x / e_y

score 0 · Accepted Answer

This also works with np.reshape.

def softmax(scores):
    """
    Compute softmax scores given the raw output from the model

    :param scores: raw scores from the model (N, num_classes)
    :return:
        prob: softmax probabilities (N, num_classes)
    """
    # Subtract the largest number in each row before exponentiating, see
    # https://jamesmccaffrey.wordpress.com/2016/03/04/the-max-trick-when-computing-softmax/
    exponential = np.exp(scores - np.max(scores, axis=1).reshape(-1, 1))
    prob = exponential / exponential.sum(axis=1).reshape(-1, 1)
    return prob

score 0 · Accepted Answer

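The goal was to achieve similar results using Numpy and Tensorflow. The only change from the original answer is the axis parameter of the np.sum API.

Initial approach: axis=0 - This, however, does not give the intended results when the input has N dimensions.

Modified approach: axis=len(e_x.shape)-1 - Always sum over the last dimension. This gives results similar to tensorflow's softmax function.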

def softmax_fn(input_array):
    """
    | **@author**: Prathyush SP
    |
    | Calculate Softmax for a given array
    :param input_array: Input Array
    :return: Softmax Score
    """
    e_x = np.exp(input_array - np.max(input_array))
    return e_x / e_x.sum(axis=len(e_x.shape)-1, keepdims=True)
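
When summing over the last dimension of a batch, the shape of the sum matters for the division. The sketch below uses a hypothetical (2, 3) batch to show that the sum only broadcasts back against the input if the summed axis is kept, for example with keepdims=True. The names `sums_flat`, `sums_kept` and `broadcast_ok` are illustrative.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 6.0]])              # shape (2, 3): two samples, three classes
e_x = np.exp(x - np.max(x))

sums_flat = e_x.sum(axis=-1)                 # shape (2,): the class axis is dropped
sums_kept = e_x.sum(axis=-1, keepdims=True)  # shape (2, 1): broadcasts against (2, 3)

try:
    e_x / sums_flat
    broadcast_ok = True
except ValueError:
    broadcast_ok = False                     # (2, 3) / (2,) cannot be broadcast

probs = e_x / sums_kept
print(broadcast_ok, probs.sum(axis=-1))
```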

score 0 · Accepted Answer

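I would like to add a little more understanding of the problem. Subtracting the max of the array is correct here. But if you run the code from the other post, you will find that it does not give you the right answer when the array is 2-D or has more dimensions.

Here are some suggestions:

To get the max, try taking it along the x-axis; you will get a 1-D array.
Reshape your max array to the original shape.
Use np.exp to get the exponential values.
Do np.sum along the axis.
Get the final results.

Follow these steps and, by vectorizing, you will get the correct answer. Since this is related to college homework, I cannot post the exact code here, but I would be happy to give more suggestions if anything is unclear.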
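
The steps above can be sketched as follows. This is one possible reading, not the answerer's code: it assumes that "along the x-axis" means across the classes of each row (axis=1), and `softmax_rows` is a hypothetical name.

```python
import numpy as np

def softmax_rows(x):
    """Row-wise softmax following the listed steps (assumes rows are samples)."""
    x = np.asarray(x, dtype=float)
    row_max = np.max(x, axis=1)                    # step 1: 1-D array of per-row maxima
    row_max = row_max.reshape(-1, 1)               # step 2: reshape so it broadcasts over columns
    e_x = np.exp(x - row_max)                      # step 3: exponentiate the shifted scores
    row_sum = np.sum(e_x, axis=1).reshape(-1, 1)   # step 4: sum along the same axis
    return e_x / row_sum                           # step 5: final result

result = softmax_rows([[1, 2, 3, 6],
                       [2, 4, 5, 6]])
print(result)
```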
