
I am writing a multilayer perceptron from scratch, with just an input layer, a hidden layer and an output layer. The output layer will use the softmax activation function to produce the probabilities of several mutually exclusive outputs.

In my hidden layer it does not make sense to me to use the softmax activation function too - is this correct? If so, can I just use any other non-linear activation function such as sigmoid or tanh? Or could I even not use any activation function in the hidden layer and just keep the values of the hidden nodes as the linear combinations of the input nodes and input-to-hidden weights?
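For concreteness, the softmax output layer I have in mind would look roughly like this minimal NumPy sketch (the names and example values are just illustrative, not my actual code):

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability before exponentiating
    z = z - np.max(z)
    exp_z = np.exp(z)
    # Normalize so the outputs sum to 1 and can be read as class probabilities
    return exp_z / exp_z.sum()

# Logits for three mutually exclusive outputs
logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))  # roughly [0.659 0.242 0.099], sums to 1
```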


2 Answers


In my hidden layer it does not make sense to me to use the softmax activation function too - is this correct?

It is indeed correct.

If so can I just use any other non-linear activation function such as sigmoid or tanh?

You can, but most modern approaches would call for a Rectified Linear Unit (ReLU), or one of its variants (Leaky ReLU, ELU, etc.).
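For instance, a ReLU hidden layer feeding a softmax output could be wired up roughly like this (a minimal NumPy sketch; the layer sizes and weight names are just illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 inputs, 8 hidden units, 3 output classes
W1, b1 = rng.normal(scale=0.1, size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 3)), np.zeros(3)

def relu(z):
    # Element-wise max(0, z): the non-linearity in the hidden layer
    return np.maximum(0.0, z)

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def forward(x):
    h = relu(x @ W1 + b1)          # non-linear hidden layer
    return softmax(h @ W2 + b2)    # probabilities over the output classes

print(forward(np.array([0.5, -1.0, 2.0, 0.0])))  # three values summing to 1
```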

Or could I even not use any activation function in the hidden layer and just keep the values of the hidden nodes as the linear combinations of the input nodes and input-to-hidden weights?

No. The non-linear activations are indeed what prevents a (possibly large) neural network from behaving just like a single linear unit; it can be shown (see Andrew Ng's relevant Coursera lecture, "Why do you need non-linear activation functions?") that:

It turns out that if you use a linear activation function, or alternatively if you don't have an activation function, then no matter how many layers your neural network has, all it is ever doing is computing a linear activation function, so you might as well not have any hidden layers.

The take-home is that a linear hidden layer is more or less useless, because the composition of two linear functions is itself a linear function; so unless you throw a non-linearity in there, you're not computing more interesting functions even as you go deeper in the network.
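As a quick numerical check of that claim (the shapes and values below are arbitrary, purely for illustration): two stacked linear layers collapse into a single linear map with W = W2·W1 and b = W2·b1 + b2.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)

# Two stacked layers with no activation function
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)

deep = W2 @ (W1 @ x + b1) + b2

# The same computation as a single linear layer
W, b = W2 @ W1, W2 @ b1 + b2
shallow = W @ x + b

print(np.allclose(deep, shallow))  # True: the extra layer adds nothing without a non-linearity
```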

Practically, the only place where you could use a linear activation function is the output layer of a regression problem (also explained in the lecture linked above).

Answered 2018-04-20T13:08:03.967

You can use any activation function. Just test a few and go with the one yielding the best results. Don't forget to try ReLU, though; as far as I know, it is the simplest one that actually works very well.

Answered 2018-04-19T12:47:22.557