neural-network - Implementing resilient propagation

Question

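I am currently trying to implement resilient propagation for my network. I am basing it on the encog implementation, but there is one thing I do not understand:

The documentation for RPROP and iRPROP+ says that when change > 0: weightChange = -sign(gradient) * delta

Since I assume both are correct in some cases: why is there a difference between the two?

About the gradient: I am using tanh as the activation in the output layer. Is this the correct calculation of the gradient?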

gradientOutput = (1 - lastOutput[j] * lastOutput[j]) * (target[j] - lastOutput[j]);

score 3 · Accepted Answer

After re-reading the relevant papers and checking a textbook, I think the encog documentation is incorrect at this point. Why don't you try it out by temporarily adding the minus signs in the source code? If the documentation is correct, you should get exactly the same results when you start from the same initial weights. In the end, what matters is how you use the weightUpdate variable: if the author of the documentation is used to subtracting weightUpdate from the weights instead of adding it, the documented formula works.
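The point about adding versus subtracting can be sketched with a single weight. The class and method names below are hypothetical and are not taken from encog. The sketch only assumes that dEdw is the derivative of the error with respect to the weight and that delta is the current positive step size.

```java
class SignConvention {
    // Convention A: the minus sign is part of the update, and the update is ADDED.
    static double stepAdd(double w, double dEdw, double delta) {
        double weightChange = -Math.signum(dEdw) * delta;
        return w + weightChange;
    }

    // Convention B: the update carries no minus sign, and it is SUBTRACTED.
    static double stepSubtract(double w, double dEdw, double delta) {
        double weightUpdate = Math.signum(dEdw) * delta;
        return w - weightUpdate;
    }

    // Convention C: the "gradient" is already the negative derivative,
    // as in (target - output) style error terms, and the update is ADDED.
    static double stepNegatedGradient(double w, double negGradient, double delta) {
        return w + Math.signum(negGradient) * delta;
    }

    public static void main(String[] args) {
        double w = 0.5, dEdw = 0.3, delta = 0.1;
        // All three lines print the same new weight.
        System.out.println(stepAdd(w, dEdw, delta));
        System.out.println(stepSubtract(w, dEdw, delta));
        System.out.println(stepNegatedGradient(w, -dEdw, delta));
    }
}
```

All three conventions move the weight against the error derivative. A formula with a minus sign and code without one can therefore both be right, as long as each is consistent about what "gradient" means and whether the update is added or subtracted.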

Edit: I revisited the part about the gradient calculation in my original answer.

First, here is a brief explanation of how you can picture the gradient for the weights in your output layer. You start by calculating the error between your outputs and the target values.

What you are now trying to do is to "blame" the neurons in the previous layer that were active. Imagine the output neuron saying "Well, I have an error here, who is responsible?" The neurons of the previous layer are responsible. Depending on whether the output is too small or too large compared to the target value, the output neuron increases or decreases the weight to each neuron in the previous layer according to how active that neuron was.

x is the activation of a neuron in the hidden layer.

o is the activation of the output neuron.

φ is the activation function of the output neuron, φ' its derivative.

Edit2: Corrected the part below. Added matrix style computation of backpropagation.

The error at each output neuron j is:

(1) δ_{out, j} = φ'(o_j) * (t_j - o_j)

The gradient for the weight connecting the hidden neuron i with the output neuron j:

(2) grad_{i, j} = x_i * δ_{out, j}

The backpropagated error at each hidden neuron i with the weights w:

(3) δ_{hid, i} = φ'(x_i) * ∑_j w_{i, j} * δ_{out, j}

By repeatedly applying formulas (2) and (3), you can backpropagate all the way to the input layer.
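The signs in formulas (1) and (2) can be checked against a loss function. The short derivation below assumes a squared-error loss E, which is not stated above and is therefore an assumption. It also writes net_j for the weighted input of output neuron j, a symbol introduced here only for illustration.

```latex
\begin{align*}
E &= \tfrac{1}{2} \sum_j (t_j - o_j)^2, \qquad
o_j = \varphi(\mathrm{net}_j), \qquad
\mathrm{net}_j = \sum_i x_i \, w_{i,j} \\
\frac{\partial E}{\partial w_{i,j}}
  &= -(t_j - o_j)\, \varphi'(\mathrm{net}_j)\, x_i
   = -x_i \, \delta_{out,j}
   = -\mathrm{grad}_{i,j}
\end{align*}
```

Under that assumption, grad_{i, j} is the negative of the error derivative, so a step that reduces the error adds a positive multiple of grad_{i, j} to the weight. For tanh, φ'(net_j) can be computed from the output alone as 1 - o_j^2, which is the factor used in the question's gradientOutput expression.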

Written as loops for a single training sample:

The error at each output neuron j is:

for(int j=0; j < numOutNeurons; j++) {
  errorOut[j] = activationDerivative(o[j])*(t[j] - o[j]);
}

The gradient for the weight connecting the hidden neuron i with the output neuron j:

for(int i=0; i < numHidNeurons; i++) {
  for(int j=0; j < numOutNeurons; j++) {
    grad[i][j] = x[i] * errorOut[j];
  }
}

The backpropagated error at each hidden neuron i:

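for(int i=0; i < numHidNeurons; i++) {
  // Sum the weighted errors of all output neurons before applying the derivative.
  double sum = 0;
  for(int j=0; j < numOutNeurons; j++) {
    sum += weights[i][j] * errorOut[j];
  }
  errorHid[i] = activationDerivative(x[i]) * sum;
}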
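The three loops can be combined into one small class for a single sample. This is a minimal sketch under stated assumptions: tanh as the activation function, a squared-error loss, and weights[i][j] connecting hidden neuron i to output neuron j. The class name and the forward and squaredError helpers are hypothetical additions that make the sketch checkable.

```java
class OneSampleBackprop {
    // Derivative of tanh written in terms of the activation a = tanh(net): 1 - a^2.
    static double activationDerivative(double a) {
        return 1.0 - a * a;
    }

    // Forward pass of the output layer: o[j] = tanh(sum_i x[i] * weights[i][j]).
    static double[] forward(double[] x, double[][] weights) {
        int numOut = weights[0].length;
        double[] o = new double[numOut];
        for (int j = 0; j < numOut; j++) {
            double net = 0.0;
            for (int i = 0; i < x.length; i++) {
                net += x[i] * weights[i][j];
            }
            o[j] = Math.tanh(net);
        }
        return o;
    }

    // Assumed loss: E = 1/2 * sum_j (t[j] - o[j])^2.
    static double squaredError(double[] o, double[] t) {
        double e = 0.0;
        for (int j = 0; j < o.length; j++) {
            e += 0.5 * (t[j] - o[j]) * (t[j] - o[j]);
        }
        return e;
    }

    // Formula (1): error term at each output neuron.
    static double[] outputError(double[] o, double[] t) {
        double[] errorOut = new double[o.length];
        for (int j = 0; j < o.length; j++) {
            errorOut[j] = activationDerivative(o[j]) * (t[j] - o[j]);
        }
        return errorOut;
    }

    // Formula (2): grad[i][j] = x[i] * errorOut[j].
    static double[][] gradient(double[] x, double[] errorOut) {
        double[][] grad = new double[x.length][errorOut.length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < errorOut.length; j++) {
                grad[i][j] = x[i] * errorOut[j];
            }
        }
        return grad;
    }

    // Formula (3): weighted sum of the output errors, scaled by the derivative at x[i].
    static double[] hiddenError(double[] x, double[][] weights, double[] errorOut) {
        double[] errorHid = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < errorOut.length; j++) {
                sum += weights[i][j] * errorOut[j];
            }
            errorHid[i] = activationDerivative(x[i]) * sum;
        }
        return errorHid;
    }
}
```

With the assumed loss, grad[i][j] equals the negative derivative of squaredError with respect to weights[i][j]. A finite-difference comparison against forward and squaredError is a quick way to confirm the signs before plugging the gradient into an update rule.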

In fully connected multilayer perceptrons without convolution or anything similar, you can use standard matrix operations, which is a lot faster.

Assuming each of your samples is a row in your input matrix and the columns are its attributes, you can propagate the input through your network like this:

activations[0] = input;
for(int i=0; i < numWeightMatrices; i++){
  activations[i+1] = activations[i].dot(weightMatrices[i]);
  activations[i+1] = activationFunction(activations[i+1]);
}

Backpropagation then becomes:

n = numWeightMatrices;
error = activationDerivative(activations[n]) * (target - activations[n]);
for (int l=n-1; l >= 0; l--){
  gradient[l] = activations[l].transposed().dot(error);
  if (l > 0) {
     error = error.dot(weightMatrices[l].transposed());
     error = activationDerivative(activations[l])*error;
  }
}

I omitted the bias neuron in the explanations above. The literature recommends modeling the bias neuron as an additional column in each activation matrix that is always 1.0. You will need to deal with some slice assignments. When using the matrix backpropagation loop, do not forget to set the error at the position of the bias to 0 before each step!
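The bias handling can be sketched with plain double[][] matrices in which each row is one sample. The class and method names are hypothetical, and placing the bias in the last column is an assumption made only for this illustration.

```java
class BiasColumn {
    // Returns a copy of the activations with one extra trailing column fixed at 1.0.
    static double[][] appendBias(double[][] activations) {
        double[][] withBias = new double[activations.length][];
        for (int r = 0; r < activations.length; r++) {
            int n = activations[r].length;
            withBias[r] = java.util.Arrays.copyOf(activations[r], n + 1);
            withBias[r][n] = 1.0; // constant bias activation
        }
        return withBias;
    }

    // Zeroes the error in the bias column (assumed to be the last one), in place,
    // so that no error is attributed to the constant bias activation.
    static void clearBiasError(double[][] error) {
        for (double[] row : error) {
            row[row.length - 1] = 0.0;
        }
    }
}
```

In a matrix backpropagation loop of the kind shown above, clearBiasError would be applied to the error matrix before each gradient and propagation step.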

score 0 · Accepted Answer

private float resilientPropagation(int i, int j){
    float gradientSignChange = sign(prevGradient[i][j]*gradient[i][j]);
    float delta = 0;
    if(gradientSignChange > 0){
        float change = Math.min((prevChange[i][j]*increaseFactor), maxDelta);
        delta = sign(gradient[i][j])*change;
        prevChange[i][j] = change;
        prevGradient[i][j] = gradient[i][j];
    }
    else if(gradientSignChange < 0){
        float change = Math.max((prevChange[i][j]*decreaseFactor), minDelta);
        prevChange[i][j] = change;
        delta = -prevDelta[i][j];
        prevGradient[i][j] = 0;
    }
    else if(gradientSignChange == 0){
        float change = prevChange[i][j];
        delta = sign(gradient[i][j])*change;
        prevGradient[i][j] = gradient[i][j];
    }
    prevDelta[i][j] = delta;
    return delta;       
}


gradient[i][j] = error[j]*layerInput[i];
weights[i][j]= weights[i][j]+resilientPropagation(i,j);
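To see the update rule in action, the same branching logic can be reduced to a single weight and run on a toy problem. This is a sketch under assumptions, not the answer author's full implementation: the [i][j] arrays become scalar fields, float becomes double, and the constants (1.2, 0.5, 50.0, 1e-6) and the initial step of 0.1 are illustrative values that the code above does not specify. As above, the gradient passed in is the negative error derivative, so the returned delta is added to the weight.

```java
class RpropDemo {
    // Illustrative constants; the original snippet does not state its values.
    static final double INCREASE_FACTOR = 1.2;
    static final double DECREASE_FACTOR = 0.5;
    static final double MAX_DELTA = 50.0;
    static final double MIN_DELTA = 1e-6;

    double prevGradient = 0.0;
    double prevChange = 0.1; // initial step size (assumed)
    double prevDelta = 0.0;

    // Returns the amount to ADD to the weight for the given (negative-derivative) gradient.
    double update(double gradient) {
        double gradientSignChange = Math.signum(prevGradient * gradient);
        double delta;
        if (gradientSignChange > 0) {
            // Same direction as last time: grow the step.
            double change = Math.min(prevChange * INCREASE_FACTOR, MAX_DELTA);
            delta = Math.signum(gradient) * change;
            prevChange = change;
            prevGradient = gradient;
        } else if (gradientSignChange < 0) {
            // Direction flipped: shrink the step and undo the previous move.
            prevChange = Math.max(prevChange * DECREASE_FACTOR, MIN_DELTA);
            delta = -prevDelta;
            prevGradient = 0.0;
        } else {
            // First call, or the call right after a revert: keep the step size.
            delta = Math.signum(gradient) * prevChange;
            prevGradient = gradient;
        }
        prevDelta = delta;
        return delta;
    }

    // Toy problem: minimize E(w) = (w - 3)^2 starting from w.
    static double minimize(double w, int steps) {
        RpropDemo rprop = new RpropDemo();
        for (int k = 0; k < steps; k++) {
            double gradient = -2.0 * (w - 3.0); // negative derivative of E
            w += rprop.update(gradient);
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(minimize(0.0, 200)); // ends close to 3.0
    }
}
```

The step size grows while the gradient keeps its sign and is halved after each overshoot, so only the sign of the gradient influences the size of the update, not its magnitude.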
