I've been working on a reinforcement learning (Q-learning) implementation, where the Q function is approximated with a neural network. During troubleshooting, I reduced the problem down to a very simple first step: train a NN to calculate atan2(y, x).
I'm using FANN for this problem, but the library is largely irrelevant as this question is more about the appropriate technique to use.
I have been struggling to teach the NN, given input = {x, y}, to calculate output = atan2(y, x).
Here is the naïve approach I have been using. It's extremely simplistic, but I'm trying to start simple and work up from there.
#include "fann.h"
#include <cstdio>
#include <random>
#include <cmath>
int main()
{
// creates a 3 layered, densely connected neural network, 2-3-1
fann *ann = fann_create_standard(3, 2, 3, 1);
// set the activation functions for the layers
fann_set_activation_function_hidden(ann, FANN_SIGMOID_SYMMETRIC);
fann_set_activation_function_output(ann, FANN_SIGMOID_SYMMETRIC);
fann_type input[2];
fann_type expOut[1];
fann_type *calcOut;
std::default_random_engine rng;
std::uniform_real_distribution<double> unif(0.0, 1.0);
for (int i = 0; i < 100000000; ++i) {
input[0] = unif(rng);
input[1] = unif(rng);
expOut[0] = atan2(input[1], input[0]);
// does a single incremental training round
fann_train(ann, input, expOut);
}
input[0] = unif(rng);
input[1] = unif(rng);
expOut[0] = atan2(input[1], input[0]);
calcOut = fann_run(ann, input);
printf("Testing atan2(%f, %f) = %f -> %f\n", input[1], input[0], expOut[0], calcOut[0]);
fann_destroy(ann);
return 0;
}
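For what it's worth, this should build against a stock FANN install with something like g++ atan2_test.cpp -lfann (the file name here is arbitrary, and the exact link flag may differ depending on how FANN was installed).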
Super simple, right? However, even after 100,000,000 iterations this neural network fails:
Testing atan2(0.949040, 0.756997) = 0.897493 -> 0.987712
I also tried using a linear activation function on the output layer (FANN_LINEAR). No luck. In fact, the results are much worse. After 100,000,000 iterations, we get:
Testing atan2(0.949040, 0.756997) = 0.897493 -> 7.648625
That's even worse than when the weights were randomly initialized. How could a NN get worse after training?
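To be clear about what changed between the two runs, the only difference from the code above was this single line:

// swap the bounded symmetric sigmoid on the output layer for a linear unit
fann_set_activation_function_output(ann, FANN_LINEAR);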
I found this issue with FANN_LINEAR to be consistent with other tests. When linear output is needed (e.g. in the calculation of the Q value, which corresponds to arbitrarily large or small rewards), this approach fails miserably and the error actually appears to increase with training.
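For context on why I need an unbounded output at all: the value the network ultimately has to regress toward in Q-learning is the usual bootstrapped target, roughly sketched below (the function and variable names are just illustrative, not code from my actual agent):

#include <algorithm>
#include <vector>

// Illustrative sketch only: the Q-learning regression target is
//   target = r + gamma * max_a' Q(s', a')
// Accumulated rewards can be arbitrarily large or small, so the
// output unit must be able to produce unbounded values.
double qTarget(double reward, double gamma, const std::vector<double> &nextQ)
{
    return reward + gamma * *std::max_element(nextQ.begin(), nextQ.end());
}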
So what is going on? Is using a fully-connected 2-3-1 NN inappropriate for this situation? Is a symmetric sigmoid activation function in the hidden layer inappropriate? I fail to see what else could possibly account for this error.