I'm supposed to implement an artificial neural network (ANN) with 2 input, 2 hidden and 1 output neuron that can solve the XOR problem. The weights of the network should be optimized using an evolutionary algorithm. The activation function for each neuron and the fitness function for each ANN are given. The following picture sums up the problem and introduces the variable names I used:
Now I tried my very best to solve the problem, but even with an evolutionary algorithm using a population size of 1000 ANNs and 2000 generations my best fitness is never better than 0.75. My code includes a ANN class with the neurons, activation and fitness function and a Main class that includes the evolutionary algorithm and that optimizes the weights for the ANNs. Here is the code:
Each ANN is initialized with random weights between -1 and 1 and able to mutate, i.e. return a mutation that differs in one weight wich is chosen randomly.
public class ANN implements Comparable<ANN> {
private Random rand = new Random();
public double[] w = new double[6]; //weights: in1->h1, in1->h2, in2->h1, in2->h2, h1->out, h2->out
public ANN() {
for (int i=0; i<6; i++) //randomly initialize weights in [-1,1)
w[i] = rand.nextDouble() * 2 - 1;
}
//calculates the output for input a & b
public double ann(double a, double b) {
double h1 = activationFunc(a*w[0] + b*w[2]);
double h2 = activationFunc(a*w[1] + b*w[3]);
double out = activationFunc(h1*w[4] + h2*w[5]);
return out;
}
private double activationFunc(double x) {
return 2.0 / (1 + Math.exp(-2*x)) - 1;
}
//calculates the fitness (divergence to the right output)
public double fitness() {
double sum = 0;
//test all possible inputs (0,0; 0,1; 1,0; 1,1)
sum += 1 - Math.abs(0 - ann(0, 0));
sum += 1 - Math.abs(1 - ann(0, 1));
sum += 1 - Math.abs(1 - ann(1, 0));
sum += 1 - Math.abs(0 - ann(1, 1));
return sum / 4.0;
}
//randomly change random weight and return the mutated ANN
public ANN mutate() {
//copy weights
ANN mutation = new ANN();
for (int i=0; i<6; i++)
mutation.w[i] = w[i];
//randomly change one
int weight = rand.nextInt(6);
mutation.w[weight] = rand.nextDouble() * 2 - 1;
return mutation;
}
@Override
public int compareTo(ANN arg) {
if (this.fitness() < arg.fitness())
return -1;
if (this.fitness() == arg.fitness())
return 0;
return 1; //this.fitness > arg.fitness
}
@Override
public boolean equals(Object obj) {
if (obj == null)
return false;
ANN ann = (ANN)obj;
for (int i=0; i<w.length; i++) { //not equal if any weight is different
if (w[i] != ann.w[i])
return false;
}
return true;
}
}
The Main class has the evolutionary algorithm and uses elitism and rank-based selection to create the next generation of each population, i.e. the 100 best ANNs are copied the remaining 900 are mutations of previously successful ANNs.
//rank-based selection + elitism
public class Main {
static Random rand = new Random();
static int size = 1000; //population size
static int elitists = 100; //number of elitists
public static void main(String[] args) {
int generation = 0;
ArrayList<ANN> population = initPopulation();
print(population, generation);
//stop after good fitness is reached or after 2000 generations
while(bestFitness(population) < 0.8 && generation < 2000) {
generation++;
population = nextGeneration(population);
print(population, generation);
}
}
public static ArrayList<ANN> initPopulation() {
ArrayList<ANN> population = new ArrayList<ANN>();
for (int i=0; i<size; i++) {
ANN ann = new ANN();
if (!population.contains(ann)) //no duplicates
population.add(ann);
}
return population;
}
public static ArrayList<ANN> nextGeneration(ArrayList<ANN> current) {
ArrayList<ANN> next = new ArrayList<ANN>();
Collections.sort(current, Collections.reverseOrder()); //sort according to fitness (0=best, 999=worst)
//copy elitists
for (int i=0; i<elitists; i++) {
next.add(current.get(i));
}
//rank-based roulette wheel
while (next.size() < size) { //keep same population size
double total = 0;
for (int i=0; i<size; i++)
total += 1.0 / (i + 1.0); //fitness = 1/(rank+1)
double r = rand.nextDouble() * total;
double cap = 0;
for (int i=0; i<size; i++) {
cap += 1.0 / (i + 1.0); //higher rank => higher probability
if (r < cap) { //select for mutation
ANN mutation = current.get(i).mutate(); //no duplicates
if (!next.contains(mutation))
next.add(mutation);
break;
}
}
}
return next;
}
//returns best ANN in the specified population
public static ANN best(ArrayList<ANN> population) {
Collections.sort(population, Collections.reverseOrder());
return population.get(0);
}
//returns the best fitness of the specified population
public static double bestFitness(ArrayList<ANN> population) {
return best(population).fitness();
}
//returns the average fitness of the specified population
public static double averageFitness(ArrayList<ANN> population) {
double totalFitness = 0;
for (int i=0; i<size; i++)
totalFitness += population.get(i).fitness();
double average = totalFitness / size;
return average;
}
//print population best and average fitness
public static void print(ArrayList<ANN> population, int generation) {
System.out.println("Generation: " + generation + "\nBest: " + bestFitness(population) + ", average: " + averageFitness(population));
System.out.print("Best weights: ");
ANN best = best(population);
for (int i=0; i<best.w.length; i++)
System.out.print(best.w[i] + " ");
System.out.println();
System.out.println();
}
}
Even though, I put quite some thought into this and used techniques that I learned, the result is not satisfying. For some reason the optimal weights seem to drift to -1 for each weight. How does that make sense? Is the range of -1 to 1 for the weights a good choice? Should I also introduce crossovers in addition to the mutations? I know this is a very specific problem, but I would greatly appreciate some help!