I'm just starting to learn about neural networks, to see if they could be useful for me.
I downloaded this simple Python implementation of a 3-layer feedforward neural network
and modified the training pattern from XOR to a checkerboard, with 10 nodes in the hidden layer. If I understand the universal approximation theorem correctly, this 3-layer network (one hidden layer) should be able to learn any function from R2 to R, including my checkerboard function. ... But it does not.
What is wrong?
- I understand the universal approximation theorem wrong: maybe the function has to be monotonic or convex? (Or the regions have to be linearly separable?)
- I need another layer (2 hidden layers) to approximate such a non-convex, not linearly separable function?
- The network is just trapped in some local minimum? (But I don't think so: I tried several runs with random initial weights, and the result is the same.)
- 10 nodes in the hidden layer is not enough? I tried different numbers: with 5 it is almost the same, and with 30 it does not help either.
Is there any general approach to modifying the optimization (learning) scheme so that it is guaranteed to converge to any function that can theoretically be represented by the given network according to the universal approximation theorem?
Is there any general test that will tell me whether my network (with a given topology, number of layers and nodes) is able to represent a given function, or whether it is just trapped in some local minimum?
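To illustrate the local-minimum hypothesis, here is a minimal sketch of the kind of multi-restart experiment I mean. It is not the downloaded code: it is an independent numpy re-implementation, and the tanh hidden layer, linear output, learning rate, and epoch count are all my own assumptions.

```python
import numpy as np

# Minimal sketch, NOT the downloaded bpnn code: a 2-10-1 network with a
# tanh hidden layer and a linear output, trained by plain full-batch
# gradient descent on the same 9 checkerboard points, restarted from
# several random seeds to see whether at least one run fits the data.
X = np.array([[x, y] for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)])
T = np.array([[(int(x / 0.5) + int(y / 0.5)) % 2] for x, y in X], dtype=float)

def train_once(seed, hidden=10, lr=0.5, epochs=30000):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 1, (2, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # hidden activations
        Y = H @ W2 + b2                 # linear output
        E = Y - T                       # per-pattern error
        dH = (E @ W2.T) * (1.0 - H**2)  # backprop through tanh
        W2 -= lr * (H.T @ E) / len(X); b2 -= lr * E.mean(0)
        W1 -= lr * (X.T @ dH) / len(X); b1 -= lr * dH.mean(0)
    H = np.tanh(X @ W1 + b1)
    return float(((H @ W2 + b2 - T) ** 2).mean())  # final mean squared error

losses = [train_once(seed) for seed in range(5)]
print(losses)
```

If some restarts reach a near-zero error while others plateau, that points at local minima rather than at the topology.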
These are the results with 10 neurons in the hidden layer:
train it with some patterns
error 3.14902
error 1.37104
error 1.35305
error 1.30453
error 1.28329
error 1.27599
error 1.27275
error 1.27108
error 1.27014
error 1.26957
test it
([0.0, 0.0], '->', [0.019645293674000152])
([0.0, 0.5], '->', [0.5981006916165954])
([0.0, 1.0], '->', [0.5673621981298169])
([0.5, 0.0], '->', [0.5801274708105488])
([0.5, 0.5], '->', [0.5475774428347904])
([0.5, 1.0], '->', [0.5054692523873793])
([1.0, 0.0], '->', [0.5269586801603834])
([1.0, 0.5], '->', [0.48368767897171666])
([1.0, 1.0], '->', [0.43916379836698244])
This is the definition of the test run (the only part of the code I modified):
def demo():
    # Teach network checkerboard function
    pat = [
        [[0.0, 0.0], [0.0]],
        [[0.0, 0.5], [1.0]],
        [[0.0, 1.0], [0.0]],
        [[0.5, 0.0], [1.0]],
        [[0.5, 0.5], [0.0]],
        [[0.5, 1.0], [1.0]],
        [[1.0, 0.0], [0.0]],
        [[1.0, 0.5], [1.0]],
        [[1.0, 1.0], [0.0]]
    ]
    # create a network with two input nodes, 10 hidden nodes, and one output node
    n = NN(2, 10, 1)
    print " train it with some patterns "
    n.train(pat)
    print " test it "
    n.test(pat)
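Regarding the second question, one capacity test I can imagine (again a numpy sketch, independent of the downloaded code; the tanh hidden layer mirrors the network's, and the weight scale 2.0 is an arbitrary assumption): freeze random hidden-layer weights and fit only the output layer by least squares. If the 9 targets can then be matched almost exactly, the 2-10-1 topology can represent the checkerboard, and a training failure would point at the optimizer rather than the architecture.

```python
import numpy as np

# Sketch of a representational-capacity check: draw random hidden weights,
# compute the tanh hidden features for the 9 checkerboard points, and fit
# only the output layer by least squares.  A near-zero residual means the
# 2-10-1 topology can interpolate these targets.
X = np.array([[x, y] for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)])
t = np.array([(int(x / 0.5) + int(y / 0.5)) % 2 for x, y in X], dtype=float)

rng = np.random.default_rng(0)
hidden = 10
W1 = rng.normal(0, 2.0, (2, hidden))      # frozen random hidden weights
b1 = rng.normal(0, 2.0, hidden)
H = np.tanh(X @ W1 + b1)                  # 9 x 10 matrix of hidden features
H = np.hstack([H, np.ones((len(X), 1))])  # column of ones = output bias
w, _, rank, _ = np.linalg.lstsq(H, t, rcond=None)
residual = float(((H @ w - t) ** 2).mean())
print(residual)
```

This only tests representational capacity, not learnability by backpropagation, but it separates the two hypotheses in the question.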