
I'm just starting to learn neural networks, to see if they could be useful for me.

I downloaded this simple Python implementation of a 3-layer feedforward neural network,

and I just modified the training pattern from XOR to a checkerboard and changed the number of nodes in the hidden layer to 10. If I understand the universal approximation theorem correctly, this 3-layer network (one hidden layer) should be able to learn any function from R2 to R, including my checkerboard function... but it does not.

What is wrong?

  • Do I understand the universal approximation theorem wrongly - maybe the function has to be monotonic or convex? (Or should the regions be linearly separable?)
  • Do I need another layer (2 hidden layers) to approximate such a non-convex, non-linearly-separable function?
  • Is the network just trapped in some local minimum? (But I don't think so: I tried several runs, the initial weights are random, and the result is the same.)
  • Are 10 nodes in the hidden layer not enough? I tried different numbers: with 5 it is almost the same, and with 30 it does not improve either.

Is there any general approach for modifying the optimization (learning) scheme so that it is guaranteed to converge to any function that the given network can theoretically represent according to the universal approximation theorem?

Is there any general test that will tell me whether my network (with a given topology, number of layers and nodes) is able to represent a given function, or whether it is just trapped in some local minimum?

These are the results with 10 neurons in the hidden layer:

 train it with some patterns 
error 3.14902
error 1.37104
error 1.35305
error 1.30453
error 1.28329
error 1.27599
error 1.27275
error 1.27108
error 1.27014
error 1.26957
 test it 
([0.0, 0.0], '->', [0.019645293674000152])
([0.0, 0.5], '->', [0.5981006916165954])
([0.0, 1.0], '->', [0.5673621981298169])
([0.5, 0.0], '->', [0.5801274708105488])
([0.5, 0.5], '->', [0.5475774428347904])
([0.5, 1.0], '->', [0.5054692523873793])
([1.0, 0.0], '->', [0.5269586801603834])
([1.0, 0.5], '->', [0.48368767897171666])
([1.0, 1.0], '->', [0.43916379836698244])

This is the definition of the test run (the only part of the code I modified):

def demo():
    # Teach network checkerboard function
    pat = [
        [ [0.0,0.0], [0.0] ],
        [ [0.0,0.5], [1.0] ],
        [ [0.0,1.0], [0.0] ],

        [ [0.5,0.0], [1.0] ],
        [ [0.5,0.5], [0.0] ],
        [ [0.5,1.0], [1.0] ],

        [ [1.0,0.0], [0.0] ],
        [ [1.0,0.5], [1.0] ],
        [ [1.0,1.0], [0.0] ]
        ]

    # create a network with two input, 10 hidden, and one output nodes
    n = NN(2, 10, 1)
    print " train it with some patterns "
    n.train(pat)
    print " test it "
    n.test(pat)

1 Answer


The universal approximation theorem states that any continuous function can be approximated arbitrarily well with a single hidden layer. It does not require any kind of separability of the data; we are talking about arbitrary (continuous) functions.
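As a rough paraphrase (Cybenko's sigmoidal form; the notation below is mine, not part of the original answer): for any continuous f on a compact set K ⊂ R^n, any sigmoidal activation σ, and any ε > 0,

$$\exists\, N,\ \alpha_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n : \quad \sup_{x \in K}\left|\sum_{i=1}^{N} \alpha_i\,\sigma\!\left(w_i^{\top} x + b_i\right) - f(x)\right| < \varepsilon.$$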

In particular, if you have N hidden nodes, where N is the number of training samples, then it is always possible to learn your training set perfectly (the network can simply memorize all input-output pairs).
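To illustrate that memorization claim, here is a small sketch of the idea (not part of the question's code; the names and random seed are my own, and it assumes the 9x9 hidden-activation matrix is nonsingular, which is the generic case for random weights):

    import numpy as np

    rng = np.random.RandomState(1)

    # the 9 checkerboard training points from the question
    X = np.array([[0.0, 0.0], [0.0, 0.5], [0.0, 1.0],
                  [0.5, 0.0], [0.5, 0.5], [0.5, 1.0],
                  [1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])
    y = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])

    W = rng.randn(2, 9)                          # random input -> hidden weights
    b = rng.randn(9)                             # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X.dot(W) + b)))    # 9x9 matrix of hidden activations

    beta = np.linalg.solve(H, y)                 # output weights so that H @ beta = y
    print(H.dot(beta))                           # reproduces y (up to rounding error)

With as many hidden units as training points, the output weights can simply be solved for exactly, which is precisely "memorizing" the training set.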

However, there is no guarantee of generalization from this, nor any learning guarantee for smaller networks. Neural networks are not a "universal answer"; they are hard to get right.

Back to your problem: your function is very simple, and none of the issues above apply here; such a function can easily be learned by a very basic network. It looks like the problem is one of the following two things:

  • A bug in the implementation
  • A missing proper activation function (and/or bias term) in the neurons (see the sketch after this list)
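For comparison, here is a minimal sketch of a 2-10-1 network with bias terms, a tanh hidden layer and a sigmoid output, trained by plain batch gradient descent on the 9 patterns. This is my own illustration, not the NN class from the question; the learning rate, seed and epoch count are assumptions and may need tuning or a few random restarts:

    import numpy as np

    rng = np.random.RandomState(0)

    # the 9 checkerboard patterns from the question
    X = np.array([[0.0, 0.0], [0.0, 0.5], [0.0, 1.0],
                  [0.5, 0.0], [0.5, 0.5], [0.5, 1.0],
                  [1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])
    y = np.array([[0.0], [1.0], [0.0],
                  [1.0], [0.0], [1.0],
                  [0.0], [1.0], [0.0]])

    n_hidden = 10
    W1 = rng.randn(2, n_hidden)          # input -> hidden weights
    b1 = np.zeros(n_hidden)              # hidden biases
    W2 = rng.randn(n_hidden, 1)          # hidden -> output weights
    b2 = np.zeros(1)                     # output bias
    lr = 0.1                             # illustrative learning rate

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(50000):
        h = np.tanh(X.dot(W1) + b1)              # forward pass, hidden layer
        out = sigmoid(h.dot(W2) + b2)            # forward pass, output in (0, 1)
        d_out = (out - y) * out * (1.0 - out)    # backprop through squared error + sigmoid
        d_h = d_out.dot(W2.T) * (1.0 - h ** 2)   # backprop through tanh
        W2 -= lr * h.T.dot(d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T.dot(d_h)
        b1 -= lr * d_h.sum(axis=0)

    print(sigmoid(np.tanh(X.dot(W1) + b1).dot(W2) + b2).round(3))

The point of the sketch is that every layer has a bias term and a proper nonlinearity; without those, a network of this size cannot represent the checkerboard no matter how long it trains.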
Answered 2014-04-07T08:17:47.237