machine-learning - Encog Lunar Lander extension

Question

This question refers to the C# Lunar Lander example from the Encog repository. As in the example, I am using NeuralSimulatedAnnealing to train my multilayer feedforward network (50 epochs):

BasicNetwork network = CreateNetwork();

IMLTrain train;
train = new NeuralSimulatedAnnealing(network, new PilotScore(), 10, 2, 100);


public static BasicNetwork CreateNetwork() {
    var pattern = new FeedForwardPattern {InputNeurons = 3};
    pattern.AddHiddenLayer(50);
    pattern.OutputNeurons = 1;
    pattern.ActivationFunction = new ActivationTANH();
    var network = (BasicNetwork) pattern.Generate();
    network.Reset();
    return network;
}

The example works well: the neural pilot learns exactly how to land the spacecraft under the given conditions. But I want more!

To that end, I created a globals class, shown below, and changed one line in the LanderSimulator class:

namespace Encog.Examples.Lunar
{
    class globals
    {
        public static int fuelConsumption { get; set; }
    }
}


 public void Turn(bool thrust){
    Seconds++;
    Velocity -= Gravity;
    Altitude += Velocity;

    if (thrust && Fuel > 0)
    {
        Fuel-= globals.fuelConsumption;    //changed instead of Fuel--;
        Velocity += Thrust;
    }

    Velocity = Math.Max(-TerminalVelocity, Velocity);
    Velocity = Math.Min(TerminalVelocity, Velocity);

    if (Altitude < 0)
        Altitude = 0;
}

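So now every thrust burns an amount of fuel determined by the fuelConsumption variable. I then tried three different fuel consumption values; here are the best scores of the respective networks: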

//NETWORK 1
globals.fuelConsumption = 1;
bestScore: 7986

//NETWORK 2
globals.fuelConsumption = 5;
bestScore: 7422

//NETWORK 3
globals.fuelConsumption = 10;
bestScore: 6921
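
The effect of fuelConsumption on the number of thrusts available can be sketched in isolation. This is a minimal, standalone illustration that does not use Encog: the FuelBudget class and ThrustTurns helper are hypothetical names, and the starting fuel of 200 is an assumption about the example's defaults, not something stated in this post.

```csharp
using System;

public static class FuelBudget
{
    // Counts how many thrust turns fit into a tank, applying the same
    // "Fuel > 0" guard as the modified Turn method: a thrust is allowed
    // whenever any fuel is left, then fuelConsumption units are burned.
    public static int ThrustTurns(int startingFuel, int fuelConsumption)
    {
        if (fuelConsumption <= 0)
            throw new ArgumentOutOfRangeException(nameof(fuelConsumption));

        int fuel = startingFuel;
        int turns = 0;
        while (fuel > 0)
        {
            fuel -= fuelConsumption;
            turns++;
        }
        return turns;
    }

    public static void Main()
    {
        const int startingFuel = 200; // assumed default, not taken from the post
        foreach (int consumption in new[] { 1, 5, 10 })
        {
            Console.WriteLine(
                $"fuelConsumption={consumption}: {ThrustTurns(startingFuel, consumption)} thrust turns");
        }
    }
}
```

Under that assumption, a pilot trained at one setting has a very different thrust budget at another, which is one plausible reason the networks transfer so poorly.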

When I tested these networks on each other's settings, the results were disappointing:

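Network 1 scored -39591 and -39661 with fuelConsumption set to 5 and 10, respectively.
Network 2 scored -8832 and -35671 with fuelConsumption set to 1 and 10, respectively.
Network 3 scored -24510 and -19697 with fuelConsumption set to 1 and 5, respectively.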

So I tried training a single network on all three scenarios, as follows:

int epoch;

epoch = 1;
globals.fuelConsumption = 1;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    epoch++;
}
Console.WriteLine("--------------------------------------");

epoch = 1;
globals.fuelConsumption = 5;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    epoch++;
}
Console.WriteLine("--------------------------------------");
epoch = 1;
globals.fuelConsumption = 10;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    epoch++;
}

Console.WriteLine(@"The score of experienced pilot is:");
network = (BasicNetwork) train.Method;

var pilot = new NeuralPilot(network, false);
globals.fuelConsumption = 1;
Console.WriteLine("@1: " + pilot.ScorePilot());
globals.fuelConsumption = 5;
Console.WriteLine("@5: " + pilot.ScorePilot());
globals.fuelConsumption = 10;
Console.WriteLine("@10: " + pilot.ScorePilot());

But the result is still the same:

The score of experienced pilot is:
@1: -27485
@5: -27565
@10: 7448

How can I create a neural pilot that achieves the highest score in all three scenarios?

score 0 · Accepted Answer

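To solve this, I switched to a NEAT network instead of a traditional feedforward or recurrent network. Here are some of the interesting changes in the code: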

NEATPopulation network = CreateNetwork();
TrainEA train = default(TrainEA);


public static NEATPopulation CreateNetwork(){
    int inputNeurons = 3;
    int outputNeurons = 1;
    NEATPopulation network = new NEATPopulation(inputNeurons, outputNeurons, 100);
    network.Reset();
    return network;
}

Then, after adjusting a few declarations in the NeuralPilot class,

private readonly NEATNetwork _network;

public NeuralPilot(NEATNetwork network, bool track)

I had to change the ScorePilot function, because NEATNetworks by default use ActivationSteepenedSigmoid for their output rather than the traditional ActivationLinear or ActivationTANH:

bool thrust;

if (value > 0.5){       //changed from, if (value > 0){
    thrust = true;
    if (_track)
        Console.WriteLine(@"THRUST");
}
else
    thrust = false;
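
Why the threshold moves from 0 to 0.5 can be shown with the two activation functions written out by hand. This is a minimal sketch that does not call Encog: ThresholdDemo and its methods are hypothetical names, and the steepened sigmoid is assumed here to be 1 / (1 + exp(-4.9 * x)), the form commonly used in NEAT.

```csharp
using System;

public static class ThresholdDemo
{
    // Assumed form of the steepened sigmoid; its output lies in (0, 1).
    public static double SteepenedSigmoid(double x)
    {
        return 1.0 / (1.0 + Math.Exp(-4.9 * x));
    }

    // tanh output lies in (-1, 1), so its midpoint is 0.
    public static bool ThrustFromTanh(double x)
    {
        return Math.Tanh(x) > 0.0;
    }

    // The sigmoid output is always positive, so the midpoint 0.5 is used instead.
    public static bool ThrustFromSigmoid(double x)
    {
        return SteepenedSigmoid(x) > 0.5;
    }

    public static void Main()
    {
        foreach (double x in new[] { -2.0, -0.1, 0.1, 2.0 })
        {
            Console.WriteLine(
                $"x={x}: tanh rule={ThrustFromTanh(x)}, sigmoid rule={ThrustFromSigmoid(x)}");
        }
    }
}
```

With a sigmoid output, keeping the old `value > 0` test would fire the thruster on every turn, because the output never drops to zero or below.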

So training a single network now looks like this:

OriginalNEATSpeciation speciation = default(OriginalNEATSpeciation);
speciation = new OriginalNEATSpeciation();

int epoch;
double best_1, best_5, best_10;
best_1 = best_5 = best_10 = 0;

train = NEATUtil.ConstructNEATTrainer(network, new PilotScore());
train.Speciation = speciation;

epoch = 1;
globals.fuelConsumption = 1;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    best_1 = train.Error;
    epoch++;
}
Console.WriteLine("--------------------------------------");

train = NEATUtil.ConstructNEATTrainer(network, new PilotScore());
train.Speciation = speciation;

epoch = 1;
globals.fuelConsumption = 5;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    best_5 = train.Error;
    epoch++;
}
Console.WriteLine("--------------------------------------");

train = NEATUtil.ConstructNEATTrainer(network, new PilotScore());
train.Speciation = speciation;

epoch = 1;
globals.fuelConsumption = 10;
for (int i = 0; i < 50; i++){
    train.Iteration();
    Console.WriteLine(@"Epoch #" + epoch + @" Score:" + train.Error);
    best_10 = train.Error;
    epoch++;
}

Console.WriteLine(@"The score of experienced pilot is:");

NEATNetwork trainedNetwork = default(NEATNetwork);
trainedNetwork = (NEATNetwork)train.CODEC.Decode(network.BestGenome);

var pilot = new NeuralPilot(trainedNetwork, false);
globals.fuelConsumption = 1;
Console.WriteLine("@bestScore of " + best_1.ToString() +" @1: liveScore is " + pilot.ScorePilot());
globals.fuelConsumption = 5;
Console.WriteLine("@bestScore of " + best_5.ToString() + " @5: liveScore is " + pilot.ScorePilot());
globals.fuelConsumption = 10;
Console.WriteLine("@bestScore of " + best_10.ToString() + " @10: liveScore is " + pilot.ScorePilot());

The results were erratic! Here are the results of a few random test runs:

The score of experienced pilot is:
@bestScore of 5540 @1: liveScore is -4954
@bestScore of 1160 @5: liveScore is 3823
@bestScore of 3196 @10: liveScore is 3196

The score of experienced pilot is:
@bestScore of 7455 @1: liveScore is 8227
@bestScore of 6324 @5: liveScore is 7427
@bestScore of 6427 @10: liveScore is 6427

The score of experienced pilot is:
@bestScore of 5322 @1: liveScore is -4617
@bestScore of 1898 @5: liveScore is 9531
@bestScore of 2086 @10: liveScore is 2086

The score of experienced pilot is:
@bestScore of 7493 @1: liveScore is -3848
@bestScore of 4907 @5: liveScore is -13840
@bestScore of 4954 @10: liveScore is 4954

The score of experienced pilot is:
@bestScore of 6560 @1: liveScore is 4046
@bestScore of 5775 @5: liveScore is 3366
@bestScore of 2516 @10: liveScore is 2516

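As you can see, we do get positive scores across the board in the second case, but there seems to be no relationship between the final network's performance and the best scores seen during training. So the problem may be solved, but not in a satisfactory way.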
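
To compare runs like these with a single number, the three live scores can be collapsed into one value. The sketch below is not part of the original approach: MultiScenarioScore and WorstCase are hypothetical names, and taking the minimum (rather than, say, the mean) is an assumption about what "good in all three scenarios" should mean. The sample values are the live scores of the second and fourth runs listed above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultiScenarioScore
{
    // Returns the lowest score a pilot achieves over the given fuel settings.
    // scoreAt is any function that scores the pilot at one fuelConsumption value.
    public static double WorstCase(Func<int, double> scoreAt, params int[] fuelSettings)
    {
        return fuelSettings.Min(scoreAt);
    }

    public static void Main()
    {
        // Live scores copied from the second and fourth runs in this answer.
        var secondRun = new Dictionary<int, double> { { 1, 8227 }, { 5, 7427 }, { 10, 6427 } };
        var fourthRun = new Dictionary<int, double> { { 1, -3848 }, { 5, -13840 }, { 10, 4954 } };

        Console.WriteLine("Second run worst case: " + WorstCase(s => secondRun[s], 1, 5, 10));
        Console.WriteLine("Fourth run worst case: " + WorstCase(s => fourthRun[s], 1, 5, 10));
    }
}
```

A score of this kind only reports how a finished pilot fares across settings; whether using it as the training score would improve the results is untested here.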
