
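First off, I am a complete amateur, so I may be mixing up some terminology.

I have been working on a neural network to play Connect 4 / Four in a Row.

The current design of the network model is 170 input values, 417 hidden neurons, and 1 output neuron. The network is fully connected, i.e. every input is connected to every hidden neuron, and every hidden neuron is connected to the output node.

Every connection has an independent weight, and each hidden node and the single output node has an additional bias node with its own weight.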

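The 170-value input representation of the Connect 4 game state is:

  • 42 pairs of values (84 input variables) denoting whether a space is occupied by player 1, occupied by player 2, or vacant.
    • 0,0 means it is free
    • 1,0 means it is player 1's position
    • 0,1 means it is player 2's position
    • 1,1 is not possible
  • Another 42 pairs of values (84 input variables) denoting whether adding a piece here would give player 1 or player 2 a "Connect 4" / "Four in a Row". The combinations of values mean the same as above.
  • 2 final input variables denoting whose turn it is:
    • 1,0 is player 1's turn
    • 0,1 is player 2's turn
    • 1,1 and 0,0 are not possible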
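The layout above adds up to 170 values. As a quick check of the arithmetic, here is a minimal sketch of the index bookkeeping, assuming both 42-cell blocks use the same cell order. The names kCells, occupancyIndex, threatIndex, and turnIndex are illustrative only and are not part of the code further down.

```cpp
#include <cassert>
#include <cstddef>

// Board dimensions for standard Connect 4.
constexpr std::size_t kWidth = 7;
constexpr std::size_t kHeight = 6;
constexpr std::size_t kCells = kWidth * kHeight;   // 42 cells

// Two inputs per cell for occupancy, two per cell for "would complete four",
// plus two inputs for whose turn it is.
constexpr std::size_t kNumInputs = kCells * 2 + kCells * 2 + 2;
static_assert(kNumInputs == 170, "input vector should have 170 entries");

// Index of the first of the two occupancy inputs for a cell (hypothetical helper).
constexpr std::size_t occupancyIndex(std::size_t cell) { return cell * 2; }

// Index of the first of the two "would complete four" inputs for a cell.
constexpr std::size_t threatIndex(std::size_t cell) { return kCells * 2 + cell * 2; }

// Index of the first of the two turn inputs.
constexpr std::size_t turnIndex() { return kCells * 4; }
```

With this layout the occupancy block covers indices 0 to 83, the second block covers 84 to 167, and the turn pair sits at 168 and 169.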

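I measured the average mean squared error over blocks of 100 games, across 10,000 total games for various configurations, and arrived at:

  • 417 hidden neurons
  • Alpha and Beta learning rates of 0.1 at the start, dropping linearly to 0.01 over the total number of epochs
  • A lambda value of 0.5
  • 90 out of 100 moves are random at the start, dropping to 10 out of every 100 after the first 50% of epochs. So from the midpoint onwards, 10 out of 100 moves are random
  • The first 50% of epochs start with a random move
  • A sigmoid activation function on every node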
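The two schedules in the list above are both linear ramps down to a floor. A minimal sketch of that idea, assuming the ramp is computed directly from the game number rather than by repeated subtraction as in main further down (linearDecay, learningRate, and randomChance are illustrative names):

```cpp
#include <cassert>
#include <cmath>

// Linearly interpolate from `start` to `floorValue` over `span` games,
// then hold at `floorValue` for every later game.
double linearDecay(double start, double floorValue, int game, int span)
{
    if (span <= 0 || game >= span) return floorValue;
    const double t = static_cast<double>(game) / span;
    return start + (floorValue - start) * t;
}

// Learning rate: 0.1 down to 0.01 over all games.
double learningRate(int game, int numGames)
{
    return linearDecay(0.1, 0.01, game, numGames);
}

// Random moves per 100: 90 down to 10 over the first half of the games.
double randomChance(int game, int numGames)
{
    return linearDecay(90.0, 10.0, game, numGames / 2);
}
```

Computing the value from the game number avoids accumulating floating-point error over millions of small subtractions, although for these ranges the difference is likely negligible.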

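This image shows the results of the various configurations, plotted on a logarithmic scale. This is how I determined which configuration to use.

[Image: mean squared error for each configuration, plotted on a logarithmic scale]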

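I calculate this mean squared error by comparing the network's output for a board in a winning state against -1 for a player 2 win and 1 for a player 1 win. I add these up every 100 games, then divide the total by 100, to get the 1000 values plotted in the graph above. I.e. the code snippet is: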

if(board.InARowConnected(4) == Board<7,6,4>::Player1)
{
    totalLoss += NN->BackPropagateFinal({1},previousNN,alpha,beta,lambda);
    winState = true;
}
else if(board.InARowConnected(4) == Board<7,6,4>::Player2)
{
    totalLoss += NN->BackPropagateFinal({-1},previousNN,alpha,beta,lambda);
    winState = true;
}
else if(!board.IsThereAvailableMove())
{
    totalLoss += NN->BackPropagateFinal({0},previousNN,alpha,beta,lambda);
    winState = true;
}

...

if(gameNumber % 100 == 0 && gameNumber != 0)
{
    totalLoss = totalLoss / gamesToOutput;
    matchFile << std::fixed << std::setprecision(51) << totalLoss << std::endl;
    totalLoss = 0.0;
}
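For comparison, here is a minimal sketch of the measure as described in words: the squared difference between the prediction for the final position and the game result, averaged over a block of games. The names squaredError and meanSquaredError are illustrative. Note that the value returned by BackPropagateFinal further down is computed differently (it compares two network outputs and is scaled by alpha), so the two numbers are not expected to match.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Squared error between the prediction for the final position and the
// game result (+1 for a player 1 win, -1 for a player 2 win, 0 for a draw).
double squaredError(double prediction, double target)
{
    const double diff = target - prediction;
    return diff * diff;
}

// Mean of the per-game squared errors over a block of games (e.g. 100).
double meanSquaredError(const std::vector<double>& perGameErrors)
{
    if (perGameErrors.empty()) return 0.0;
    double total = 0.0;
    for (double e : perGameErrors) total += e;
    return total / static_cast<double>(perGameErrors.size());
}
```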

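The way I train the network is by having it play against itself over and over. It is a feed-forward network, and I am using TD-Lambda to train it on every move (every move that was not chosen at random).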
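As a reference point for the TD-Lambda update, here is a minimal sketch for the simplest case: a linear value function V(s) = w . x, no intermediate reward, and no discounting. It is not the network code from this question; value and tdLambdaStep are illustrative names. The point it tries to show is that the eligibility trace accumulates gradients only, while the TD error scales the weight step.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Value estimate for a linear function approximator: V(s) = w . x
double value(const std::vector<double>& w, const std::vector<double>& x)
{
    double v = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) v += w[i] * x[i];
    return v;
}

// One TD(lambda) step with no intermediate reward and no discounting.
// `target` is V(next state), or the game result at a terminal state.
// For a linear model the gradient of V with respect to w is simply x.
void tdLambdaStep(std::vector<double>& w, std::vector<double>& trace,
                  const std::vector<double>& x, double target,
                  double alpha, double lambda)
{
    const double delta = target - value(w, x);   // TD error, computed before any update
    for (std::size_t i = 0; i < w.size(); ++i)
    {
        trace[i] = lambda * trace[i] + x[i];     // the trace accumulates gradients only
        w[i] += alpha * delta * trace[i];        // the TD error scales the step, not the trace
    }
}
```

For a network with a hidden layer the gradient term x[i] would be replaced by the derivative of the output with respect to each weight, but the trace would still be stored without the TD error folded into it.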

The board state given to the neural network is built as follows:

template<std::size_t BoardWidth, std::size_t BoardHeight, std::size_t InARow>
void create_board_state(std::array<double,BoardWidth*BoardHeight*4+2>& gameState, const Board<BoardWidth,BoardHeight,InARow>& board,
                        const typename Board<BoardWidth,BoardHeight,InARow>::Player player)
{
    using BoardType = Board<BoardWidth,BoardHeight,InARow>;
    auto bb = board.GetBoard();
    std::size_t stateIndex = 0;
    for(std::size_t boardIndex = 0; boardIndex < BoardWidth*BoardHeight; ++boardIndex, stateIndex += 2)
    {
        if(bb[boardIndex] == BoardType::Free)
        {
            gameState[stateIndex] = 0;
            gameState[stateIndex+1] = 0;
        }
        else if(bb[boardIndex] == BoardType::Player1)
        {
            gameState[stateIndex] = 1;
            gameState[stateIndex+1] = 0;
        }
        else
        {
            gameState[stateIndex] = 0;
            gameState[stateIndex+1] = 1;
        }
    }

    for(std::size_t x = 0; x < BoardWidth; ++x)
    {
        for(std::size_t y = 0; y < BoardHeight; ++y)
        {
            auto testBoard1 = board;
            auto testBoard2 = board;
            testBoard1.SetBoardChecker(x,y,Board<BoardWidth,BoardHeight,InARow>::Player1);
            testBoard2.SetBoardChecker(x,y,Board<BoardWidth,BoardHeight,InARow>::Player2);
            // player 1's set
            if(testBoard1.InARowConnected(4) == Board<7,6,4>::Player1)
                gameState[stateIndex] = 1;
            else
                gameState[stateIndex] = 0;
            // player 2's set
            if(testBoard2.InARowConnected(4) == Board<7,6,4>::Player2)
                gameState[stateIndex+1] = 1;
            else
                gameState[stateIndex+1] = 0;

            stateIndex += 2;
        }
    }

    if(player == Board<BoardWidth,BoardHeight,InARow>::Player1)
    {
        gameState[stateIndex] = 1;
        gameState[stateIndex+1] = 0;
    }
    else
    {
        gameState[stateIndex] = 0;
        gameState[stateIndex+1] = 1;
    }
}

It is templated to make it easier to change things later. I don't believe there is anything wrong with the above.

My sigmoid activation function:

inline double sigmoid(const double x)
{
    //  return 1.0 / (1.0 + std::exp(-x));
    return x / (1.0 + std::abs(x));
}
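The active line in sigmoid above is x / (1 + |x|), sometimes called softsign, with output in (-1, 1); the commented-out line is the logistic function, with output in (0, 1). The two have different derivatives, and the term y * (1.0 - y) used in the back-propagation code is the derivative of the logistic form. A small sketch comparing them, under illustrative names (logistic, softsign, numericalDerivative are not part of the code in this question):

```cpp
#include <cassert>
#include <cmath>

// Logistic sigmoid: output in (0, 1).
double logistic(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Derivative of the logistic sigmoid, expressed via its output y.
double logisticDerivativeFromOutput(double y) { return y * (1.0 - y); }

// "Softsign" x / (1 + |x|): output in (-1, 1).
double softsign(double x) { return x / (1.0 + std::abs(x)); }

// Derivative of softsign is 1 / (1 + |x|)^2, which equals (1 - |y|)^2
// when written in terms of the output y.
double softsignDerivativeFromOutput(double y)
{
    const double t = 1.0 - std::abs(y);
    return t * t;
}

// Central finite difference, used only to check the analytic derivatives.
double numericalDerivative(double (*f)(double), double x, double h = 1e-6)
{
    return (f(x + h) - f(x - h)) / (2.0 * h);
}
```

One consequence worth checking: with the softsign output, y can be negative, and y * (1.0 - y) is then negative as well, whereas the true slope of softsign is always positive.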

My Neuron class:

template<std::size_t NumInputs>
class Neuron
{
public:
    Neuron()
    {
        for(auto& i : m_inputValues)
            i = 9;
        for(auto& e : m_eligibilityTraces)
            e = 9;
        for(auto& w : m_weights)
            w = 9;
        m_biasWeight = 9;
        m_biasEligibilityTrace = 9;
        m_outputValue = 9;
    }

    void SetInputValue(const std::size_t index, const double value)
    {
        m_inputValues[index] = value;
    }

    void SetWeight(const std::size_t index, const double weight)
    {
        if(std::isnan(weight))
            throw std::runtime_error("SetWeight: weight is NaN");
        m_weights[index] = weight;
    }

    void SetBiasWeight(const double weight)
    {
        m_biasWeight = weight;
    }

    double GetInputValue(const std::size_t index) const
    {
        return m_inputValues[index];
    }

    double GetWeight(const std::size_t index) const
    {
        return m_weights[index];
    }

    double GetBiasWeight() const
    {
        return m_biasWeight;
    }

    double CalculateOutput()
    {
        m_outputValue = 0;
        for(std::size_t i = 0; i < NumInputs; ++i)
        {
            m_outputValue += m_inputValues[i] * m_weights[i];
        }
        m_outputValue += 1.0 * m_biasWeight;
        m_outputValue = sigmoid(m_outputValue);
        return m_outputValue;
    }

    double GetOutput() const
    {
        return m_outputValue;
    }

    double GetEligibilityTrace(const std::size_t index) const
    {
        return m_eligibilityTraces[index];
    }

    void SetEligibilityTrace(const std::size_t index, const double eligibility)
    {
        m_eligibilityTraces[index] = eligibility;
    }

    void SetBiasEligibility(const double eligibility)
    {
        m_biasEligibilityTrace = eligibility;
    }

    double GetBiasEligibility() const
    {
        return m_biasEligibilityTrace;
    }

    void ResetEligibilityTraces()
    {
        for(auto& e : m_eligibilityTraces)
            e = 0;
        m_biasEligibilityTrace = 0;
    }

private:
    std::array<double,NumInputs> m_inputValues;
    std::array<double,NumInputs> m_weights;
    std::array<double,NumInputs> m_eligibilityTraces;
    double m_biasWeight;
    double m_biasEligibilityTrace;
    double m_outputValue;
};

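My NeuralNetwork class:

template<std::size_t NumInputs, std::size_t NumHidden, std::size_t NumOutputs>
class NeuralNetwork
{
public: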

void RandomiseWeights()
{
    double inputToHiddenRange = 4.0 * std::sqrt(6.0 / (NumInputs+1+NumOutputs));
    RandomGenerator inputToHidden(-inputToHiddenRange,inputToHiddenRange);

    double hiddenToOutputRange = 4.0 * std::sqrt(6.0 / (NumHidden+1+1));
    RandomGenerator hiddenToOutput(-hiddenToOutputRange,hiddenToOutputRange);

    for(auto& hiddenNeuron : m_hiddenNeurons)
    {
        for(std::size_t i = 0; i < NumInputs; ++i)
            hiddenNeuron.SetWeight(i, inputToHidden());
        hiddenNeuron.SetBiasWeight(inputToHidden());
    }

    for(auto& outputNeuron : m_outputNeurons)
    {
        for(std::size_t h = 0; h < NumHidden; ++h)
            outputNeuron.SetWeight(h, hiddenToOutput());
        outputNeuron.SetBiasWeight(hiddenToOutput());
    }
}

double GetOutput(const std::size_t index) const
{
    return m_outputNeurons[index].GetOutput();
}

std::array<double,NumOutputs> GetOutputs()
{
    std::array<double, NumOutputs> returnValue;
    for(std::size_t o = 0; o < NumOutputs; ++o)
        returnValue[o] = m_outputNeurons[o].GetOutput();
    return returnValue;
}

void SetInputValue(const std::size_t index, const double value)
{
    for(auto& hiddenNeuron : m_hiddenNeurons)
        hiddenNeuron.SetInputValue(index, value);
}

std::array<double,NumOutputs> Calculate()
{
    for(auto& h : m_hiddenNeurons)
        h.CalculateOutput();
    for(auto& o : m_outputNeurons)
        o.CalculateOutput();

    return GetOutputs();
}

std::array<double,NumOutputs> FeedForward(const std::array<double,NumInputs>& inputValues)
{
    for(std::size_t h = 0; h < NumHidden; ++h)//auto& hiddenNeuron : m_hiddenNeurons)
    {
        for(std::size_t i = 0; i < NumInputs; ++i)
            m_hiddenNeurons[h].SetInputValue(i,inputValues[i]);

        m_hiddenNeurons[h].CalculateOutput();
    }

    std::array<double, NumOutputs> returnValue;

    for(std::size_t h = 0; h < NumHidden; ++h)
    {
        auto hiddenOutput = m_hiddenNeurons[h].GetOutput();
        for(std::size_t o = 0; o < NumOutputs; ++o)
            m_outputNeurons[o].SetInputValue(h, hiddenOutput);
    }

    for(std::size_t o = 0; o < NumOutputs; ++o)
    {
        returnValue[o] = m_outputNeurons[o].CalculateOutput();
    }

    return returnValue;
}

double BackPropagateFinal(const std::array<double,NumOutputs>& actualValues, const NeuralNetwork<NumInputs,NumHidden,NumOutputs>* NN, const double alpha, const double beta, const double lambda)
{
    for(std::size_t iO = 0; iO < NumOutputs; ++iO)
    {
        auto y = NN->m_outputNeurons[iO].GetOutput();
        auto y1 = actualValues[iO];

        for(std::size_t iH = 0; iH < NumHidden; ++iH)
        {
            auto e = NN->m_outputNeurons[iO].GetEligibilityTrace(iH);
            auto h = NN->m_hiddenNeurons[iH].GetOutput();
            auto w = NN->m_outputNeurons[iO].GetWeight(iH);

            double e1 = lambda * e + (y * (1.0 - y) * h);

            double w1 = w + beta * (y1 - y) * e1;

            m_outputNeurons[iO].SetEligibilityTrace(iH,e1);
            m_outputNeurons[iO].SetWeight(iH,w1);
        }

        auto e = NN->m_outputNeurons[iO].GetBiasEligibility();
        auto h = 1.0;
        auto w = NN->m_outputNeurons[iO].GetBiasWeight();

        double e1 = lambda * e + (y * (1.0 - y) * h);

        double w1 = w + beta * (y1 - y) * e1;

        m_outputNeurons[iO].SetBiasEligibility(e1);
        m_outputNeurons[iO].SetBiasWeight(w1);
    }

    for(std::size_t iH = 0; iH < NumHidden; ++iH)
    {
        auto h = NN->m_hiddenNeurons[iH].GetOutput();

        for(std::size_t iI = 0; iI < NumInputs; ++iI)
        {
            auto e = NN->m_hiddenNeurons[iH].GetEligibilityTrace(iI);
            auto x = NN->m_hiddenNeurons[iH].GetInputValue(iI);
            auto u = NN->m_hiddenNeurons[iH].GetWeight(iI);

            double sumError = 0;

            for(std::size_t iO = 0; iO < NumOutputs; ++iO)
            {
                auto w = NN->m_outputNeurons[iO].GetWeight(iH);
                auto y = NN->m_outputNeurons[iO].GetOutput();
                auto y1 = actualValues[iO];

                auto grad = y1 - y;

                double e1 = lambda * e + (y * (1.0 - y) * w * h * (1.0 - h) * x);

                sumError += grad * e1;
            }

            double u1 = u + alpha * sumError;

            m_hiddenNeurons[iH].SetEligibilityTrace(iI,sumError);
            m_hiddenNeurons[iH].SetWeight(iI,u1);
        }

        auto e = NN->m_hiddenNeurons[iH].GetBiasEligibility();
        auto x = 1.0;
        auto u = NN->m_hiddenNeurons[iH].GetBiasWeight();

        double sumError = 0;

        for(std::size_t iO = 0; iO < NumOutputs; ++iO)
        {
            auto w = NN->m_outputNeurons[iO].GetWeight(iH);
            auto y = NN->m_outputNeurons[iO].GetOutput();
            auto y1 = actualValues[iO];

            auto grad = y1 - y;

            double e1 = lambda * e + (y * (1.0 - y) * w * h * (1.0 - h) * x);

            sumError += grad * e1;
        }

        double u1 = u + alpha * sumError;

        m_hiddenNeurons[iH].SetBiasEligibility(sumError);
        m_hiddenNeurons[iH].SetBiasWeight(u1);
    }

    double retVal = 0;
    for(std::size_t o = 0; o < NumOutputs; ++o)
    {
        retVal += 0.5 * alpha * std::pow((NN->GetOutput(o) - GetOutput(0)),2);
    }
    return retVal / NumOutputs;
}

double BackPropagate(const NeuralNetwork<NumInputs,NumHidden,NumOutputs>* NN, const double alpha, const double beta, const double lambda)
{
    for(std::size_t iO = 0; iO < NumOutputs; ++iO)
    {
        auto y = NN->m_outputNeurons[iO].GetOutput();
        auto y1 = m_outputNeurons[iO].GetOutput();

        for(std::size_t iH = 0; iH < NumHidden; ++iH)
        {
            auto e = NN->m_outputNeurons[iO].GetEligibilityTrace(iH);
            auto h = NN->m_hiddenNeurons[iH].GetOutput();
            auto w = NN->m_outputNeurons[iO].GetWeight(iH);

            double e1 = lambda * e + (y * (1.0 - y) * h);

            double w1 = w + beta * (y1 - y) * e1;

            m_outputNeurons[iO].SetEligibilityTrace(iH,e1);

            m_outputNeurons[iO].SetWeight(iH,w1);
        }

        auto e = NN->m_outputNeurons[iO].GetBiasEligibility();
        auto h = 1.0;
        auto w = NN->m_outputNeurons[iO].GetBiasWeight();

        double e1 = lambda * e + (y * (1.0 - y) * h);

        double w1 = w + beta * (y1 - y) * e1;

        m_outputNeurons[iO].SetBiasEligibility(e1);
        m_outputNeurons[iO].SetBiasWeight(w1);
    }

    for(std::size_t iH = 0; iH < NumHidden; ++iH)
    {
        auto h = NN->m_hiddenNeurons[iH].GetOutput();

        for(std::size_t iI = 0; iI < NumInputs; ++iI)
        {
            auto e = NN->m_hiddenNeurons[iH].GetEligibilityTrace(iI);
            auto x = NN->m_hiddenNeurons[iH].GetInputValue(iI);
            auto u = NN->m_hiddenNeurons[iH].GetWeight(iI);

            double sumError = 0;

            for(std::size_t iO = 0; iO < NumOutputs; ++iO)
            {
                auto w = NN->m_outputNeurons[iO].GetWeight(iH);
                auto y = NN->m_outputNeurons[iO].GetOutput();
                auto y1 = m_outputNeurons[iO].GetOutput();

                auto grad = y1 - y;

                double e1 = lambda * e + (y * (1.0 - y) * w * h * (1.0 - h) * x);

                sumError += grad * e1;
            }

            double u1 = u + alpha * sumError;

            m_hiddenNeurons[iH].SetEligibilityTrace(iI,sumError);

            m_hiddenNeurons[iH].SetWeight(iI,u1);
        }

        auto e = NN->m_hiddenNeurons[iH].GetBiasEligibility();
        auto x = 1.0;
        auto u = NN->m_hiddenNeurons[iH].GetBiasWeight();

        double sumError = 0;

        for(std::size_t iO = 0; iO < NumOutputs; ++iO)
        {
            auto w = NN->m_outputNeurons[iO].GetWeight(iH);
            auto y = NN->m_outputNeurons[iO].GetOutput();
            auto y1 = m_outputNeurons[iO].GetOutput();

            auto grad = y1 - y;

            double e1 = lambda * e + (y * (1.0 - y) * w * h * (1.0 - h) * x);

            sumError += grad * e1;
        }

        double u1 = u + alpha * sumError;

        m_hiddenNeurons[iH].SetBiasEligibility(sumError);
        m_hiddenNeurons[iH].SetBiasWeight(u1);
    }

    double retVal = 0;
    for(std::size_t o = 0; o < NumOutputs; ++o)
    {
        retVal += 0.5 * alpha * std::pow((NN->GetOutput(o) - GetOutput(0)),2);
    }
    return retVal / NumOutputs;
}

std::array<double,NumInputs*NumHidden+NumHidden+NumHidden*NumOutputs+NumOutputs> GetNetworkWeights() const
{
    std::array<double,NumInputs*NumHidden+NumHidden+NumHidden*NumOutputs+NumOutputs> returnVal;

    std::size_t weightPos = 0;

    for(std::size_t h = 0; h < NumHidden; ++h)
    {
        for(std::size_t i = 0; i < NumInputs; ++i)
            returnVal[weightPos++] = m_hiddenNeurons[h].GetWeight(i);
        returnVal[weightPos++] = m_hiddenNeurons[h].GetBiasWeight();
    }
    for(std::size_t o = 0; o < NumOutputs; ++o)
    {
        for(std::size_t h = 0; h < NumHidden; ++h)
            returnVal[weightPos++] = m_outputNeurons[o].GetWeight(h);
        returnVal[weightPos++] = m_outputNeurons[o].GetBiasWeight();
    }

    return returnVal;
}

static constexpr std::size_t NumWeights = NumInputs*NumHidden+NumHidden+NumHidden*NumOutputs+NumOutputs;


void SetNetworkWeights(const std::array<double,NumInputs*NumHidden+NumHidden+NumHidden*NumOutputs+NumOutputs>& weights)
{
    std::size_t weightPos = 0;
    for(std::size_t h = 0; h < NumHidden; ++h)
    {
        for(std::size_t i = 0; i < NumInputs; ++i)
            m_hiddenNeurons[h].SetWeight(i, weights[weightPos++]);
        m_hiddenNeurons[h].SetBiasWeight(weights[weightPos++]);
    }
    for(std::size_t o = 0; o < NumOutputs; ++o)
    {
        for(std::size_t h = 0; h < NumHidden; ++h)
            m_outputNeurons[o].SetWeight(h, weights[weightPos++]);
        m_outputNeurons[o].SetBiasWeight(weights[weightPos++]);
    }
}

void ResetEligibilityTraces()
{
    for(auto& h : m_hiddenNeurons)
        h.ResetEligibilityTraces();
    for(auto& o : m_outputNeurons)
        o.ResetEligibilityTraces();
}

private:

std::array<Neuron<NumInputs>,NumHidden> m_hiddenNeurons;
std::array<Neuron<NumHidden>,NumOutputs> m_outputNeurons;
};

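One of the places where I believe I may have a problem is in the BackPropagate and BackPropagateFinal methods of the NeuralNetwork class.

Here is my main function, where I train the network: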

int main()
{
    std::ofstream matchFile("match.txt");

    RandomGenerator randomPlayerStart(0,1);
    RandomGenerator randomMove(0,100);

    Board<7,6,4> board;

    auto NN = new NeuralNetwork<7*6*4+2,417,1>();
    auto previousNN = new NeuralNetwork<7*6*4+2,417,1>();
    NN->RandomiseWeights();

    const int numGames = 3000000;
    double alpha = 0.1;
    double beta = 0.1;
    double lambda = 0.5;
    double learningRateFloor = 0.01;
    double decayRateAlpha = (alpha - learningRateFloor) / numGames;
    double decayRateBeta = (beta - learningRateFloor) / numGames;
    double randomChance = 90; // out of 100
    double randomChangeFloor = 10;
    double percentToReduceRandomOver = 0.5;
    double randomChangeDecay = (randomChance-randomChangeFloor) / (numGames*percentToReduceRandomOver);
    double percentOfGamesToRandomiseStart = 0.5;

    int numGamesWonP1 = 0;
    int numGamesWonP2 = 0;

    int gamesToOutput = 100;

    matchFile << "Num Games: " << numGames << "\t\ta,b,l: " << alpha << ", " << beta << ", " << lambda << std::endl;

    Board<7,6,4>::Player playerStart = randomPlayerStart() > 0.5 ? Board<7,6,4>::Player1 : Board<7,6,4>::Player2;

    double totalLoss = 0.0;

    for(int gameNumber = 0; gameNumber < numGames; ++gameNumber)
    {
        bool winState = false;
        Board<7,6,4>::Player playerWhoTurnItIs = playerStart;
        playerStart = playerStart == Board<7,6,4>::Player1 ? Board<7,6,4>::Player2 : Board<7,6,4>::Player1;
        board.ClearBoard();

        int turnNumber = 0;

        while(!winState)
        {
            Board<7,6,4>::Player playerWhoTurnItIsNot = playerWhoTurnItIs == Board<7,6,4>::Player1 ? Board<7,6,4>::Player2 : Board<7,6,4>::Player1;

            bool wasRandomMove = false;

            std::size_t selectedMove;
            bool moveFound = false;

            if(board.IsThereAvailableMove())
            {
                std::vector<std::size_t> availableMoves;
                if((gameNumber <= numGames * percentOfGamesToRandomiseStart && turnNumber == 0) || randomMove() > 100.0-randomChance)
                    wasRandomMove = true;

                std::size_t bestMove = 8;
                double bestWorstResponse = playerWhoTurnItIs == Board<7,6,4>::Player1 ? std::numeric_limits<double>::min() : std::numeric_limits<double>::max();

                for(std::size_t m = 0; m < 7; ++m)
                {
                    Board<7,6,4> testBoard = board;    // make a copy of the current board to run our tests
                    if(testBoard.AvailableMoveInColumn(m))
                    {
                        if(wasRandomMove)
                        {
                            availableMoves.push_back(m);
                        }
                        testBoard.AddChecker(m, playerWhoTurnItIs);

                        double worstResponse = playerWhoTurnItIs == Board<7,6,4>::Player1 ? std::numeric_limits<double>::max() : std::numeric_limits<double>::min();
                        std::size_t worstMove = 8;

                        for(std::size_t m2 = 0; m2 < 7; ++m2)
                        {
                            Board<7,6,4> testBoard2 = testBoard;
                            if(testBoard2.AvailableMoveInColumn(m2))
                            {
                                testBoard2.AddChecker(m,playerWhoTurnItIsNot);

                                StateType state;
                                create_board_state(state, testBoard2, playerWhoTurnItIs);
                                auto outputs = NN->FeedForward(state);

                                if(playerWhoTurnItIs == Board<7,6,4>::Player1 && (outputs[0] < worstResponse || worstMove == 8))
                                {
                                    worstResponse = outputs[0];
                                    worstMove = m2;
                                }
                                else if(playerWhoTurnItIs == Board<7,6,4>::Player2 && (outputs[0] > worstResponse || worstMove == 8))
                                {
                                    worstResponse = outputs[0];
                                    worstMove = m2;
                                }
                            }
                        }

                        if(playerWhoTurnItIs == Board<7,6,4>::Player1 && (worstResponse > bestWorstResponse || bestMove == 8))
                        {
                            bestWorstResponse = worstResponse;
                            bestMove = m;
                        }
                        else if(playerWhoTurnItIs == Board<7,6,4>::Player2 && (worstResponse < bestWorstResponse || bestMove == 8))
                        {
                            bestWorstResponse = worstResponse;
                            bestMove = m;
                        }
                    }
                }
                if(bestMove == 8)
                {
                    std::cerr << "wasn't able to determine the best move to make" << std::endl;
                    return 0;
                }
                if(gameNumber <= numGames * percentOfGamesToRandomiseStart && turnNumber == 0)
                {
                    std::size_t rSelection = int(randomMove()) % (availableMoves.size());

                    selectedMove = availableMoves[rSelection];
                    moveFound = true;
                }
                else if(wasRandomMove)
                {
                    std::remove(availableMoves.begin(),availableMoves.end(),bestMove);
                    std::size_t rSelection = int(randomMove()) % (availableMoves.size());

                    selectedMove = availableMoves[rSelection];
                    moveFound = true;
                }
                else
                {
                    selectedMove = bestMove;
                    moveFound = true;
                }
            }

            StateType prevState;
            create_board_state(prevState,board,playerWhoTurnItIs);
            NN->FeedForward(prevState);
            *previousNN = *NN;

            // now that we have the move, add it to the board
            StateType state;
            board.AddChecker(selectedMove,playerWhoTurnItIs);
            create_board_state(state,board,playerWhoTurnItIsNot);

            auto outputs = NN->FeedForward(state);

            if(board.InARowConnected(4) == Board<7,6,4>::Player1)
            {
                totalLoss += NN->BackPropagateFinal({1},previousNN,alpha,beta,lambda);
                winState = true;
                ++numGamesWonP1;
            }
            else if(board.InARowConnected(4) == Board<7,6,4>::Player2)
            {
                totalLoss += NN->BackPropagateFinal({-1},previousNN,alpha,beta,lambda);
                winState = true;
                ++numGamesWonP2;
            }
            else if(!board.IsThereAvailableMove())
            {
                totalLoss += NN->BackPropagateFinal({0},previousNN,alpha,beta,lambda);
                winState = true;
            }
            else if(turnNumber > 0 && !wasRandomMove)
            {
                NN->BackPropagate(previousNN,alpha,beta,lambda);
            }

            if(!wasRandomMove)
            {
                outputs = NN->FeedForward(state);
            }

            ++turnNumber;
            playerWhoTurnItIs = playerWhoTurnItIsNot;
        }

        alpha -= decayRateAlpha;
        beta -= decayRateBeta;

        NN->ResetEligibilityTraces();

        if(gameNumber > 0 && randomChance > randomChangeFloor && gameNumber <= numGames * percentToReduceRandomOver)
        {
            randomChance -= randomChangeDecay;
            if(randomChance < randomChangeFloor)
                randomChance = randomChangeFloor;
        }

        if(gameNumber % gamesToOutput == 0 && gameNumber != 0)
        {
            totalLoss = totalLoss / gamesToOutput;
            matchFile << std::fixed << std::setprecision(51) << totalLoss << std::endl;
            totalLoss = 0.0;
        }
    }

    matchFile << std::endl << "Games won: " << numGamesWonP1 << " . " << numGamesWonP2 << std::endl;

    auto weights = NN->GetNetworkWeights();
    matchFile << std::endl;
    matchFile << std::endl;
    for(const auto& w : weights)
        matchFile << std::fixed << std::setprecision(51) << w << ", \n";
    matchFile << std::endl;

    return 0;
}
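One detail in the random-move branch of main above: std::remove is called on availableMoves without a following erase. std::remove only shifts the kept elements forward and returns the new logical end; size() is unchanged, so the later modulo by availableMoves.size() still uses the old length. A minimal sketch of the usual erase-remove idiom (eraseValue is a hypothetical helper name):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Remove every occurrence of `value` and actually shrink the vector.
void eraseValue(std::vector<std::size_t>& moves, std::size_t value)
{
    moves.erase(std::remove(moves.begin(), moves.end(), value), moves.end());
}
```

If the best move happens to be the only legal move, the vector is empty after this call, so the caller would need to check for that before taking a modulo by its size.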

One place where I think I may have a problem is the minimax used to choose the best move.
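Two things in that search may be worth checking. First, the inner loop tests column m2 for availability but then calls AddChecker with m, so the opponent's reply is placed in the mover's column. Second, std::numeric_limits<double>::min() is the smallest positive double, not the most negative one; lowest() is the most negative. Below is a minimal sketch of a two-ply search over a precomputed table of evaluations, so that it stays independent of the Board and NeuralNetwork classes. The function pickMove and the replyValues table are illustrative assumptions, not code from this question.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Two-ply search: pick the move whose worst-case reply is best for the mover.
// replyValues[m][m2] is the evaluation (from player 1's point of view) after the
// mover plays m and the opponent replies m2; an empty row means m is illegal.
// Returns the chosen move, or replyValues.size() if no move is legal.
std::size_t pickMove(const std::vector<std::vector<double>>& replyValues, bool moverIsPlayer1)
{
    const std::size_t noMove = replyValues.size();
    std::size_t bestMove = noMove;
    // lowest() is the most negative double; min() is the smallest positive one.
    double bestWorst = moverIsPlayer1 ? std::numeric_limits<double>::lowest()
                                      : std::numeric_limits<double>::max();
    for (std::size_t m = 0; m < replyValues.size(); ++m)
    {
        if (replyValues[m].empty()) continue;
        double worst = moverIsPlayer1 ? std::numeric_limits<double>::max()
                                      : std::numeric_limits<double>::lowest();
        for (std::size_t m2 = 0; m2 < replyValues[m].size(); ++m2)
        {
            const double v = replyValues[m][m2];   // indexed by the reply m2, not by m
            if (moverIsPlayer1 ? (v < worst) : (v > worst)) worst = v;
        }
        const bool better = moverIsPlayer1 ? (worst > bestWorst) : (worst < bestWorst);
        if (better || bestMove == noMove)
        {
            bestWorst = worst;
            bestMove = m;
        }
    }
    return bestMove;
}
```

Player 1 maximises the minimum over replies and player 2 minimises the maximum, which mirrors the sign convention of 1 for a player 1 win and -1 for a player 2 win.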

There are also a few other parts which I don't think are relevant to the problems I'm having.

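The problem

  1. It doesn't seem to matter whether I train for 1000 games or 3000000 games: either player 1 or player 2 will win the vast majority of games. 90 out of 100 games are won by one player. If I output the actual individual game moves and outputs, I can see that the games won by the other player are almost always the result of a lucky random move.

    At the same time, I notice that the predicted outputs sort of "favour" a player. I.e. the outputs seem to sit on the negative side of 0, so player 1, for example, always makes the best moves it can, but they all seem to be predicted as a win for player 2.

    Sometimes it is player 1 who wins the majority, other times it is player 2. I assume this is due to the random weight initialisation slightly favouring one player.

    The first game or so doesn't favour one player over the other, but it quickly starts to "lean" one way.

  2. I have now tried training over 3000000 games, which took 3 days, but the network still doesn't seem able to make good decisions. I have tested the network by having it play other "bots" on the riddles.io Connect 4 competition.

    • It fails to recognise that it needs to block the opponent's 4 in a row
    • Even after 3000000 games, it doesn't play the centre column as its first move, which we know is the only opening move that can guarantee you a win.

Any help and direction would be greatly appreciated. Specifically, is my implementation of TD-Lambda back-propagation correct?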
