c++ - 具有 alpha-beta 修剪问题的 Minimax

Question

我正在为游戏筷子制作一个 C++ 程序。

这是一个非常简单的游戏，总共只有 625 个游戏状态（如果考虑到对称性和无法到达的状态，它会更低）。我已经阅读了 minimax 和 alpha-beta 算法，主要用于井字游戏，但我遇到的问题是，在井字游戏中，不可能循环回到以前的状态，而这很容易在筷子中发生。因此，在运行代码时，最终会出现堆栈溢出。

我通过为以前访问过的状态添加标志来解决此问题（我不知道这是否是正确的方法。）以便可以避免它们，但现在我遇到的问题是输出不像预期的那样对称。

例如，在游戏的开始状态下，每个玩家都有一根手指，所以它们都是对称的。程序告诉我，最好的动作是用左手击打我的右手，而不是相反。

我的源代码是 -

#include <iostream>
#include <array>
#include <vector>
#include <limits>
std::array<int, 625> t; //Flags for visited states.
std::array<int, 625> f; //Flags for visited states.
int no = 0; //Unused. For debugging.
class gamestate
{
public:
    gamestate(int x, bool t) : turn(t) //Constructor.
    {
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) {
                val[i][j] = x % 5;
                x /= 5;
            }
        init();
    }
    void print() //Unused. For debugging.
    {
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++)
                std::cout << val[i][j] << "\t";
            std::cout << "\n";
        }
        std::cout << "\n";
    }
    std::array<int, 6> canmove = {{ 1, 1, 1, 1, 1, 1 }}; //List of available moves.
    bool isover() //Is the game over.
    {
        return ended;
    }
    bool won() //Who won the game.
    {
        return winner;
    }
    bool isturn() //Whose turn it is.
    {
        return turn;
    }
    std::vector<int> choosemoves() //Choose the best possible moves in the current state.
    {
        std::vector<int> bestmoves;
        if(ended)
            return bestmoves;
        std::array<int, 6> scores;
        int bestscore;
        if(turn)
            bestscore = std::numeric_limits<int>::min();
        else
            bestscore = std::numeric_limits<int>::max();
        scores.fill(bestscore);
        for (int i = 0; i < 6; i++)
            if (canmove[i]) {
                t.fill(0);
                f.fill(0);
                gamestate *play = new gamestate(this->playmove(i),!turn);
                scores[i] = minimax(play, 0, std::numeric_limits<int>::min(), std::numeric_limits<int>::max());
                std::cout<<i<<": "<<scores[i]<<std::endl;
                delete play;
                if (turn) if (scores[i] > bestscore) bestscore = scores[i];
                if (!turn) if (scores[i] < bestscore) bestscore = scores[i];
            }
        for (int i = 0; i < 6; i++)
            if (scores[i] == bestscore)
                bestmoves.push_back(i);
        return bestmoves;
    }
private:
    std::array<std::array<int, 2>, 2 > val; //The values of the fingers.
    bool turn; //Whose turn it is.
    bool ended = false; //Has the game ended.
    bool winner; //Who won the game.
    void init() //Check if the game has ended and find the available moves.
    {
        if (!(val[turn][0]) && !(val[turn][1])) {
            ended = true;
            winner = !turn;
            canmove.fill(0);
            return;
        }
        if (!(val[!turn][0]) && !(val[!turn][1])) {
            ended = true;
            winner = turn;
            canmove.fill(0);
            return;
        }
        if (!val[turn][0]) {
            canmove[0] = 0;
            canmove[1] = 0;
            canmove[2] = 0;
            if (val[turn][1] % 2)
                canmove[5] = 0;
        }
        if (!val[turn][1]) {
            if (val[turn][0] % 2)
                canmove[2] = 0;
            canmove[3] = 0;
            canmove[4] = 0;
            canmove[5] = 0;
        }
        if (!val[!turn][0]) {
            canmove[0] = 0;
            canmove[3] = 0;
        }
        if (!val[!turn][1]) {
            canmove[1] = 0;
            canmove[4] = 0;
        }
    }
    int playmove(int mov) //Play a move to get the next game state.
    {
        auto newval = val;
        switch (mov) {
        case 0:
            newval[!turn][0] = (newval[turn][0] + newval[!turn][0]);
            newval[!turn][0] = (5 > newval[!turn][0]) ? newval[!turn][0] : 0;
            break;
        case 1:
            newval[!turn][1] = (newval[turn][0] + newval[!turn][1]);
            newval[!turn][1] = (5 > newval[!turn][1]) ? newval[!turn][1] : 0;
            break;
        case 2:
            if (newval[turn][1]) {
                newval[turn][1] = (newval[turn][0] + newval[turn][1]);
                newval[turn][1] = (5 > newval[turn][1]) ? newval[turn][1] : 0;
            } else {
                newval[turn][0] /= 2;
                newval[turn][1] = newval[turn][0];
            }
            break;
        case 3:
            newval[!turn][0] = (newval[turn][1] + newval[!turn][0]);
            newval[!turn][0] = (5 > newval[!turn][0]) ? newval[!turn][0] : 0;
            break;
        case 4:
            newval[!turn][1] = (newval[turn][1] + newval[!turn][1]);
            newval[!turn][1] = (5 > newval[!turn][1]) ? newval[!turn][1] : 0;
            break;
        case 5:
            if (newval[turn][0]) {
                newval[turn][0] = (newval[turn][1] + newval[turn][0]);
                newval[turn][0] = (5 > newval[turn][0]) ? newval[turn][0] : 0;
            } else {
                newval[turn][1] /= 2;
                newval[turn][0] = newval[turn][1];
            }
            break;
        default:
            std::cout << "\nInvalid move!\n";
        }
        int ret = 0;
        for (int i = 1; i > -1; i--)
            for (int j = 1; j > -1; j--) {
                ret+=newval[i][j];
                ret*=5;
            }
        ret/=5;
        return ret;
    }
    static int minimax(gamestate *game, int depth, int alpha, int beta) //Minimax searching function with alpha beta pruning.
    {
        if (game->isover()) {
            if (game->won())
                return 1000 - depth;
            else
                return depth - 1000;
        }
        if (game->isturn()) {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]&&t[game->playmove(i)]!=-1) {
                    int score;
                    if(!t[game->playmove(i)]){
                        t[game->playmove(i)] = -1;
                        gamestate *play = new gamestate(game->playmove(i),!game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        t[game->playmove(i)] = score;
                    }
                    else
                        score = t[game->playmove(i)];
                    if (score > alpha) alpha = score;
                    if (alpha >= beta) break;
                }
            return alpha;
        } else {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]&&f[game->playmove(i)]!=-1) {
                    int score;
                    if(!f[game->playmove(i)]){
                        f[game->playmove(i)] = -1;
                        gamestate *play = new gamestate(game->playmove(i),!game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        f[game->playmove(i)] = score;
                    }
                    else
                        score = f[game->playmove(i)];
                    if (score < beta) beta = score;
                    if (alpha >= beta) break;
                }
            return beta;
        }
    }
};
int main(void)
{
    gamestate test(243, true);
    auto movelist = test.choosemoves();
    for(auto i: movelist)
        std::cout<<i<<std::endl;
    return 0;
}

我将移动以一种以 5 为底的十进制系统传递，因为每只手的值都可以从 0 到 4。

在我输入状态的代码中 -

3    3

4    1

输出说我应该将右手 (1) 击到对手的右侧 (3)，但没有说我应该将其击到对手的左侧 (也是 3)

我认为问题在于我处理无限循环的方式。

什么是正确的方法？或者，如果这是正确的方法，那么我该如何解决这个问题？

另外请让我知道如何改进我的代码。

非常感谢。

编辑：

我已将我的 minimax 函数更改如下，以确保无限循环的得分高于失败，但我仍然没有得到对称性。我还做了一个函数来增加乐谱的深度

static float minimax(gamestate *game, int depth, float alpha, float beta) //Minimax searching function with alpha beta pruning.
    {
        if (game->isover()) {
            if (game->won())
                return 1000 - std::atan(depth) * 2000 / std::acos(-1);
            else
                return std::atan(depth) * 2000 / std::acos(-1) - 1000;
        }
        if (game->isturn()) {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]) {
                    float score;
                    if(!t[game->playmove(i)]) {
                        t[game->playmove(i)] = -1001;
                        gamestate *play = new gamestate(game->playmove(i), !game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        t[game->playmove(i)] = score;
                    } else if(t[game->playmove(i)] == -1001)
                        score = 0;
                    else
                        score = adddepth(t[game->playmove(i)], depth);
                    if (score > alpha) alpha = score;
                    if (alpha >= beta) break;
                }
            return alpha;
        } else {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]) {
                    float score;
                    if(!f[game->playmove(i)]) {
                        f[game->playmove(i)] = -1001;
                        gamestate *play = new gamestate(game->playmove(i), !game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        f[game->playmove(i)] = score;
                    } else if(f[game->playmove(i)] == -1001)
                        score = 0;
                    else
                        score = adddepth(f[game->playmove(i)], depth);
                    if (score < beta) beta = score;
                    if (alpha >= beta) break;
                }
            return beta;
        }
    }

这是增加深度的功能 -

float adddepth(float score, int depth) //Add depth to pre-calculated score.
{
    int olddepth;
    float newscore;
    if(score > 0) {
        olddepth = std::tan((1000 - score) * std::acos(-1) / 2000);
        depth += olddepth;
        newscore = 1000 - std::atan(depth) * 2000 / std::acos(-1);
    } else {
        olddepth = std::tan((1000 + score) * std::acos(-1) / 2000);
        depth += olddepth;
        newscore = std::atan(depth) * 2000 / std::acos(-1) - 1000;
    }
    return newscore;
}

score 1 · Accepted Answer

免责声明：我不懂 C++，~~坦率地说，我没有费心去阅读游戏规则~~。我现在已经阅读了规则，并且仍然坚持我所说的......但我仍然不知道 C++。不过，我可以介绍一些算法的一般知识，这些知识应该会让你朝着正确的方向前进。

不对称本身并不是一件坏事。如果两个招式完全等价，它应该选择其中之一，而不是像布里丹的屁股一样束手无策。事实上，您应该确保您编写的任何代理都有某种方法可以在它无法区分的策略之间任意选择。

您应该更仔细地考虑拒绝访问以前的状态所暗示的效用方案。追求一个无限循环是一个有效的策略，即使你当前对它的表示会使程序崩溃；也许错误是溢出，而不是导致它的策略。如果让你在输掉比赛和拒绝让比赛结束之间做出选择，你希望你的经纪人更喜欢哪一个？

无限播放

如果你希望你的代理不惜一切代价避免失败——也就是说，你希望它更喜欢无限期的游戏而不是失败——那么我建议将任何重复的状态视为最终状态，并为其分配一个介于输赢之间的值。毕竟，从某种意义上说，它是终局的——这是游戏将永远永远永远进入的循环，其确定的结果是没有赢家。但是，请记住，如果您使用的是简单的极小极大（一个效用函数，而不是两个），那么这意味着您的对手也将永恒游戏视为中等结果。

这听起来可能很荒谬，但也许玩到无穷大实际上是一个合理的策略。请记住，极小极大假设了最坏的情况——一个完全理性的敌人，其利益与您的利益完全相反。但是，例如，如果你正在编写一个代理来与一个人对战，那么这个人要么在逻辑上犯错，要么最终决定他们宁愿以失败告终——所以你的代理将受益于耐心地留在这个纳什平衡循环！

好吧，让我们结束游戏吧

如果您希望您的代理更喜欢游戏最终结束，那么我建议实施一个活罚——一个添加到您的效用的修饰符，它随着时间的推移而减少（无论是渐近的还是无限制的）。仔细实施，这可以保证最终，任何一端都比另一轮更可取。同样使用此解决方案，您需要谨慎考虑这对您的对手意味着什么偏好。

第三种方式

另一个常见的解决方案是限制搜索的深度并实现评估功能。这将游戏状态作为其输入，并且只是吐出一个效用值，这是对最终结果的最佳猜测。这可以证明是最优的吗？不，除非您的评估函数刚刚完成极小值，但这意味着您的算法将在合理的时间内完成。通过将这个粗略的估计埋在树中足够深的地方，你最终得到了一个非常合理的模型。但是，这会产生不完整的策略，这意味着它对重新计划代理比对标准计划代理更有用。Minimax 重新计划是复杂游戏的常用方法（如果我没记错的话，它是Deep Blue遵循的基本算法)，但由于这是一个非常简单的游戏，您可能不需要采用这种方法。

关于抽象的旁注

请注意，所有这些解决方案都被概念化为效用函数的数值变化或估计。一般来说，这比任意放弃可能的政策更可取。毕竟，这就是你的实用程序函数的用途——任何时候你根据实用程序的数值以外的任何东西做出策略决策，你就破坏了你的抽象并使你的代码变得不那么健壮。

c++ - 具有 alpha-beta 修剪问题的 Minimax

1 回答 1

无限播放

好吧，让我们结束游戏吧

第三种方式

关于抽象的旁注

Related

Reference