c - How to speed up the computer's move decision in C

Question

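I am trying to work out an algorithm for a computer to play Gomoku (a 5 x 5 version of tic-tac-toe). I found that the most commonly used algorithms for this are Min-max (or Alpha-beta), but they are too hard for me to handle. So I decided to use the following code, which is easy to understand but slow. It shows how the computer makes a reasonable choice.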

//------------------------------------------------------------
// computer_move() checks for all legal moves in the current  |
// position. then for each of them it calls the dfs_search()  |
// function to get the moves' score. And finally it returns   |
// the move with the best score.                              |
//------------------------------------------------------------

int computer_move()  //
{
    int best_move = -1;  // best move so far (-1 if no legal move exists)
    int best_score = -100;  // best score so far 
    int score;  // current score
    int i;

    for (i = 0; i < 16; ++i) { //
        if (pos[i] == EMPTY) {  // if a legal move can be made 
            pos[i] = COMPUTER;  // mark the move
            score = -dfs_search(HUMAN); // 
            pos[i] = EMPTY; // take back the move

            if (score > best_score) {
                best_score = score;
                best_move = i;
            }
        }
    }

    printf("Computer's move: %d\n", best_move);
    return best_move;   // return the best move found
}


//------------------------------------------------------------
// dfs_search() gets the side to move, finds all legal moves  |
// for him, and for each move, it recursively calls itself.   |
// It returns a score for the position.                       |
// This recursive function continues on each variation until  |
// the game's end is found in that variation. Which means     |
// that the variation continues until check_result() returns  |
// a value other than CONTINUE.                               |
// Note that this can be done in tic-tac-toe, since it's a    |
// deterministic game. For games like chess or checkers we    |
// can't continue the variation until reaching the game's end |
// so we have to cut the variation at some point.             |
//------------------------------------------------------------

int dfs_search(int player) // 
{
    int best_score = -100;
    int score;
    int result;
    int i;

    result = check_result(player);
    if (result != CONTINUE) return result;  // return the result

    for (i = 0; i < 16; ++i) {
        if (pos[i] == EMPTY) {
            pos[i] = player;
            score = -dfs_search(CHANGE_PLAYER(player)); // 
            pos[i] = EMPTY;

            if (score > best_score)
                best_score = score;
        }
    }

    return best_score;  // return the best score
}

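It works well for a 3 x 3 board. For 4 x 4, however, it takes a very long time to place the next stone. Since most of the time goes into the first three or four decisions, I thought one solution would be to have the computer search for the best point only around the human's last move. After the first three or four decisions, the full algorithm above would be applied to the few points that remain. What do you think? Please also suggest some modifications to the current algorithm.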

score 0 · Accepted Answer

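You are trying to solve the entire game tree. On a 3x3 board the tree has on the order of 9! = 362880 nodes, which is small enough for your computer, but on a 4x4 board it has on the order of 16! = 20922789888000 (20.9 trillion) nodes, far too many to visit in a reasonable amount of time.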
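The arithmetic behind those figures can be checked with a few lines of C. This is only a sketch: the function name `move_sequences` is made up for illustration, and n! counts the orderings in which an n-cell board can be filled, so it overestimates a real tree in which games end early with a win.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the question): number of orderings in
   which a board with `cells` squares can be filled, i.e. cells!.
   Games that end early make the real tree smaller than this bound. */
uint64_t move_sequences(int cells)
{
    assert(cells >= 0 && cells <= 20);  /* 21! no longer fits in 64 bits */
    uint64_t n = 1;
    for (int k = 2; k <= cells; ++k)
        n *= (uint64_t)k;
    return n;
}
```

`move_sequences(9)` gives 362880 and `move_sequences(16)` gives 20922789888000. A 5 x 5 board would need 25!, roughly 1.55e25, which does not even fit in a 64-bit integer.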

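Consider implementing a search algorithm that returns a reasonable estimate of the current position's score without solving the whole game tree. For Gomoku, I recommend Monte Carlo Tree Search. It has been applied successfully to many games such as Go, and in its vanilla form it does not require you to write the static evaluation function that fixed-depth min-max search and its variants (such as alpha-beta) need.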
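To give a feel for the idea, here is a minimal sketch of flat Monte Carlo move selection, the random-playout step that Monte Carlo Tree Search builds on. It is not full MCTS, which additionally grows a search tree and chooses which branch to sample next (for example with UCT). The board size `N`, the win length `WIN_LEN`, and the function names `wins_at`, `random_playout` and `monte_carlo_move` are assumptions made for this example; only `EMPTY`, `COMPUTER` and `HUMAN` echo the names used in the question.

```c
#include <assert.h>
#include <stdlib.h>

#define N 4          /* board side (assumption: the 4 x 4 case from the question) */
#define WIN_LEN 4    /* stones in a row needed to win (assumption) */
#define CELLS (N * N)
enum { EMPTY = 0, COMPUTER = 1, HUMAN = 2 };

/* Returns 1 if `player` has WIN_LEN stones in a row through cell `m`. */
static int wins_at(const int *b, int m, int player)
{
    static const int dr[4] = {0, 1, 1, 1}, dc[4] = {1, 0, 1, -1};
    int r0 = m / N, c0 = m % N;
    for (int d = 0; d < 4; ++d) {
        int count = 1;                          /* the stone at m itself */
        for (int s = -1; s <= 1; s += 2) {      /* walk both ways along the line */
            int r = r0 + s * dr[d], c = c0 + s * dc[d];
            while (r >= 0 && r < N && c >= 0 && c < N && b[r * N + c] == player) {
                ++count;
                r += s * dr[d];
                c += s * dc[d];
            }
        }
        if (count >= WIN_LEN) return 1;
    }
    return 0;
}

/* Plays uniformly random moves on a copy of `b`, with `to_move` moving first.
   Returns the winner, or EMPTY if the board fills up without a winner. */
static int random_playout(const int *b, int to_move)
{
    int board[CELLS], empty[CELLS], n = 0;
    for (int i = 0; i < CELLS; ++i) {
        board[i] = b[i];
        if (b[i] == EMPTY) empty[n++] = i;
    }
    while (n > 0) {
        int k = rand() % n;                     /* pick a random empty cell */
        int m = empty[k];
        empty[k] = empty[--n];                  /* remove it from the list */
        board[m] = to_move;
        if (wins_at(board, m, to_move)) return to_move;
        to_move = (to_move == COMPUTER) ? HUMAN : COMPUTER;
    }
    return EMPTY;
}

/* Scores every legal move by the outcome of `playouts` random games
   (+1 per win, -1 per loss) and returns the best one, or -1 if the
   board is full. The board is restored before returning. */
int monte_carlo_move(int *b, int playouts)
{
    assert(playouts > 0);
    int best_move = -1;
    long best_score = 0;
    for (int m = 0; m < CELLS; ++m) {
        if (b[m] != EMPTY) continue;
        b[m] = COMPUTER;                        /* try the move */
        long score = 0;
        if (wins_at(b, m, COMPUTER)) {
            score = 2L * playouts;              /* immediate win beats any playout total */
        } else {
            for (int p = 0; p < playouts; ++p) {
                int w = random_playout(b, HUMAN);
                score += (w == COMPUTER) - (w == HUMAN);
            }
        }
        b[m] = EMPTY;                           /* take back the move */
        if (best_move < 0 || score > best_score) {
            best_score = score;
            best_move = m;
        }
    }
    return best_move;
}
```

A call such as `monte_carlo_move(pos, 1000)` would take the place of `computer_move()`, with `srand()` called once at program start. The cost per decision is bounded by the number of empty cells times the number of playouts, so it stays small on 4 x 4 or 5 x 5 boards, at the price of returning an estimate rather than a proven best move. For actual Gomoku rules you would set `N` to the board size and `WIN_LEN` to 5.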
