groovy - negaMax algorithm producing some odd results

Question

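I am currently implementing a checkers game, and the only thing holding me back is the poor state of my AI. It is written in Groovy.

I have the following (attempted) negaMax algorithm with alpha-beta pruning. I have followed several pseudocode guides, but I am clearly going wrong somewhere, because the results are rather nonsensical.

The method is called as follows: negaMax(3, Integer.MIN_VALUE, Integer.MAX_VALUE, 1)

I have decided that 1 is the computer player; anything else is the user.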

def negaMax(int depth, int alpha, int beta, int player) {
    int score
    int bestScore = Integer.MIN_VALUE
    def moves = getMoves(player)                                        // this function returns a hashmap as I felt I needed not just the move but the checker
    // loop through all moves
    for (move in moves) {
        Position origin = move.key.location                             // save original position to allow undo
        move.key.location = move.value                                  // move piece
        if (depth == 0) {
            score = evaluateGameState(player)
        } else {
            score = -negaMax(depth - 1, -beta, -alpha, -player)         //  move score = - opponents best move
        }
        move.key.location = origin                                      // undo move
        if (player == 1) {                                              // save successor evaluations for the computer to search
            evaluations.put((move.key) , new PositionAndScore(score, move.value))
        }
        bestScore = Math.max(bestScore, score)
        alpha = Math.max(alpha, bestScore)
        if (alpha >= beta) {
            break                                                       // prune
        }
    }
    return bestScore
}
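
One detail of the initial call may be worth checking. In Groovy, as in Java, negating Integer.MIN_VALUE wraps around to Integer.MIN_VALUE again, so the -alpha passed to the first recursive call is not a large positive bound. The sketch below only reproduces that arithmetic; the names childAlpha, childBeta, safeAlpha and safeBeta are illustrative, and whether this accounts for the behaviour described in the question is an assumption.

```groovy
// Window used by the initial call negaMax(3, Integer.MIN_VALUE, Integer.MAX_VALUE, 1)
int alpha = Integer.MIN_VALUE
int beta = Integer.MAX_VALUE

// Window the recursive call receives: negaMax(depth - 1, -beta, -alpha, -player)
int childAlpha = -beta
int childBeta = -alpha   // int negation wraps around for MIN_VALUE

println "childAlpha = ${childAlpha}"
println "childBeta  = ${childBeta}"
println "child window already closed (alpha >= beta): ${childAlpha >= childBeta}"

// Symmetric bounds negate cleanly (illustrative alternative, not from the post)
int safeAlpha = -Integer.MAX_VALUE
int safeBeta = Integer.MAX_VALUE
println "safe child window: ${-safeBeta} .. ${-safeAlpha}"
```

With the wrapped value, the child window satisfies alpha >= beta from the start, so the child's loop would break after its first move. Starting from -Integer.MAX_VALUE and Integer.MAX_VALUE is one common way to keep both bounds negatable.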

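I chose a hashmap of moves, with the checker (a Piece object) as the key and the actual move as the value. I saw no point in storing just the moves, since I need to keep track of which piece can actually make each one.

I use another hashmap to store the successor evaluations, again with the checker as the key, but this time the value holds both the position and the position's score (I created a class, PositionAndScore, for this).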
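
The two maps described above can be sketched as follows. The real Position, Piece and PositionAndScore classes are not shown in the post, so the fields used here (row, col, owner) are assumptions made purely for illustration. The sketch also shows one property of a map keyed by piece: putting a second destination for the same piece replaces the first, so each piece can carry only one move at a time.

```groovy
import groovy.transform.Canonical

// Hypothetical stand-ins for the classes named in the post; field names are assumptions
@Canonical
class Position { int row; int col }

class Piece { Position location; int owner }

@Canonical
class PositionAndScore { int score; Position position }

def piece = new Piece(location: new Position(2, 1), owner: 1)

// moves: checker -> destination
Map<Piece, Position> moves = [:]
moves.put(piece, new Position(3, 0))
moves.put(piece, new Position(3, 2))   // same key, so this replaces the previous entry

// evaluations: checker -> destination plus its score
Map<Piece, PositionAndScore> evaluations = [:]
evaluations.put(piece, new PositionAndScore(5, new Position(3, 2)))

println "moves stored for the piece: ${moves.size()}"
println "evaluation for the piece: ${evaluations.get(piece)}"
```

If a piece can have several legal moves, a list of (piece, destination) pairs, or a map from piece to a list of destinations, would keep all of them; whether getMoves needs that depends on code not shown here.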

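The evaluateGameState function initialises the score to the number of pieces that player can move, adds a point for any kings, and takes away a point for any piece that is in a takeable position.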
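
A minimal sketch of a scoring rule along those lines is given below. The PieceInfo class, its flags, and the extra pieces parameter are hypothetical: the real evaluateGameState takes only the player and reads the board from state that is not shown. The sketch also assumes that kings and takeable pieces are counted for the player's own side only.

```groovy
// Hypothetical piece summary; the real Piece class is not shown in the post
class PieceInfo {
    int owner
    boolean canMove
    boolean king
    boolean takeable
}

// Score = movable pieces + kings - pieces that can be taken, all for the given player
int evaluateGameState(List<PieceInfo> pieces, int player) {
    def own = pieces.findAll { it.owner == player }
    int score = own.count { it.canMove }
    score += own.count { it.king }
    score -= own.count { it.takeable }
    return score
}

def board = [
    new PieceInfo(owner: 1, canMove: true, king: false, takeable: false),
    new PieceInfo(owner: 1, canMove: true, king: true, takeable: true),
    new PieceInfo(owner: -1, canMove: true, king: false, takeable: false)
]
println "player 1: ${evaluateGameState(board, 1)}"
println "player -1: ${evaluateGameState(board, -1)}"
```

A rule of this shape looks only at one side. Negamax generally expects the evaluation to be relative to the side to move, for example the player's score minus the opponent's, though whether that matters here is not something the post settles.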

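When playing, the first two moves the computer makes look intelligent, but from then on it goes downhill. Much of the time the computer tries to make invalid moves, which are therefore not executed.

I would greatly appreciate anyone taking the time to look over what I have done so far and comment on whether anything stands out as wrong.

Many thanks.

Edit: OK, I have made some progress. As I may not have mentioned, the evaluations hashmap is used to work out the computer's best move: the entry with the highest score is taken from it.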
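
Picking the highest-scoring entry from such a map can be sketched like this. The post does not show how the lookup is actually done, so this is only one possible way; plain strings and nested maps stand in for the Piece and PositionAndScore objects, and the names checkerA, checkerB and checkerC are made up.

```groovy
// Hypothetical evaluations: piece -> [score, destination]; strings stand in for Piece and Position
def evaluations = [
    'checkerA': [score: 2, position: 'c3'],
    'checkerB': [score: 5, position: 'e5'],
    'checkerC': [score: -1, position: 'g3']
]

// Map.max with a one-argument closure returns the entry whose closure value is largest
def best = evaluations.max { it.value.score }
println "move ${best.key} to ${best.value.position} (score ${best.value.score})"
```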

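The problem this caused was that the evaluations hashmap was added to on every loop iteration where player was 1, so moves that were not yet legal (i.e. future moves) were being added.

To solve this, I decided to add a precursor method named callSearch(), which is called instead of negaMax with all the same parameters, but which also sets rootDepth to the value of depth.

I then made this small change to the algorithm: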

if (player == 1 && depth == rootDepth) {

}

My thinking is that I only want to add successor evaluations once the search is back at the root.
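
The root-only recording idea can be sketched on a toy game tree. This is not the checkers code: the Search class, the nested-list tree (leaves hold values from player 1's point of view) and the integer move indices are all assumptions chosen so that the example is self-contained. Bounds of -Integer.MAX_VALUE and Integer.MAX_VALUE are used so that negating them stays in range.

```groovy
class Search {
    int rootDepth
    Map<Integer, Integer> evaluations = [:]   // root move index -> score

    // Hypothetical entry point: remembers the starting depth, then delegates to negaMax
    int callSearch(node, int depth, int alpha, int beta, int player) {
        rootDepth = depth
        evaluations.clear()
        return negaMax(node, depth, alpha, beta, player)
    }

    int negaMax(node, int depth, int alpha, int beta, int player) {
        if (!(node instanceof List)) {
            return player * (node as int)     // leaf: value seen from the side to move
        }
        int bestScore = -Integer.MAX_VALUE
        for (int i = 0; i < node.size(); i++) {
            int score = -negaMax(node[i], depth - 1, -beta, -alpha, -player)
            if (player == 1 && depth == rootDepth) {
                evaluations.put(i, score)     // record only moves available at the root
            }
            bestScore = Math.max(bestScore, score)
            alpha = Math.max(alpha, bestScore)
            if (alpha >= beta) break          // prune
        }
        return bestScore
    }
}

def tree = [[3, 5], [2, 9], [6, 4]]           // three root moves, two replies each
def search = new Search()
int best = search.callSearch(tree, 2, -Integer.MAX_VALUE, Integer.MAX_VALUE, 1)
println "best score: ${best}, root evaluations: ${search.evaluations}"
```

Because of pruning, the scores recorded for non-best root moves may be bounds rather than exact values, but the entry with the maximum score still identifies the chosen move.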

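Anyway, after doing all this, the computer no longer tries to make illegal moves, but it still fails to make good moves. That may be down to my evaluation function, which is a bit rudimentary.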
