artificial-intelligence - Minimax with alpha-beta pruning produces wrong results

Question

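I'm trying to implement an abstract minimax algorithm with alpha-beta pruning. The minimax part works well, but as soon as I add alpha-beta pruning, the AI starts to play very poorly, even skipping obvious moves. I'm not sure what is going on.

This is what my recursive function looks like: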

- (id<MMGameMove>)getBestMove:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    id<MMGameMove> bestMove = nil;
    NSArray *allMoves = [game allMoves];

    for (id<MMGameMove> move in allMoves)
    {
        //Take the move and evaluate the game's score
        id<MMGame> gameBoard = [game clone];
        move.player = player;
        [gameBoard saveMove:move];
        self.count++;

        if (depth == 0 || gameBoard.isOver)
        {
            move.rank = [gameBoard scoreForPlayer:self.playerId depth:depth];
        }
        else
        {
            MMPlayerSeed opponent = (player == self.playerId) ? self.opponentId : self.playerId;
            move.rank = [self getBestMove:gameBoard player:opponent depth:depth-1 alpha:alpha beta:beta].rank;
        }

        //If the new move is better than our previous move, take it
        BOOL minMove = (player == self.opponentId && move.rank <= beta);
        BOOL maxMove = (player == self.playerId && move.rank >= alpha);

        if (minMove || maxMove)
        {
            BOOL shouldPrune = NO;
            if (minMove)
            {
                beta = move.rank;
                if (alpha >= beta) {
                    shouldPrune = YES;
                }
            }
            else if (maxMove)
            {
                alpha = move.rank;
                if (alpha <= beta) {
                    shouldPrune = YES;
                }
            }

            bestMove = move;

            if (shouldPrune && depth < self.maxDepth) {
                break;
            }
        }
    }

    return bestMove;
}

My initial call looks like this:

[self getBestMove:game player:self.playerId depth:self.maxDepth alpha:INT_MIN beta:INT_MAX];

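As I understand it, for the same game state alpha-beta pruning should give me exactly the same move as minimax without it, but with this implementation that is clearly not the case.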
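One way to check that expectation is to run both searches on the same small tree and compare the results. The following is a minimal Python sketch, not the Objective-C code from the question; the nested-list tree and the names `minimax` and `alphabeta` are assumptions made for illustration.

```python
import math

def minimax(node, maximizing):
    """Plain minimax over a nested-list tree; integers are leaf scores."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, stats=None):
    """Alpha-beta over the same tree; counts visited leaves in stats."""
    if isinstance(node, int):
        if stats is not None:
            stats["leaves"] += 1
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        score = alphabeta(child, not maximizing, alpha, beta, stats)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:  # remaining siblings cannot change the result
            break
    return best

# Hypothetical tree: the root maximizes, its children minimize.
tree = [[3, 5, 10], [2, 12, 7], [1, 8, 4]]
stats = {"leaves": 0}
plain = minimax(tree, True)
pruned = alphabeta(tree, True, stats=stats)
```

On this tree both searches return 3, and the pruned search evaluates 5 of the 9 leaves. If the two values ever differ on the same input, the pruning logic is at fault rather than the evaluation function.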

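Edit 1

After the suggested change there was still another bug: I was pruning at the root node. I edited the code to reflect the correct answer. After doing that and running minimax both with and without alpha-beta pruning, I can now see that both produce the same result, and I was also able to verify the performance gain from adding alpha-beta.

Edit 2

The code posted above does not actually work as expected. I followed xXliolauXx's suggestion, but it still doesn't work properly. I get the right values at depth = 0 or when the game is over, but they don't seem to be passed back recursively to the corresponding root move. For example, I can see my heuristic returning -3 for one child of the first root move and 0 for the rest of its children. So I'd expect the first root move to report -3 rather than 0, since that is the worst case the computer could run into after making that move.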
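To make that expectation concrete: a root move is worth the minimum over the opponent's replies, so a move whose replies score -3, 0 and 0 should back up -3. A tiny Python sketch with made-up numbers (the move names and scores are hypothetical, not taken from the actual game):

```python
# Hypothetical leaf scores (from the computer's point of view) for each
# opponent reply to each root move.
replies = {
    "move_a": [-3, 0, 0],
    "move_b": [0, 0, 0],
}

# The opponent minimizes, so each root move is worth its worst reply.
root_values = {move: min(scores) for move, scores in replies.items()}

# The computer maximizes over the backed-up values.
best_move = max(root_values, key=root_values.get)
```

Here `move_a` backs up -3 and `move_b` backs up 0, so the computer should prefer `move_b`.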

Here is my new code:

- (NSInteger)alphabeta:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    if (depth == 0 || game.isOver)
    {
        return [game scoreForPlayer:self.playerId depth:depth];
    }

    MMPlayerSeed opponent = (player == self.playerId) ? self.opponentId : self.playerId;

    for (id<MMGameMove> move in game.allMoves)
    {
        id<MMGame> gameCopy = [game clone];
        move.player = player;
        [gameCopy saveMove:move];
        self.count++;

        NSInteger score = [self alphabeta:gameCopy player:opponent depth:depth-1 alpha:alpha beta:beta];

        if (player == self.playerId)
        {
            if (depth == self.maxDepth)
            {
                move.rank = @(score);
                [self.rootMoves addObject:move];
            }

            alpha = MAX(alpha, score);

            if (beta < alpha)
            {
                break;
            }
        }
        else
        {
            beta = MIN(beta, score);

            if (beta < alpha)
            {
                break;
            }
        }
    }

    return (player == self.playerId) ? alpha : beta;
}

Note that when maximizing I prune when beta < alpha. Otherwise it would always prune after scanning the first root move.

This is how I start the recursion:

[self alphabeta:game player:self.playerId depth:self.maxDepth alpha:-INFINITY beta:INFINITY];

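Edit 3

I think I figured it out. Instead of returning alpha or beta, I now return the best (or worst) score. I need to clean up my code to make it more readable, but this is how it looks now: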

- (NSInteger)alphabeta:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    if (depth == 0 || game.isOver)
    {
        return [game scoreForPlayer:self.playerId depth:depth];
    }

    MMPlayerSeed opponent;
    NSInteger bestScore;

    if (player == self.playerId)
    {
        opponent = self.opponentId;
        bestScore = -INFINITY;
    }
    else
    {
        opponent = self.playerId;
        bestScore = INFINITY;
    }

    for (id<MMGameMove> move in game.allMoves)
    {
        id<MMGame> gameCopy = [game clone];
        move.player = player;
        [gameCopy saveMove:move];
        self.count++;

        NSInteger score = [self alphabeta:gameCopy player:opponent depth:depth-1 alpha:alpha beta:beta];

        if (player == self.playerId)
        {
            bestScore = MAX(bestScore, score);
            alpha = MAX(alpha, bestScore);

            if (depth == self.maxDepth)
            {
                move.rank = @(score);
                [self.rootMoves addObject:move];
            }

            if (beta < alpha)
            {
                break;
            }
        }
        else
        {
            bestScore = MIN(bestScore, score);
            beta = MIN(beta, bestScore);

            if (beta < alpha)
            {
                break;
            }
        }
    }

    return bestScore;
}
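The Edit 3 version has the shape of what is often called fail-soft alpha-beta: it returns the best score actually found instead of the alpha or beta bound. Below is a rough Python sketch of the same shape on a nested-list tree (the tree format and the names `alphabeta` and `root_scores` are assumptions, not the asker's API). One caveat: once alpha has been raised at the root, the scores recorded for later, worse root moves may only be upper bounds, so it is probably safest to trust only the best move's score.

```python
import math

def alphabeta(node, maximizing, alpha, beta, root_scores=None):
    """Fail-soft alpha-beta: returns the best score found, not alpha/beta.

    If root_scores is a list, the score of each searched child of this
    node is appended to it (intended for the root call only).
    """
    if isinstance(node, int):  # leaf: static evaluation
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        score = alphabeta(child, not maximizing, alpha, beta)
        if root_scores is not None:
            root_scores.append(score)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if beta < alpha:  # same strict test as in the code above
            break
    return best

tree = [[3, 5], [2, 1], [6, 4]]  # hypothetical: the root maximizes
root_scores = []
value = alphabeta(tree, True, -math.inf, math.inf, root_scores)
best_index = root_scores.index(value)
```

The search returns 4 and picks the third root move. The second root move is recorded as 2 even though its true minimax value is 1, because its subtree was cut off after the first reply; that is harmless for choosing the best move but matters if the recorded ranks are used for anything else.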

score -1 · Accepted Answer

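The error seems to be in the part where you decide whether pruning is possible (it is written as in a negamax alpha-beta implementation, while you are using minimax alpha-beta).

To fix it, just add an if that checks whether you are currently maximizing or minimizing (as you already do when updating alpha or beta).

When you are minimizing, you prune whenever alpha >= beta, and vice versa.

After that, the code should work fine (if there are no other errors ;)).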
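To illustrate why the test on the maximizing side matters, here is a small Python sketch (hypothetical nested-list tree, not the asker's code) that runs the same search with the usual cutoff `alpha >= beta` and with the inverted test `alpha <= beta` that appears in the question's maximizing branch. The inverted test is true almost immediately, so the maximizer stops after the first move that raises alpha. The question's code exempts the root from pruning, so there the effect would show up at deeper maximizing nodes; the sketch ignores that guard for brevity.

```python
import math

def search(node, maximizing, alpha, beta, inverted_max_test=False):
    """Alpha-beta on a nested-list tree; integers are leaf scores."""
    if isinstance(node, int):
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        score = search(child, not maximizing, alpha, beta, inverted_max_test)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
            # Usual cutoff is alpha >= beta; the inverted test mimics the bug.
            cutoff = alpha <= beta if inverted_max_test else alpha >= beta
        else:
            best = min(best, score)
            beta = min(beta, best)
            cutoff = alpha >= beta
        if cutoff:
            break
    return best

tree = [[1, 2], [8, 9]]  # hypothetical: root maximizes, children minimize
correct = search(tree, True, -math.inf, math.inf)
buggy = search(tree, True, -math.inf, math.inf, inverted_max_test=True)
```

With the usual cutoff the search returns 8; with the inverted test it stops after the first root move and returns 1, which matches the "skipping obvious moves" symptom described in the question.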
