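I'm trying to implement an abstract minimax algorithm with alpha-beta pruning. The minimax part works well, but as soon as I add alpha-beta pruning the AI starts playing very badly and even skips obvious moves. I'm not sure what is going on.
This is what my recursive function looks like: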
- (id<MMGameMove>)getBestMove:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    id<MMGameMove> bestMove = nil;
    NSArray *allMoves = [game allMoves];
    for (id<MMGameMove> move in allMoves)
    {
        //Take the move and evaluate the game's score
        id<MMGame> gameBoard = [game clone];
        move.player = player;
        [gameBoard saveMove:move];
        self.count++;
        if (depth == 0 || gameBoard.isOver)
        {
            move.rank = [gameBoard scoreForPlayer:self.playerId depth:depth];
        }
        else
        {
            MMPlayerSeed opponent = (player == self.playerId) ? self.opponentId : self.playerId;
            move.rank = [self getBestMove:gameBoard player:opponent depth:depth-1 alpha:alpha beta:beta].rank;
        }
        //If the new move is better than our previous move, take it
        BOOL minMove = (player == self.opponentId && move.rank <= beta);
        BOOL maxMove = (player == self.playerId && move.rank >= alpha);
        if (minMove || maxMove)
        {
            BOOL shouldPrune = NO;
            if (minMove)
            {
                beta = move.rank;
                if (alpha >= beta) {
                    shouldPrune = YES;
                }
            }
            else if (maxMove)
            {
                alpha = move.rank;
                if (alpha <= beta) {
                    shouldPrune = YES;
                }
            }
            bestMove = move;
            if (shouldPrune && depth < self.maxDepth) {
                break;
            }
        }
    }
    return bestMove;
}
My initial call looks like this:
[self getBestMove:game player:self.playerId depth:self.maxDepth alpha:INT_MIN beta:INT_MAX];
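As I understand it, for the same game state alpha-beta pruning should give me exactly the same move as minimax without it, but with this implementation that is clearly not the case.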
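To make that expectation concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not the code from this post: the fixed binary tree `LEAVES`, the helpers `imax`/`imin`, and the counter `visited` are hypothetical names chosen for illustration. It runs plain minimax and a textbook alpha-beta search over the same toy tree so the two root values can be compared.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical toy tree (not the MMGame protocol from the post): a complete
   binary tree of depth 3 whose 8 leaf scores are stored left to right. */
static const int LEAVES[8] = {3, 5, 6, 9, 1, 2, 0, -1};

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Plain minimax. `node` is the index of the node within its level;
   its children are 2*node and 2*node+1 on the next level. */
int minimax(int node, int depth, int maximizing)
{
    assert(depth >= 0);
    if (depth == 0) return LEAVES[node];
    int left  = minimax(2 * node,     depth - 1, !maximizing);
    int right = minimax(2 * node + 1, depth - 1, !maximizing);
    return maximizing ? imax(left, right) : imin(left, right);
}

int visited; /* number of leaves evaluated by alphabeta */

/* Alpha-beta over the same tree, returning the best score actually seen. */
int alphabeta(int node, int depth, int alpha, int beta, int maximizing)
{
    assert(depth >= 0);
    if (depth == 0) { visited++; return LEAVES[node]; }
    int best = maximizing ? INT_MIN : INT_MAX;
    for (int child = 2 * node; child <= 2 * node + 1; child++) {
        int score = alphabeta(child, depth - 1, alpha, beta, !maximizing);
        if (maximizing) { best = imax(best, score); alpha = imax(alpha, best); }
        else            { best = imin(best, score); beta  = imin(beta, best); }
        if (alpha >= beta) break; /* the remaining siblings cannot change the result */
    }
    return best;
}
```

Calling `minimax(0, 3, 1)` and `alphabeta(0, 3, INT_MIN, INT_MAX, 1)` on this tree yields the same root value, while the pruned search evaluates fewer than the 8 leaves. Whether the same move is chosen also depends on breaking ties the same way in both searches.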
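Edit 1
After the suggested changes there was another bug: I was pruning at the root node. I have edited the code to reflect the correct answer. After doing that and running minimax with and without alpha-beta pruning, I can now see that both produce the same result, and I was able to verify the performance gain from adding alpha-beta.
Edit 2
The code posted above does not actually work as expected. I followed xXliolauXx's suggestion, but it still doesn't work. I get the correct values at depth = 0 or when the game is over, but they don't seem to be passed back up the recursion to the corresponding root move. For example, I can see my heuristic return -3 for one child of the first root move and 0 for the rest of its children. So I would expect the first root move to report -3 rather than 0, because that is the worst case the computer can run into if it makes that move.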
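The expectation in that example can be written down as a small C sketch. The function name `worst_case_value` and the reply scores are hypothetical and only illustrate the arithmetic: when the opponent moves next, a root move is worth the minimum of the scores of the opponent's replies.

```c
#include <assert.h>
#include <stddef.h>

/* Value of a root move for the maximizing player when the opponent replies
   next: the opponent picks the reply that is worst for us, so the move is
   worth the minimum reply score. `replies` must hold at least one score. */
long worst_case_value(const long *replies, size_t count)
{
    assert(count > 0);
    long worst = replies[0];
    for (size_t i = 1; i < count; i++) {
        if (replies[i] < worst) worst = replies[i];
    }
    return worst;
}
```

With replies scored -3, 0 and 0, this gives -3, which is the value I expect the first root move to report.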
Here is my new code:
- (NSInteger)alphabeta:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    if (depth == 0 || game.isOver)
    {
        return [game scoreForPlayer:self.playerId depth:depth];
    }
    MMPlayerSeed opponent = (player == self.playerId) ? self.opponentId : self.playerId;
    for (id<MMGameMove> move in game.allMoves)
    {
        id<MMGame> gameCopy = [game clone];
        move.player = player;
        [gameCopy saveMove:move];
        self.count++;
        NSInteger score = [self alphabeta:gameCopy player:opponent depth:depth-1 alpha:alpha beta:beta];
        if (player == self.playerId)
        {
            if (depth == self.maxDepth)
            {
                move.rank = @(score);
                [self.rootMoves addObject:move];
            }
            alpha = MAX(alpha, score);
            if (beta < alpha)
            {
                break;
            }
        }
        else
        {
            beta = MIN(beta, score);
            if (beta < alpha)
            {
                break;
            }
        }
    }
    return (player == self.playerId) ? alpha : beta;
}
Note that when maximizing I prune when beta < alpha. Otherwise it would always prune after scanning the first root move.
This is how I start the recursion:
[self alphabeta:game player:self.playerId depth:self.maxDepth alpha:-INFINITY beta:INFINITY];
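Edit 3
I think I've figured it out. Instead of returning alpha or beta, I now return the best (or worst) score. I still need to clean up the code to make it more readable, but this is what it looks like now: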
- (NSInteger)alphabeta:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    if (depth == 0 || game.isOver)
    {
        return [game scoreForPlayer:self.playerId depth:depth];
    }
    MMPlayerSeed opponent;
    NSInteger bestScore;
    // NSIntegerMin/NSIntegerMax are used instead of -INFINITY/INFINITY:
    // INFINITY is a floating-point macro, and converting it to NSInteger
    // is undefined behavior in C.
    if (player == self.playerId)
    {
        opponent = self.opponentId;
        bestScore = NSIntegerMin;
    }
    else
    {
        opponent = self.playerId;
        bestScore = NSIntegerMax;
    }
    for (id<MMGameMove> move in game.allMoves)
    {
        id<MMGame> gameCopy = [game clone];
        move.player = player;
        [gameCopy saveMove:move];
        self.count++;
        NSInteger score = [self alphabeta:gameCopy player:opponent depth:depth-1 alpha:alpha beta:beta];
        if (player == self.playerId)
        {
            // Maximizing: track the best score seen and raise alpha.
            bestScore = MAX(bestScore, score);
            alpha = MAX(alpha, bestScore);
            if (depth == self.maxDepth)
            {
                move.rank = @(score);
                [self.rootMoves addObject:move];
            }
            if (beta < alpha)
            {
                break;
            }
        }
        else
        {
            // Minimizing: track the worst score seen and lower beta.
            bestScore = MIN(bestScore, score);
            beta = MIN(beta, bestScore);
            if (beta < alpha)
            {
                break;
            }
        }
    }
    return bestScore;
}
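To see why returning the best score rather than alpha or beta can change the ranks recorded for the root moves, here is a minimal C sketch on a hypothetical toy tree. The names `LEAVES`, `search`, `root_move_score` and the `fail_soft` flag are mine, not part of the code in this post, and the tree is made up. With `fail_soft == 0` the search returns alpha or beta; otherwise it returns the best score actually seen. Both use the same strict `beta < alpha` test as the code in this post.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical toy tree (not the MMGame protocol from the post): a complete
   binary tree of depth 3 whose 8 leaf scores are stored left to right. */
static const int LEAVES[8] = {3, 5, 6, 9, 1, 2, 0, -1};

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* fail_soft == 0: return alpha or beta.
   fail_soft != 0: return the best score actually seen in this node. */
int search(int node, int depth, int alpha, int beta, int maximizing, int fail_soft)
{
    assert(depth >= 0);
    if (depth == 0) return LEAVES[node];
    int best = maximizing ? INT_MIN : INT_MAX;
    for (int child = 2 * node; child <= 2 * node + 1; child++) {
        int score = search(child, depth - 1, alpha, beta, !maximizing, fail_soft);
        if (maximizing) { best = imax(best, score); alpha = imax(alpha, best); }
        else            { best = imin(best, score); beta  = imin(beta, best); }
        if (beta < alpha) break; /* same strict test as in the post */
    }
    if (fail_soft) return best;
    return maximizing ? alpha : beta;
}

/* Score reported for root move `move` (0 or 1) when the root has already
   raised alpha to `alpha`. The opponent moves next, so the child minimizes. */
int root_move_score(int move, int alpha, int fail_soft)
{
    return search(move, 2, alpha, INT_MAX, 0, fail_soft);
}
```

On this tree the first root move scores 5 either way. Searching the second root move with alpha already at 5 reports 5 when alpha or beta is returned, tying with the first move, but 2 when the best score is returned. The true minimax value of that move is 0, so both numbers are only upper bounds, but the second one no longer collides with the real best move. I can't say for sure this is what happened in my game, but it matches the symptoms I saw in rootMoves.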