
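I'm writing a minimax tic-tac-toe game so that I can later extend it to alpha-beta pruning. During my minimax search I determine whether a path leads to +1 (X wins), -1 (O wins) or 0 (draw). But for a board configuration like this: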

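    O|X|O
    -|X|-
    -|-|-

during O's turn, it picks the bottom-left square because that move leads to its own win. Should I check each board for a block? That wouldn't run as fast, and I don't think that's how minimax is supposed to be implemented.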

Can someone explain why minimax isn't smart enough to detect this? I thought it looked at the leaf nodes and returned +1/-1/0.


2 Answers


Edit: I've been mixing up "pure" minimax, with minimax + heuristic. I've edited my answer to resolve this.

Maybe it would help to define minimax. From an article by a UC Berkeley student:

minimax(player,board)
    if(game over in current board position)
        return winner
    children = all legal moves for player from this board
    if(max's turn)
        return maximal score of calling minimax on all the children
    else (min's turn)
        return minimal score of calling minimax on all the children
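Here is a minimal Python sketch of the pseudocode above. It assumes X is the maximizing player (+1) and O is the minimizing player (-1), which matches the question's scoring. The board layout, `LINES`, `winner` and `move_scores` are illustrative helpers and are not taken from the article. Scoring each of O's replies in the question's position shows that no explicit block check is needed. Every move except the block lets X complete the middle column, so each of those moves scores +1, and the block at cell 7 is the only move that scores 0.

```python
from typing import Dict, List, Optional

# Board: 9 cells ("X", "O" or " "), numbered row by row:
#  0|1|2
#  3|4|5
#  6|7|8
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: List[str]) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(player: str, board: List[str]) -> int:
    """Value of `board` with `player` to move: +1 X wins, -1 O wins, 0 draw.

    X maximizes, O minimizes; both are assumed to play perfectly.
    """
    w = winner(board)
    if w is not None:                      # game over: return the winner
        return 1 if w == "X" else -1
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:                          # board full, nobody won: draw
        return 0
    other = "O" if player == "X" else "X"
    scores = []
    for i in moves:                        # score every child position
        board[i] = player
        scores.append(minimax(other, board))
        board[i] = " "                     # undo the move
    return max(scores) if player == "X" else min(scores)


def move_scores(player: str, board: List[str]) -> Dict[int, int]:
    """Minimax value of each legal move for `player`."""
    other = "O" if player == "X" else "X"
    scores = {}
    for i, cell in enumerate(board):
        if cell == " ":
            board[i] = player
            scores[i] = minimax(other, board)
            board[i] = " "
    return scores


# The position from the question, O to move.
board = ["O", "X", "O",
         " ", "X", " ",
         " ", " ", " "]
scores = move_scores("O", board)
print(scores)                        # {3: 1, 5: 1, 6: 1, 7: 0, 8: 1}
print(min(scores, key=scores.get))   # 7: the bottom-middle block
```

O is the minimizing player, so it picks the lowest score. The bottom-left square (cell 6) scores +1 because X replies at cell 7 and wins. The loss bubbles up without any special-case code.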

With minimax, you are trying to minimize your losses, not maximize your gains. So, "your" turn is min's turn. With this definition, if you could ever lose by selecting a square, then it will be marked -1. If you could ever tie, but will never lose, it will be marked 0. Only if it is a guaranteed win will it be marked 1.

Should I check each board for a block?

If you are defining your score and algorithm correctly (matching the right players to the right logic), you need not "check for a block". Any game sub-tree where the player didn't block should implicitly be evaluated -1, because at some point (probably very quickly) it will evaluate to a loss, and that loss will bubble up.

The real problem with this algorithm (and where you may be getting results that you aren't expecting) is when all sub-trees result in possible losses. At that point, you will need to use a heuristic to get any better information on which move you should take. You will need something better than simply {-1, 0, 1}, because some moves could allow you to win, but you'd block them out because you could also lose.
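One common way to get finer-grained scores than {-1, 0, 1} is to stop the search at a fixed depth and score the unfinished positions with a heuristic. The sketch below is written under stated assumptions and is not the only option. The heuristic is a hypothetical choice: the number of lines still open for X minus the number still open for O, scaled into (-1, 1) so that it never outranks a real win or loss. `heuristic` and `minimax_limited` are illustrative names.

```python
from typing import List, Optional

LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: List[str]) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def heuristic(board: List[str]) -> float:
    """Hypothetical evaluation: lines X can still complete minus lines O can
    still complete, divided by 10 so the result stays inside (-1, 1)."""
    x_open = sum(1 for line in LINES if all(board[i] != "O" for i in line))
    o_open = sum(1 for line in LINES if all(board[i] != "X" for i in line))
    return (x_open - o_open) / 10.0


def minimax_limited(player: str, board: List[str], depth: int) -> float:
    """Exact +1/-1/0 for finished games; heuristic() once `depth` plies are used."""
    w = winner(board)
    if w is not None:
        return 1.0 if w == "X" else -1.0
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0.0
    if depth == 0:                      # cutoff: estimate instead of searching on
        return heuristic(board)
    other = "O" if player == "X" else "X"
    scores = []
    for i in moves:
        board[i] = player
        scores.append(minimax_limited(other, board, depth - 1))
        board[i] = " "                  # undo the move
    return max(scores) if player == "X" else min(scores)


board = ["O", "X", "O",
         " ", "X", " ",
         " ", " ", " "]
print(heuristic(board))                 # 0.0 (three open lines for each side)
```

The weakness is that the answer now depends on how good the heuristic is. A position it rates highly can still be lost just past the cutoff.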

answered 2011-07-04T19:55:00.607

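I'm not quite sure about your question. As mentioned, minimax runs into trouble when several paths lead to a win or when every path leads to a loss. In that case it is mathematically correct to pick any winning path, or any path at all when every path loses. Against an imperfect opponent, however, it is usually wiser to choose the shortest path to a win and the longest path to a loss. The longer path gives the opponent more chances to make a wrong choice.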

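This behavior is easy to add to minimax by decaying the score at every level of recursion. Whenever you return a value from a recursive call, multiply it by 0.9 or something similar. Longer losing paths then score higher, and longer winning paths score lower.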
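Here is a minimal sketch of that decay in Python. It assumes the same conventions as the question (X maximizes with +1, O minimizes with -1, a draw is 0) and uses the 0.9 factor mentioned above. `minimax_decay` and `DECAY` are illustrative names.

```python
from typing import List, Optional

LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

DECAY = 0.9  # "multiply the result by 0.9 or something similar"


def winner(board: List[str]) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax_decay(player: str, board: List[str]) -> float:
    """Minimax where each value coming back from a recursive call is scaled
    by DECAY, so a win found d plies deep is worth DECAY**d instead of 1."""
    w = winner(board)
    if w is not None:
        return 1.0 if w == "X" else -1.0
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0.0
    other = "O" if player == "X" else "X"
    scores = []
    for i in moves:
        board[i] = player
        scores.append(DECAY * minimax_decay(other, board))  # decay per level
        board[i] = " "                                      # undo the move
    return max(scores) if player == "X" else min(scores)


# X can win at once on cell 2; any slower win is worth at most 0.9**3.
board = ["X", "X", " ",
         "O", "O", " ",
         " ", " ", " "]
print(minimax_decay("X", board))  # 0.9
```

Draws stay at 0 regardless of depth. Only the magnitude of wins and losses shrinks with distance, so the sign of every score matches plain minimax.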

However, this does cause problems once you start using heuristic cutoffs.

answered 2011-07-05T11:21:35.063