artificial-intelligence - 使用极小极大搜索不完全信息的纸牌游戏

Question

我想使用极小极大搜索（带有 alpha-beta 修剪），或者更确切地说是负极大搜索，让计算机程序玩纸牌游戏。

纸牌游戏实际上由 4 名玩家组成。因此，为了能够使用极小极大等，我将游戏简化为“我”对抗“他人”。在每一次“移动”之后，你都可以从游戏本身客观地读出当前状态的评价。当所有 4 名玩家都放置了卡片时，最高的玩家将赢得所有玩家 - 卡片的价值计算在内。

由于您不知道其他 3 名玩家之间的卡牌分布究竟如何，我认为您必须使用不属于您的卡牌模拟所有可能的分布（“世界”）。你有 12 张牌，其他 3 名玩家总共有 36 张牌。

所以我的方法是这个算法，其中player1 到 3 之间的数字表示程序可能需要为其寻找动作的三个计算机玩家。并-player代表对手，即所有其他三名球员。

private Card computerPickCard(GameState state, ArrayList<Card> cards) {
    int bestScore = Integer.MIN_VALUE;
    Card bestMove = null;
    int nCards = cards.size();
    for (int i = 0; i < nCards; i++) {
        if (state.moveIsLegal(cards.get(i))) { // if you are allowed to place this card
            int score;
            GameState futureState = state.testMove(cards.get(i)); // a move is the placing of a card (which returns a new game state)
            score = negamaxSearch(-state.getPlayersTurn(), futureState, 1, Integer.MIN_VALUE, Integer.MAX_VALUE);
            if (score > bestScore) {
                bestScore = score;
                bestMove = cards.get(i);
            }
        }
    }
    // now bestMove is the card to place
}

private int negamaxSearch(int player, GameState state, int depthLeft, int alpha, int beta) {
    ArrayList<Card> cards;
    if (player >= 1 && player <= 3) {
        cards = state.getCards(player);
    }
    else {
        if (player == -1) {
            cards = state.getCards(0);
            cards.addAll(state.getCards(2));
            cards.addAll(state.getCards(3));
        }
        else if (player == -2) {
            cards = state.getCards(0);
            cards.addAll(state.getCards(1));
            cards.addAll(state.getCards(3));
        }
        else {
            cards = state.getCards(0);
            cards.addAll(state.getCards(1));
            cards.addAll(state.getCards(2));
        }
    }
    if (depthLeft <= 0 || state.isEnd()) { // end of recursion as the game is finished or max depth is reached
        if (player >= 1 && player <= 3) {
            return state.getCurrentPoints(player); // player's points as a positive value (for self)
        }
        else {
            return -state.getCurrentPoints(-player); // player's points as a negative value (for others)
        }
    }
    else {
        int score;
        int nCards = cards.size();
        if (player > 0) { // make one move (it's player's turn)
            for (int i = 0; i < nCards; i++) {
                GameState futureState = state.testMove(cards.get(i));
                if (futureState != null) { // wenn Zug gültig ist
                    score = negamaxSuche(-player, futureState, depthLeft-1, -beta, -alpha);
                    if (score >= beta) {
                        return score;
                    }
                    if (score > alpha) {
                        alpha = score; // alpha acts like max
                    }
                }
            }
            return alpha;
        }
        else { // make three moves (it's the others' turn)
            for (int i = 0; i < nCards; i++) {
                GameState futureState = state.testMove(cards.get(i));
                if (futureState != null) { // if move is valid
                    for (int k = 0; k < nCards; k++) {
                        if (k != i) {
                            GameState futureStateLevel2 = futureState.testMove(cards.get(k));
                            if (futureStateLevel2 != null) { // if move is valid
                                for (int m = 0; m < nCards; m++) {
                                    if (m != i && m != k) {
                                        GameState futureStateLevel3 = futureStateLevel2.testMove(cards.get(m));
                                        if (futureStateLevel3 != null) { // if move is valid
                                            score = negamaxSuche(-player, futureStateLevel3, depthLeft-1, -beta, -alpha);
                                            if (score >= beta) {
                                                return score;
                                            }
                                            if (score > alpha) {
                                                alpha = score; // alpha acts like max
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
            return alpha;
        }
    }
}

这似乎工作正常，但是对于 1 ( depthLeft=1) 的深度，程序已经需要平均计算 50,000 次移动（放置的牌）。当然，这太过分了！

所以我的问题是：

实施是否正确？你能模拟这样的游戏吗？关于不完善的信息，尤其是？
如何在速度和工作量方面改进算法？
例如，我可以将可能的移动集减少到 50% 的随机集以提高速度，同时保持良好的结果吗？
我发现UCT 算法是一个很好的解决方案（也许）。你知道这个算法吗？你能帮我实现它吗？

score 10 · Accepted Answer

我想澄清接受的答案并没有真正涉及的细节。

在许多纸牌游戏中，您可以对对手可能拥有的未知牌进行抽样，而不是全部生成。在进行此采样时，您可以考虑短花色等信息以及到目前为止持有某些牌的概率，以加权每个可能手牌的可能性（每手牌都是我们将独立解决的可能世界）。然后，您使用完美的信息搜索解决每一手牌。在所有这些世界中最好的举动通常是总体上最好的举动 - 有一些警告。

在像扑克这样的游戏中，这不会很好 - 游戏都是关于隐藏信息的。你必须精确地平衡你的行动，以隐藏你的手的信息。

但是，在诸如基于技巧的纸牌游戏之类的游戏中，这非常有效——尤其是因为新信息一直在被披露。真正优秀的玩家很清楚每个人都持有什么。因此，相当强大的 Skat 和 Bridge 程序都是基于这些想法。

如果你能彻底解决底层世界，那是最好的，但如果你不能，你可以使用 minimax 或 UCT 来选择每个世界中的最佳移动。还有一些混合算法（ISMCTS）试图将这个过程混合在一起。请注意此处的声明。简单的采样方法更容易编码——您应该在更复杂的方法之前尝试更简单的方法。

以下是一些研究论文，它们将提供更多关于不完美信息的抽样方法何时运作良好的信息：

了解完美信息蒙特卡洛抽样在博弈树搜索中的成功（本文分析抽样方法何时可能起作用。）

改进基于技巧的纸牌游戏中的状态评估、推理和搜索（本文描述了在 Skat 中采样的使用）

具有计算挑战性的游戏中的不完全信息（本文描述了 Bridge 中的采样）

信息集蒙特卡洛树搜索（本文将采样和UCT/蒙特卡洛树搜索合并，以避免第一参考中的问题。）

公认答案中基于规则的方法的问题在于，它们无法利用超出创建初始规则所需的计算资源。此外，基于规则的方法将受到您可以编写的规则的能力的限制。基于搜索的方法可以利用组合搜索的力量产生比程序作者更强大的游戏。

score 5 · Accepted Answer

您实施的 Minimax 搜索对于存在如此多不确定性的游戏来说是错误的方法。由于您不知道其他玩家之间的卡片分布情况，因此您的搜索将花费成倍的时间来探索根据卡片的实际分布情况不可能发生的游戏。

我认为更好的方法是在您对其他玩家的牌知之甚少或根本没有信息的情况下，从良好的游戏规则开始。像：

如果你在一轮中先出牌，则打出最低的牌，因为你赢得这一轮的机会很小。
如果您在一轮中打到最后，则打出您将赢得该轮的最低牌。如果你不能赢得这一轮，那就打出你最低的牌。

让你的程序一开始不用搜索，只需按照这些规则进行游戏，并假设所有其他玩家也会使用这些启发式方法。 当程序观察每轮的第一个和最后一个玩家打出的牌时，它可以建立一个关于每个玩家可能持有的牌的信息表。例如，9 会赢得这一轮，但玩家 3 没有玩它，所以他不能有任何 9 或更高的牌。随着关于每个玩家手牌的信息的收集，搜索空间最终将被限制到一个点，即对可能游戏的极小极大搜索可以产生关于下一张要玩的牌的有用信息。

artificial-intelligence - 使用极小极大搜索不完全信息的纸牌游戏

2 回答 2

Related

Reference