algorithm - How exactly do I use the "history heuristic" in alpha-beta minimax?

Question

I'm building an AI for a chess game.

So far I have successfully implemented the alpha-beta pruning minimax algorithm, which looks like this (from Wikipedia):

(* Initial call *)
alphabeta(origin, depth, -∞, +∞, TRUE)

function alphabeta(node, depth, α, β, maximizingPlayer)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if maximizingPlayer
        for each child of node
            α := max(α, alphabeta(child, depth - 1, α, β, FALSE))
            if β ≤ α
                break (* β cut-off *)
        return α
    else
        for each child of node
            β := min(β, alphabeta(child, depth - 1, α, β, TRUE))
            if β ≤ α
                break (* α cut-off *)
        return β

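Since this takes far too long (it walks through every subtree one by one), I went looking for improvements and came across something called the "history heuristic".

The algorithm from the original paper: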

int AlphaBeta(pos, d, alpha, beta) 
{ 
    if (d=0 || game is over) 
        return Eval (pos);  // evaluate leaf position from current player’s standpoint 

    score = - INFINITY;     // preset return value 
    moves = Generate(pos);  // generate successor moves 

    for i=1 to sizeof(moves) do                // rating all moves 
        rating[i] = HistoryTable[ moves[i] ]; 
    Sort( moves, rating );                     // sorting moves according to their history scores 

    for i =1 to sizeof(moves) do { // look over all moves 
        Make(moves[i]); // execute current move 
        cur = - AlphaBeta(pos, d-1, -beta, -alpha); //call other player

        if (cur > score) {
            score = cur; 
            bestMove = moves[i];      // update best move if necessary 
        } 

        if (score > alpha) alpha = score;    // adjust the search window 
        Undo(moves[i]);                      // retract current move 

        if (alpha >= beta) goto done;        // cut off 
     } 

     done: 
     // update history score 
     HistoryTable[bestMove] = HistoryTable[bestMove] + Weight(d); 

     return score; 
}

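So basically, the idea is to keep a hash table or dictionary of previous "moves".

Now I'm confused about what "move" means here. I'm not sure whether it refers to a single move or to the overall board state after each move.

In chess, for example, what should the "key" of this hash table be?

1. Individual moves like (Queen to position (0,1)) or (Knight to position (5,5))?
2. Or the overall state of the board after an individual move?

If it is 1, I guess the positions of the other pieces are not taken into account when recording a "move" in my history table?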

score 1 · Accepted Answer

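I think the original paper, which is available online (The History Heuristic and Alpha-Beta Search Enhancements in Practice, Jonathan Schaeffer), answers this question clearly. In the paper, the author defines a move as 2 indices on the board (the from-square and the to-square) and uses a 64x64 table to hold the move history (actually, I think he uses bit shifting and a single-index array).

The author compared all the move-ordering methods available at the time and concluded that hh was the best. If current best practice has established an improved form of move ordering (beyond hh + transposition table), I'd like to know what it is too.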
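To make the "bit shifting and a single-index array" idea concrete, here is a minimal sketch. The square numbering (a1 = 0 up to h8 = 63) and the names `history`, `move_index` and `unpack` are assumptions for illustration, not taken from the paper.

```python
# Assumed square numbering: a1 = 0, b1 = 1, ..., h8 = 63.
NUM_SQUARES = 64

# One flat array of 64 * 64 = 4096 counters, all starting at zero.
history = [0] * (NUM_SQUARES * NUM_SQUARES)

def move_index(from_sq, to_sq):
    """Pack a (from, to) pair into one index, 6 bits per square."""
    return (from_sq << 6) | to_sq

def unpack(index):
    """Recover (from_sq, to_sq) from a packed index."""
    return index >> 6, index & 63

# e2 is square 12 and e4 is square 28 under the numbering assumed above.
E2, E4 = 12, 28
history[move_index(E2, E4)] += 1
```

The key only encodes the two squares, so the same counter is shared by every position in which e2-e4 is played.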

score 0 · Accepted Answer

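You can use a transposition table to avoid evaluating the same board more than once. A transposition means that you can reach the same board state by playing the moves in a different order. A naive example: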

1. e4 e5 2. Nf3 Nc6
1. e4 Nc6 2. Nf3 e5

These two lines of play lead to the same position, but it is reached in different ways.

http://en.wikipedia.org/wiki/Transposition_table

A commonly used method for hashing chess positions is called Zobrist hashing:

http://en.wikipedia.org/wiki/Zobrist_hashing
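As a rough sketch of why Zobrist hashing suits transpositions: each (piece, square) pair gets a random 64-bit key, and a position's hash is the XOR of the keys of everything on the board. XOR does not care about order, so the two move orders above end at the same hash. The piece encoding, the square numbering (a1 = 0) and all names below are assumptions for illustration; castling and en passant rights are left out.

```python
import random

# Assumed encoding: 12 piece kinds (6 white, 6 black) on 64 squares.
NUM_PIECES, NUM_SQUARES = 12, 64
WP, WN, BP, BN = 0, 1, 6, 7          # hypothetical piece indices

rng = random.Random(2024)            # fixed seed keeps the keys reproducible
PIECE_KEYS = [[rng.getrandbits(64) for _ in range(NUM_SQUARES)]
              for _ in range(NUM_PIECES)]
BLACK_TO_MOVE_KEY = rng.getrandbits(64)

def zobrist_hash(board, black_to_move):
    """board maps square -> piece; XOR one key per occupied square."""
    h = 0
    for square, piece in board.items():
        h ^= PIECE_KEYS[piece][square]
    if black_to_move:
        h ^= BLACK_TO_MOVE_KEY
    return h

def apply_quiet_move(h, piece, from_sq, to_sq):
    """Update the hash incrementally for a non-capturing move."""
    h ^= PIECE_KEYS[piece][from_sq]   # piece leaves its old square
    h ^= PIECE_KEYS[piece][to_sq]     # piece arrives on the new square
    return h ^ BLACK_TO_MOVE_KEY      # side to move flips

# Only the four pieces that move in the example are placed on the board.
E2, E4, E7, E5, G1, F3, B8, C6 = 12, 28, 52, 36, 6, 21, 57, 42
start = zobrist_hash({E2: WP, E7: BP, G1: WN, B8: BN}, black_to_move=False)

line_a = [(WP, E2, E4), (BP, E7, E5), (WN, G1, F3), (BN, B8, C6)]
line_b = [(WP, E2, E4), (BN, B8, C6), (WN, G1, F3), (BP, E7, E5)]

hash_a = hash_b = start
for move in line_a:
    hash_a = apply_quiet_move(hash_a, *move)
for move in line_b:
    hash_b = apply_quiet_move(hash_b, *move)

# A transposition table can then be a dict keyed by the hash.
transposition_table = {hash_a: {"depth": 0, "score": 0}}
```

After both loops `hash_a` and `hash_b` are equal, so a lookup made while searching the second line would find the entry stored by the first.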

score 0 · Accepted Answer

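In my experience, the history heuristic yields negligible benefit compared with other techniques and is not worthwhile for a basic search routine. This is not the same thing as using a transposition table. If the latter is what you want to implement, I would still advise against it. There are many other techniques that produce good results for much less effort. In fact, an efficient and correct transposition table is one of the hardest parts of a chess engine to code.

Try pruning and move-ordering heuristics first; most of them take one to a few lines of code. I describe these techniques in detail in this post, which also gives estimates of the performance gains you can expect.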

score 0 · Accepted Answer

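In chess, for example, what should the "key" of this hash table be?

1. Individual moves like (Queen to position (0,1)) or (Knight to position (5,5))?

2. Or the overall state of the board after an individual move?

The key is the individual move; the positions of the other pieces are not taken into account when a "move" is recorded in the history table.

The traditional form of the history table (also called a butterfly board) looks something like this: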

score history_table[side_to_move][from_square][to_square];

So, for instance, if the move e2-e4 produces a cutoff, the element:

history_table[white][e2][e4]

is incremented (in some way), regardless of the position in which the move was made.
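A minimal sketch of that increment, assuming a nested-list table and a `depth * depth` bonus. The bonus is only one common choice (the pseudocode in the question just says `Weight(d)`), and `record_cutoff` is a hypothetical helper name.

```python
WHITE, BLACK = 0, 1
NUM_SQUARES = 64

# history_table[side_to_move][from_square][to_square], all zero initially.
history_table = [[[0] * NUM_SQUARES for _ in range(NUM_SQUARES)]
                 for _ in range(2)]

def record_cutoff(side, from_sq, to_sq, depth):
    """Reward a move that caused a cutoff; deeper searches count for more."""
    # Assumption: depth * depth is the weight; any increasing Weight(d) fits.
    history_table[side][from_sq][to_sq] += depth * depth

# e2 = 12 and e4 = 28 when squares are numbered a1 = 0 ... h8 = 63.
E2, E4 = 12, 28
record_cutoff(WHITE, E2, E4, depth=3)   # cutoff found in one position
record_cutoff(WHITE, E2, E4, depth=2)   # and again in a different position
```

Both calls land in the same cell because nothing about the rest of the board is part of the key.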

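As in the example code, the history heuristic uses these counters for move ordering. Other heuristics can also take advantage of the history table (e.g. late move reductions).

Consider that:

usually the history heuristic isn't applied to a plain alpha-beta that has no other move-ordering knowledge (in chess, only "quiet" moves are ordered via the history heuristic);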
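To see the counters drive move ordering in something runnable, here is a sketch of the negamax routine from the question on a toy game rather than chess. The game is an assumption for illustration: a pile of stones, each move takes 1, 2 or 3, and whoever takes the last stone wins. A "move" is just the number taken, so the history key ignores the pile size in the same way a chess history key ignores the rest of the board. The `depth * depth` weight is also an assumption.

```python
from collections import defaultdict

INF = 10**9
history = defaultdict(int)   # key: the move itself, never the position

def generate(pile):
    """Legal moves: take 1, 2 or 3 stones, but no more than remain."""
    return [take for take in (1, 2, 3) if take <= pile]

def alphabeta(pile, depth, alpha, beta):
    """Negamax alpha-beta; the score is from the side to move's standpoint."""
    if pile == 0:
        return -1            # the opponent took the last stone: we lost
    if depth == 0:
        return 0             # search horizon reached, outcome unknown

    moves = generate(pile)
    # Try moves with the highest history score first (the sort is stable).
    moves.sort(key=lambda m: history[m], reverse=True)

    score, best_move = -INF, None
    for move in moves:
        cur = -alphabeta(pile - move, depth - 1, -beta, -alpha)
        if cur > score:
            score, best_move = cur, move
        alpha = max(alpha, score)
        if alpha >= beta:
            break            # cutoff

    # Credit the best move found here, weighted by remaining depth.
    history[best_move] += depth * depth
    return score

losing = alphabeta(8, 8, -INF, INF)    # a pile of 8 is lost for the mover
winning = alphabeta(5, 5, -INF, INF)   # take 1 and leave a pile of 4
```

The table persists across calls, so ordering information learned in one subtree is reused in sibling subtrees and in later searches.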
the history table also comes in other forms (history_table[piece][to_square] is often used).
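Both points can be sketched together: captures are set aside and only the quiet moves are sorted by a `[piece][to_square]` table. The `Move` class, the piece indices and the choice to put captures ahead of quiet moves are assumptions for illustration; a real engine would normally order the captures with a heuristic of their own.

```python
from dataclasses import dataclass

NUM_PIECES, NUM_SQUARES = 12, 64

# Alternative history table indexed by moving piece and destination square.
piece_to_history = [[0] * NUM_SQUARES for _ in range(NUM_PIECES)]

@dataclass(frozen=True)
class Move:
    piece: int
    from_sq: int
    to_sq: int
    is_capture: bool = False

def order_moves(moves):
    """Captures first (generation order), then quiet moves by history score."""
    captures = [m for m in moves if m.is_capture]
    quiets = [m for m in moves if not m.is_capture]
    quiets.sort(key=lambda m: piece_to_history[m.piece][m.to_sq], reverse=True)
    return captures + quiets

# Hypothetical piece indices and a1 = 0 square numbering.
PAWN, KNIGHT = 0, 1
piece_to_history[KNIGHT][21] = 50    # knight to f3 caused cutoffs earlier
piece_to_history[PAWN][28] = 10      # pawn to e4 caused fewer

moves = [Move(PAWN, 12, 28), Move(KNIGHT, 6, 21), Move(PAWN, 28, 35, True)]
ordered = order_moves(moves)
```

Here `ordered` starts with the capture, followed by the knight move and then the pawn move.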
