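I want to use minimax search (with alpha-beta pruning), or rather negamax search, to make a computer program play a card game.
The card game is actually played by 4 players. So in order to be able to use minimax etc., I simplify the game to "me" against the "others". After each "move", you can objectively read the evaluation of the current state from the game itself. When all 4 players have placed a card, the highest card wins them all, and the values of the cards count as points.
As you don't know exactly how the cards are distributed among the other 3 players, I thought you must simulate all possible distributions ("worlds") of the cards that are not yours. You have 12 cards, and the other 3 players have 36 cards in total.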
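For scale, the number of such "worlds" can be computed directly, and a single world can be drawn at random instead of enumerating them all. The following is only a minimal sketch under stated assumptions: the class `Worlds` and its methods `countDeals`, `binomial` and `sampleDeal` are hypothetical names that are not part of the program in this question, cards are represented as plain integers, and every hidden hand is assumed to have the same size (true at the start of a game, not necessarily later).

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; cards are plain ints here.
class Worlds {

    // Number of ways to split players * handSize unseen cards into `players`
    // distinguishable hands of handSize cards: C(n, h) * C(n - h, h) * ...
    static BigInteger countDeals(int handSize, int players) {
        BigInteger result = BigInteger.ONE;
        int remaining = handSize * players;
        for (int p = 0; p < players; p++) {
            result = result.multiply(binomial(remaining, handSize));
            remaining -= handSize;
        }
        return result;
    }

    // Binomial coefficient C(n, k), built up step by step so each division is exact.
    static BigInteger binomial(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    // Deals the unseen cards at random into `players` equally sized hands (one sampled world).
    static List<List<Integer>> sampleDeal(List<Integer> unseen, int players, Random rng) {
        List<Integer> shuffled = new ArrayList<>(unseen);
        Collections.shuffle(shuffled, rng);
        int handSize = shuffled.size() / players;
        List<List<Integer>> hands = new ArrayList<>();
        for (int p = 0; p < players; p++) {
            hands.add(new ArrayList<>(shuffled.subList(p * handSize, (p + 1) * handSize)));
        }
        return hands;
    }

    public static void main(String[] args) {
        System.out.println(countDeals(12, 3)); // 3384731762521200
        List<Integer> unseen = new ArrayList<>();
        for (int c = 0; c < 36; c++) {
            unseen.add(c);
        }
        System.out.println(sampleDeal(unseen, 3, new Random(42)));
    }
}
```

With roughly 3.4 * 10^15 possible deals at the start of a game, searching every world is out of reach, which is why drawing a limited number of random worlds may be the more practical reading of "simulate the distributions".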
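So my approach is the following algorithm, where `player` is a number between 1 and 3 standing for one of the three computer players that the program might need to find moves for, and `-player` stands for the opponents, i.e. all three other players together.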

    private Card computerPickCard(GameState state, ArrayList<Card> cards) {
        int bestScore = Integer.MIN_VALUE;
        Card bestMove = null;
        for (Card card : cards) {
            if (state.moveIsLegal(card)) { // if you are allowed to place this card
                // a move is the placing of a card (which returns a new game state)
                GameState futureState = state.testMove(card);
                // The others move next, so the result is from their point of view and is negated here.
                // -Integer.MAX_VALUE is the lower bound because negating Integer.MIN_VALUE overflows.
                int score = -negamaxSearch(-state.getPlayersTurn(), futureState, 1, -Integer.MAX_VALUE, Integer.MAX_VALUE);
                if (score > bestScore) {
                    bestScore = score;
                    bestMove = card;
                }
            }
        }
        return bestMove; // the card to place (null if no move is legal)
    }

    private int negamaxSearch(int player, GameState state, int depthLeft, int alpha, int beta) {
        ArrayList<Card> cards;
        if (player >= 1 && player <= 3) {
            cards = state.getCards(player);
        }
        else {
            // pool the cards of all other players; copy first so that addAll
            // cannot change a list that is still held by the game state
            cards = new ArrayList<Card>(state.getCards(0));
            if (player == -1) {
                cards.addAll(state.getCards(2));
                cards.addAll(state.getCards(3));
            }
            else if (player == -2) {
                cards.addAll(state.getCards(1));
                cards.addAll(state.getCards(3));
            }
            else {
                cards.addAll(state.getCards(1));
                cards.addAll(state.getCards(2));
            }
        }
        if (depthLeft <= 0 || state.isEnd()) { // end of recursion as the game is finished or max depth is reached
            if (player >= 1 && player <= 3) {
                return state.getCurrentPoints(player); // player's points as a positive value (for self)
            }
            else {
                return -state.getCurrentPoints(-player); // player's points as a negative value (for others)
            }
        }
        else {
            int score;
            int nCards = cards.size();
            if (player > 0) { // make one move (it's player's turn)
                for (int i = 0; i < nCards; i++) {
                    GameState futureState = state.testMove(cards.get(i));
                    if (futureState != null) { // if move is valid
                        // negated: the child's value is from the others' point of view
                        score = -negamaxSearch(-player, futureState, depthLeft-1, -beta, -alpha);
                        if (score >= beta) {
                            return score;
                        }
                        if (score > alpha) {
                            alpha = score; // alpha acts like max
                        }
                    }
                }
                return alpha;
            }
            else { // make three moves (it's the others' turn)
                for (int i = 0; i < nCards; i++) {
                    GameState futureState = state.testMove(cards.get(i));
                    if (futureState != null) { // if move is valid
                        for (int k = 0; k < nCards; k++) {
                            if (k != i) {
                                GameState futureStateLevel2 = futureState.testMove(cards.get(k));
                                if (futureStateLevel2 != null) { // if move is valid
                                    for (int m = 0; m < nCards; m++) {
                                        if (m != i && m != k) {
                                            GameState futureStateLevel3 = futureStateLevel2.testMove(cards.get(m));
                                            if (futureStateLevel3 != null) { // if move is valid
                                                // negated: the child's value is from the single player's point of view
                                                score = -negamaxSearch(-player, futureStateLevel3, depthLeft-1, -beta, -alpha);
                                                if (score >= beta) {
                                                    return score;
                                                }
                                                if (score > alpha) {
                                                    alpha = score; // alpha acts like max
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
                return alpha;
            }
        }
    }

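This seems to work fine, but for a depth of 1 (`depthLeft=1`), the program already needs to evaluate about 50,000 moves (placed cards) on average. That is far too many, of course!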
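One plausible reason for such a high count is the three nested loops over the opponents' pooled cards. As a back-of-the-envelope sketch (the class `TripleCount` is a hypothetical name, and it assumes the pool still holds all 36 unseen cards and that 12 own cards are tried at the root), the number of ordered card triples those loops can visit is bounded as follows:

```java
// Hypothetical back-of-the-envelope helper; not part of the program above.
class TripleCount {

    // Ordered triples of distinct cards drawn from a pool of n cards: n * (n - 1) * (n - 2).
    static long orderedTriples(int n) {
        return (long) n * (n - 1) * (n - 2);
    }

    public static void main(String[] args) {
        long perOwnMove = orderedTriples(36); // 42840 triples after one own move
        long atRoot = 12 * perOwnMove;        // 514080 triples if all 12 own cards are tried
        System.out.println(perOwnMove + " " + atRoot);
    }
}
```

This is only an upper bound: whenever `testMove` rejects the first or second card of a triple, the inner loops are skipped, so the real number of placed cards is lower. It still suggests that the pooled "others" player is where most of the work goes.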
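So my questions are:
- Is the implementation correct at all? Can you simulate a game like this, especially with regard to the imperfect information?
- How can the algorithm be improved in terms of speed and workload?
- Can I, for example, reduce the set of possible moves to a random subset of 50% to improve speed while still getting good results?
- I found that the UCT algorithm might be a good solution. Do you know this algorithm? Can you help me implement it?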
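
For context on the last question: UCT chooses which move to explore next with the UCB1 rule, that is, the average reward of a move plus an exploration bonus that shrinks as the move is visited more often. The sketch below shows only that selection step in a commonly used form, not a full UCT implementation for this game. The class `Ucb1`, the method `selectChild`, the arrays `visits` and `totalReward`, and the exploration constant `c` are all illustrative names and assumptions.

```java
// Illustrative sketch of the UCB1 selection step used by UCT; all names are assumptions.
class Ucb1 {

    // Returns the index of the child that maximizes
    //   totalReward[i] / visits[i] + c * sqrt(ln(parentVisits) / visits[i]),
    // where parentVisits is the sum of all child visits.
    // A child that has never been visited is returned immediately.
    static int selectChild(int[] visits, double[] totalReward, double c) {
        int parentVisits = 0;
        for (int v : visits) {
            parentVisits += v;
        }
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < visits.length; i++) {
            if (visits[i] == 0) {
                return i; // try every move at least once
            }
            double mean = totalReward[i] / visits[i];
            double bonus = c * Math.sqrt(Math.log(parentVisits) / visits[i]);
            if (mean + bonus > bestValue) {
                bestValue = mean + bonus;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] visits = {100, 1};
        double[] totalReward = {60.0, 0.5};
        // The rarely visited second move gets the larger exploration bonus.
        System.out.println(selectChild(visits, totalReward, Math.sqrt(2)));
    }
}
```

Setting `c` to 0 turns this into purely greedy selection, while larger values spend more simulations on moves that have rarely been tried.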