algorithm - 动态规划卡牌游戏

Question

请检查我遇到的这个问题：

“你和你八岁的侄子埃尔莫决定玩一个简单的纸牌游戏。在游戏开始时，牌面朝上一长排。每张牌都值不同的点数。毕竟发牌后，你和 Elmo 轮流从行中取出最左边或最右边的牌，直到所有牌都用完。在每一轮，你可以决定拿两张牌中的哪一张。游戏的赢家是玩家游戏结束时获得最多分数的人。从未上过算法课的 Elmo 遵循明显的贪婪策略？轮到他的时候，Elmo 总是拿点值较高的牌。你的任务是找到策略只要有可能就会打败 Elmo。（像这样打败一个小孩似乎很卑鄙，但 Elmo 绝对讨厌大人让他赢。）

描述并分析一种算法，以确定在给定初始卡片序列的情况下，您可以在与 Elmo 比赛中获得的最大点数。”

我已经完成了这个问题的大部分理论工作。例如，我已经完成了 DP 所需的 optimus 子结构演示，并且我定义了递归低效形式，它解释了游戏是如何完成的。现在下一步是设计一个自下而上的算法来有效地解决这个问题，或者，如果可能的话，一个自上而下的记忆解决方案。我只是不能做任何一个。你将如何解决这个问题？

score 0 · Accepted Answer

The algorithm is simple and you can use memoization and dynamic programming in this way:

def findMax(mem, cards, myTurn):
    maxValue = 0
    if(not cards):
        return 0
    if str(cards) in mem: #If we have already compute this state
        return mem[str(cards)]
    elif not myTurn: #turn of Elmo
        if cards[0] > cards[len(cards) - 1]:
            maxValue = findMax(mem, cards[1:], True)
        else:
            maxValue = findMax(mem, cards[:-1], True)
    else: #your turn
        maxValue = max(cards[0] + findMax(mem, cards[1:], False), cards[len(cards) - 1] + findMax(mem, cards[:-1], False))
    mem[str(cards)] = maxValue  #Store the max value for this state
    return maxValue

import random
size = int(10 + random.randint(0,1))
cards = [random.randint(0,50) for x in range(size)]
print "Cards" + str(cards)
print findMax({}, cards, True)

Output:

Cards: [28, 33, 48, 0, 26, 1, 3, 11, 22, 32, 12]
Max value: 120

score 0 · Accepted Answer

这是一个允许您选择两个玩家的策略的解决方案。您可以使用它来解决给定的问题，但您也可以将两个玩家的策略设置为“optimal_strategy”以找到极小极大解决方案。

import random

def greedy_strategy(s0, s1, cards, i, j, cache):
    if i == j: return 0
    if cards[i] >= cards[j - 1]:
        return cards[i] - s1(s1, s0, cards, i + 1, j, cache)
    else:
        return cards[j - 1] - s1(s1, s0, cards, i, j - 1, cache)

def optimal_strategy(s0, s1, cards, i, j, cache):
    if i == j: return 0
    key = (i, j)
    if key not in cache:
        left = cards[i] - s1(s1, s0, cards, i + 1, j, cache)
        right = cards[j - 1] - s1(s1, s0, cards, i, j - 1, cache)
        cache[key] = max(left, right)
    return cache[key]

def score_play(cards, s0, s1):
    # How many points you'll win by
    adv = s0(s0, s1, cards, 0, len(cards), {})
    # my_score + opp_score = sum(cards)
    # my_score - opp_score = adv
    # adding: 2 * my_score = sum(cards) + adv
    # Therefore my_score is this...
    return (sum(cards) + adv) // 2

for _ in xrange(10):
    cards = range(20)
    random.shuffle(cards)
    print cards, score_play(cards, optimal_strategy, greedy_strategy)

algorithm - 动态规划卡牌游戏

2 回答 2

Related

Reference