algorithm - 硬币游戏的最优策略

Question

考虑一排 n 个价值为 v1,v2.......,vn 的硬币。我们通过交替轮流与对手进行游戏。在每一轮中，玩家从该行中选择第一枚硬币或最后一枚硬币，将其永久移除，并获得硬币的价值。确定如果我们先行动，我们绝对可以赢得的最大可能金额。

我的解决方案

由于我们先走，我们可以选择v1或v2，然后我们的对手可以从头开始选择，也可以从头开始选择。因此，可能出现的四个子问题是。

(1,1) , (1,2), (2,1), (2,2)

其中 (1,1) 表示，我从首发 [即 1] 中选择，而对手也从首发中选择 [即 1]。

(1,2) 表示我从一开始就选择，但对手选择最后一个。

因此，如果M(i,j)是我可以选择的 (i,j) 上的最大值，则将 (i,j) 表示为递归函数。

M(i,j) = Max{ Max{ M(i+2,j), M(i+1,j-1) } + vi, Max{ M(i+1,j-1), M(i,j-2) } + vj }

解释：当我有 i..j 个元素时，我可以选择第一个 [i+1] ，然后我的对手可以选择第一个 [i+2] 或最后一个 [j-1]，并且我希望下次选择时获得最大值，因此外部 Max 中的第一个术语。

第二个与上面类似，即如果我选择最后一个 [j-1]，对手选择第一个 [i+1] 或最后一个 [j-2]，我下次将其最大化。

现在，在书中我看到了递归

M(i,j) = Max{ Min{ Same } + vi, Min{ Same } + vj }

现在，我为什么要在这里最小化。这不等于说我第一次必须选择最大化，但第二次必须选择最小化。

我究竟做错了什么？谢谢。

score 4 · Accepted Answer

如果你想计算你绝对可以赢的钱，你必须假设你的对手试图最大化他/她自己的结果，这等于最小化你的结果（因为你的收益总和总是等于v1 + ... + vn）。你的公式计算的是如果你的对手总是做出错误的动作（从他/她的角度来看）你能赢什么。

score 4 · Accepted Answer

If the number n of coins is even, first player can guarantee himself the maximum of v1 + v3 + ... + v(n-1) and v2 + v4 + ... + vn. Figure out which is maximum, then take either v1 or vn. Thereafter, only take the coins that were originally in odd positions or even positions respectively. That is possible because no matter what second player does, a coin in both odd position and in even position are exposed when first player again moves.

Is that the best first player can do? First player can switch from "odds" to "evens" at any move, depending on whether the sum of the "new evens" is greater than or less than the sum of the "new odds". The strategy of the second player should then be to keep the incentive of switching as low as possible, which means picking the largest coin (first or last position). Might the resulting position still tempt first player to switch? Yes, as we see by the following game:

1 3 19 6 8 20

First player adds 1 + 19 + 8 = 28 and 3 + 6 + 20 = 29. He can guarantee a total score of at least 29 by picking evens, 20. The result is

1 3 19 6 8

To reduce the temptation of first player to switch from evens to odds and doing even better than a total score of 29, second player takes 8.

1 3 19 6

However, the temptation is still there. In fact, the sum of the odds position is 1 + 19 = 20, the sum of the evens only 3 + 6 = 9, so first player can do better now by switching to odds, and takes 1, even though 6 is greater.

  3 19 6

The best second player can do is to pick 6, first player gets 19, second player gets 3. Total score: first player 20 + 1 + 19 = 40, second player 8 + 6 + 3 = 17.

It seems that first player can always get more than second player in this manner. However, we still don't know whether that is the optimal strategy for first player.

On the other hand, if the number of coins n is odd, the roles of first player and second player in the above analysis are essentially reversed.

Now the question becomes, are the greedy strategies outlined above truly optimal? We can check in specific cases by examining the binary decision tree where first player takes either the first or last coin, then second player takes first or last coin of remaining, etc. There will be 2^n leaves in the tree; the score at each of those leaves can be calculated, then working back from leaves to root the value of each higher node can be calculated as either the max of the two children or min of the two children, depending on whose turn it is.

Instead of building the tree explicitly, it can be built implicitly by recursive function calls, which will save total memory required but will not save time.

If someone analyzes the game in this way and comes up with an optimal strategy different from mine, I would very much like to see it.

score 0 · Accepted Answer

http://www.geeksforgeeks.org/dynamic-programming-set-31-optimal-strategy-for-a-game/

This page gives a very clear explanation of the problem and also the solution. It concludes that:

F(i, j)  represents the maximum value the user can collect from 
         i'th coin to j'th coin.

    F(i, j)  = Max(Vi + min(F(i+2, j), F(i+1, j-1) ), 
                   Vj + min(F(i+1, j-1), F(i, j-2) )) 
Base Cases
    F(i, j)  = Vi           If j == i
    F(i, j)  = max(Vi, Vj)  If j == i+1

algorithm - 硬币游戏的最优策略

3 回答 3

Related

Reference