
I am trying to get into machine learning and decided to try things out myself. I wrote a small tic-tac-toe game. So far, the computer plays against itself using random moves.

Now I want to apply reinforcement learning by writing an agent that will explore or exploit based on what it knows about the current state of the board.

The part I don't understand is this: what does the agent use to train itself for a given state? Let's say an RNG bot playing (o) does this:

[..][..][..]

[..][x][o]

[..][..][..]

Now the agent has to decide what the best move should be. A well-trained player would pick square 1, 3, 7, or 9. Does it look up a similar state in its database that led to a win? Because if so, I think I would need to save every single move into that database up to the terminal state (win/lose/draw), and that would be quite a lot of data for a single game, wouldn't it?

If I'm thinking about this the wrong way, I would like to know how to do it right.


1 Answer


Learning

1) Observe a current board state s;

2) Make the next move based on the values V(s') of all available next states s'. Strictly, the choice is often drawn from a Boltzmann distribution over V(s'), but it can be simplified to the maximum-value move (greedy) or, with some probability epsilon, a random move, as you are using;

3) Record s' in a sequence;

4) If the game finishes, update the values of the visited states in the recorded sequence and start over again; otherwise, go to 1). (Steps 2 and 4 are sketched in code after this list.)
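
A minimal sketch of steps 2 and 4 in Python, assuming the board is encoded as a 9-character string and the value table is a plain dictionary (the names V, value, choose_move, update_sequence, EPSILON and ALPHA are my own, not taken from the original answer):

```python
import random

EPSILON = 0.1   # probability of a random exploratory move
ALPHA = 0.3     # learning rate for the value update

V = {}          # V[s] -> learned value of board state s

def value(s):
    # Unseen states start with a neutral estimate of 0.5.
    return V.get(s, 0.5)

def choose_move(next_states):
    # Step 2: epsilon-greedy choice over the values of all reachable next states s'.
    if random.random() < EPSILON:
        return random.choice(next_states)
    return max(next_states, key=value)

def update_sequence(sequence, final_reward):
    # Step 4: at the end of a game, set the last visited state to the final
    # reward (e.g. 1 for a win, 0 for a loss, 0.5 for a draw) and back it up
    # through the sequence, nudging each V(s) toward the value of the state
    # that followed it.
    V[sequence[-1]] = final_reward
    for s, s_next in zip(reversed(sequence[:-1]), reversed(sequence[1:])):
        V[s] = value(s) + ALPHA * (value(s_next) - value(s))
```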

Game Playing

1) Observe a current board state s;

2) Make the next move based on the values V(s') of all available next states;

3) If the game is over, start over again; otherwise, go to 1).
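
Game playing then amounts to the same lookup with exploration switched off. A minimal sketch, assuming the same 9-character board string and value dictionary V as above:

```python
def play_move(board, mark, V):
    # Enumerate every board reachable by placing `mark` on an empty cell,
    # then take the one with the highest learned value (purely greedy).
    options = [board[:i] + mark + board[i + 1:]
               for i, c in enumerate(board) if c == ' ']
    return max(options, key=lambda s: V.get(s, 0.5))
```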

Regarding your question: yes, the look-up table used in the Game Playing phase is built up in the Learning phase. Each time, the state is chosen from all the V(s), of which there are at most 3^9 = 19683. Here is sample code, written in Python, that runs 10000 games in training.
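
(The answer's original code listing is not preserved here; the following is a minimal, self-contained sketch of such a trainer under the same assumptions as above, with epsilon-greedy self-play and all names being my own.)

```python
import random

EPSILON, ALPHA = 0.1, 0.3
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    # Return 'x' or 'o' if that player has three in a row, else None.
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board, mark):
    # All boards reachable by placing `mark` on an empty cell.
    return [board[:i] + mark + board[i + 1:]
            for i, c in enumerate(board) if c == ' ']

def train(n_games=10000):
    V = {}                                   # at most 3**9 = 19683 entries
    value = lambda s: V.get(s, 0.5)
    for _ in range(n_games):
        board, mark = ' ' * 9, 'x'
        visited = {'x': [], 'o': []}
        while True:
            options = moves(board, mark)
            if not options:                  # board full: draw
                reward = {'x': 0.5, 'o': 0.5}
                break
            # Epsilon-greedy move choice, as in the Learning steps above.
            if random.random() < EPSILON:
                board = random.choice(options)
            else:
                board = max(options, key=value)
            visited[mark].append(board)
            if winner(board):
                reward = {mark: 1.0, ('o' if mark == 'x' else 'x'): 0.0}
                break
            mark = 'o' if mark == 'x' else 'x'
        # Back up each player's final reward through the states it visited.
        for player, seq in visited.items():
            if not seq:
                continue
            V[seq[-1]] = reward[player]
            for s, s_next in zip(reversed(seq[:-1]), reversed(seq[1:])):
                V[s] = value(s) + ALPHA * (value(s_next) - value(s))
    return V

if __name__ == '__main__':
    table = train()
    print(len(table), 'states seen during training')
```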

answered 2014-02-17T00:13:34.470