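I have a homework assignment that uses MCTS ( http://mcts.ai/code/python.html ) to play as many games of tic-tac-toe as needed. The goal of the task is to train a decision tree classifier that predicts the best move to make, given the current state of the game and the player whose turn it is. Each cell of the board is labeled 1.0, 2.0 or 0, depending on which player has marked that position in the tic-tac-toe grid (0 means no player has). So far I have managed to save the data to a CSV in the following format: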
Unnamed: 0 player 0 1 2 ... 6 7 8 best_move won
0 0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 4 0
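For reference, the encoding just described (player to move, nine cells holding 0.0 for empty, 1.0 for player 1 or 2.0 for player 2, then the chosen cell) can be sketched as a small helper that flattens one position into a row. This is a hypothetical illustration: `encode_row` is not part of the MCTS code, it leaves out the `won` column, and it assumes the cells are numbered 0-8 row by row.

```python
# Hypothetical helper: flatten one tic-tac-toe position into the row layout
# player, 0..8, best_move. Assumes cells are numbered 0-8 row by row and hold
# 0 (empty), 1 (player 1) or 2 (player 2). The "won" column is not produced.
def encode_row(board, player, best_move):
    if len(board) != 9:
        raise ValueError("board must have exactly 9 cells")
    if board[best_move] != 0:
        raise ValueError("best_move must point at an empty cell")
    return [float(player)] + [float(c) for c in board] + [best_move]


# Player 2 to move after player 1 took the centre (cell 4); MCTS chose cell 0.
board = [0, 0, 0, 0, 1, 0, 0, 0, 0]
print(encode_row(board, player=2, best_move=0))
# → [2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0]
```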
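My first and main question is how to use scikit-learn to build a decision tree classifier that merges all equal states, i.e. the root should have the 9 moves available to the first player, then 8 available to the second player, and so on (1.0 for player 1, 2.0 for player 2). The second, related question is how to represent the data that repeats over and over in the 0-8 (9-move) interval, so that after the 9th move is read, the tree starts again from the root for the next game. Ideally, sub-states that are the same for player 1 or player 2 would also be grouped together.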
Here is a PDF view of the tree my code generates. Below is the code I use to train the decision tree.
import graphviz
import pandas as pd
from sklearn import tree


def visualise_tree(trained_tree):
    # Export the fitted tree as Graphviz source and render it (PDF by default) as "oxo".
    dot_data = tree.export_graphviz(trained_tree, out_file=None)
    graph = graphviz.Source(dot_data)
    graph.render("oxo")


def trainTree(read_csv):
    clf = tree.DecisionTreeClassifier()
    slice_training_data = read_csv[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8"]]
    # Select best_move as a Series so the target is 1-D.
    slice_prediction_data = read_csv["best_move"]
    clf.fit(slice_training_data, slice_prediction_data)
    visualise_tree(clf)
    print(read_csv)


if __name__ == "__main__":
    """ Play a single game to the end using UCT for both players.
    """
    # UCTPlayGame comes from my modified MCTS code (not shown here).
    #df = pd.DataFrame(columns=["player", "0", "1", "2", "3", "4", "5", "6", "7", "8", "best_move","won"])
    #for i in range(1):
    #    df = UCTPlayGame(df)
    read_csv = pd.read_csv('10000games.csv')
    trainTree(read_csv)
    #df = df[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8", "best_move","won"]]
    #print(df)
    #df.to_csv('10000games.csv')
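Once `trainTree` has fitted the classifier, asking it for a move means passing a DataFrame with the same feature columns it was trained on. A minimal sketch, assuming pandas and scikit-learn, that trains on four rows copied from the sample data instead of `10000games.csv` and then queries the empty board with player 1 to move:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["player"] + [str(i) for i in range(9)]

# (player, cells 0-8, best_move), copied from the first four sample rows.
rows = [
    [1.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
    [2.0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [1.0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1],
    [2.0, 2, 1, 0, 0, 1, 0, 0, 0, 0, 7],
]
df = pd.DataFrame(rows, columns=FEATURES + ["best_move"])

clf = DecisionTreeClassifier(random_state=0)
# Features as a DataFrame, target as a 1-D Series.
clf.fit(df[FEATURES], df["best_move"])

# Query one position: empty board, player 1 to move.
query = pd.DataFrame([[1.0] + [0.0] * 9], columns=FEATURES)
print(clf.predict(query))  # → [4]
```

Passing `best_move` as a Series rather than a one-column DataFrame keeps the target one-dimensional, which is the usual shape for a single-output classifier.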
Here is the format of the data:
,player,0,1,2,3,4,5,6,7,8,best_move,won
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4,0
1,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0
2,1.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1,0
3,2.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,7,0
4,1.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,2.0,0.0,3,0
5,2.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,5,0
6,1.0,2.0,1.0,0.0,1.0,1.0,2.0,0.0,2.0,0.0,2,0
7,2.0,2.0,1.0,1.0,1.0,1.0,2.0,0.0,2.0,0.0,6,0
8,1.0,2.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,0.0,8,0
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
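Because the first (unnamed) column restarts at 0 for every game, it can be used to tell games apart. A minimal sketch, assuming pandas and that the file is read with `index_col=0` (so the first column is not picked up as an `Unnamed: 0` feature); the `move_no` and `game` column names are hypothetical, and the inline CSV holds only rows copied from the sample above:

```python
import io

import pandas as pd

# Stand-in for 10000games.csv: rows copied from the sample data.
csv_text = """,player,0,1,2,3,4,5,6,7,8,best_move,won
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4,0
1,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0
2,1.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1,0
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
"""

# index_col=0 uses the per-game move counter as the index; rename it and
# turn it back into an ordinary column called move_no.
df = (
    pd.read_csv(io.StringIO(csv_text), index_col=0)
      .rename_axis("move_no")
      .reset_index()
)

# Every time move_no is 0 a new game starts, so a running count of those
# restarts numbers the games 0, 1, 2, ...
df["game"] = (df["move_no"] == 0).cumsum() - 1

print(df[["game", "move_no", "player", "best_move"]])
```

`df.groupby("game")` then yields one game at a time, in move order.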
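As you can see, 9 moves are made and then the data set repeats for a new game (starting again from 0). Because the players take turns, the player column alternates between 1.0 and 2.0. As required, I also added a won column for the set of moves that won the game (but I'm not sure how to use it, so I left it out of the data the tree is trained on). Ideally, the decision tree should merge all the starting game states as described and predict what the best move should be.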
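Merging identical positions can be approximated by grouping rows on `player` plus the nine cells and keeping one label per distinct position. A minimal sketch, assuming pandas; taking the most frequent `best_move` is only one possible rule, chosen here for illustration. The three rows are copied from the sample data, including the two identical opening positions from different games:

```python
import pandas as pd

FEATURES = ["player"] + [str(i) for i in range(9)]

# (player, cells 0-8, best_move, won), copied from the sample data.
rows = [
    [1.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0],  # game 1, opening position
    [2.0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # game 1, second move
    [1.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # game 2, same opening position
]
df = pd.DataFrame(rows, columns=FEATURES + ["best_move", "won"])

# One row per distinct (player, board) position.
# best_move: most frequent move recorded for that position; mode() returns
# its values sorted, so a tie goes to the lowest cell number.
# seen: how many times the position occurred.
merged = (
    df.groupby(FEATURES)
      .agg(best_move=("best_move", lambda s: s.mode().iloc[0]),
           seen=("best_move", "count"))
      .reset_index()
)
print(merged)
```

`seen` shows how often each position occurred, which can help judge how reliable its label is. If only moves from won games should count, one option is to filter with `df[df["won"] == 1]` before grouping; whether that fits the assignment is an assumption.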