Learning
1) Observe the current board state s;
2) Make the next move based on the distribution of the values V(s') of all available next states. Strictly speaking, the choice is often sampled from a Boltzmann (softmax) distribution over V(s'), but it can be simplified to the greedy (maximum-value) move or, as you are doing, an epsilon-greedy rule that picks a random move with some probability epsilon;
3) Record s' in the sequence of states visited during the game;
4) If the game has finished, update the values of the visited states in the sequence (see the sketch after this list) and start over; otherwise, go to 1).
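As a minimal sketch of the move selection and the end-of-game update, here is some illustrative Python. The helper names (select_move, boltzmann_move, update_values) and the constants ALPHA and EPSILON are assumptions for illustration, not part of the original answer; the backup shown is a simple TD-style update of each visited state toward the final reward.

    import math
    import random

    ALPHA = 0.1    # learning rate (assumed value)
    EPSILON = 0.1  # exploration probability (assumed value)

    def select_move(next_states, v_table, epsilon=EPSILON):
        """Epsilon-greedy: random state with probability epsilon, else best V(s')."""
        if random.random() < epsilon:
            return random.choice(next_states)
        return max(next_states, key=lambda s: v_table.get(s, 0.5))

    def boltzmann_move(next_states, v_table, temperature=1.0):
        """Alternative: sample s' from a Boltzmann (softmax) distribution over V(s')."""
        weights = [math.exp(v_table.get(s, 0.5) / temperature) for s in next_states]
        return random.choices(next_states, weights=weights, k=1)[0]

    def update_values(sequence, reward, v_table, alpha=ALPHA):
        """After the game ends, back the final reward up through the visited states."""
        target = reward
        for state in reversed(sequence):
            v = v_table.get(state, 0.5)
            v_table[state] = v + alpha * (target - v)  # nudge V(s) toward the target
            target = v_table[state]                    # new estimate becomes the next target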
Game Playing
1) Observe the current board state s;
2) Make the next move based on the distribution of the values V(s') of all available next states (a greedy version is sketched below);
3) If the game is over, start a new game; otherwise, go to 1).
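For the playing phase, a corresponding sketch with exploration switched off. It is again illustrative: the legal_next_states and game_over callables are assumptions supplied by the caller, not part of the original answer.

    def play_game(initial_state, v_table, legal_next_states, game_over):
        """Play one game greedily from the learned table (no exploration).

        legal_next_states(s) returns all reachable next states s';
        game_over(s) reports whether the game has finished.
        """
        state = initial_state
        while not game_over(state):
            next_states = legal_next_states(state)
            # Always take the move whose resulting state has the highest value.
            state = max(next_states, key=lambda s: v_table.get(s, 0.5))
        return state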
Regarding your question: yes, the look-up table used in the Game Playing phase is the one built up during the Learning phase. Each move is chosen from among the stored V(s) values; since every cell is empty, X, or O, the table holds at most 3^9 = 19683 states. Here is a sample Python program that runs 10,000 games in training.
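A quick side note before the sample: since each of the nine cells is empty, X, or O, a board can be read as a nine-digit base-3 number and mapped to a unique table index, which is where 3^9 = 19683 comes from. A small illustrative sketch; the encode helper is hypothetical, not taken from the sample code.

    def encode(board):
        """Map a 9-cell board (0=empty, 1=X, 2=O) to an index in [0, 3**9).

        Each cell is one base-3 digit, so there are 3**9 = 19683 possible
        indices; unreachable positions are counted too, hence "maximum possible".
        """
        index = 0
        for cell in board:
            index = index * 3 + cell
        return index

    assert encode((0,) * 9) == 0           # empty board -> smallest index
    assert encode((2,) * 9) == 3**9 - 1    # all O -> largest index (19682)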