How you approach this will depend greatly on the language you are using for your game, but in general terms there are many approaches, depending on if you want to use a lot of storage or want some delay. It would be helpful if you could give some thoughts as to what sacrifices you are willing to make.
But, it would seem the best approach may be to just save the input from the user, as was mentioned, and either store the positions of all the actors/sprites in the game at the same time, which is as simple as just saving direction, velocity and tile x,y, or, if everything can be deterministic then ignore the actors/sprites as you can get their information.
How non-deterministic your game is would also be useful to give a better suggestion.
If there is a great deal of dynamic motion, such as a crash derby, then you may want to save information each frame, as you should be updating the players/actors at a certain framerate.