I am studying GridWorld from a Q-learning perspective, and I have a question about the following problem:
1) In the grid-world example, rewards are positive for goals, negative
for running into the edge of the world, and zero the rest of the time.
Are the signs of these rewards important, or only the intervals
between them?
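
One way to probe this numerically is to evaluate a fixed policy twice, once with the original rewards and once with a constant added to every reward, and compare the resulting state values. The sketch below is only an illustration under assumptions that are not part of the exercise: a 4x4 continuing grid, an equiprobable random policy, a discount of 0.9, a goal reward of +10 that sends the agent to a hypothetical restart cell, and an edge penalty of -1. The function `evaluate` and all of these numbers are made up for the example.

```python
def evaluate(offset=0.0, size=4, gamma=0.9, sweeps=500):
    """Iterative policy evaluation for an equiprobable random policy.

    `offset` is a constant added to every reward. The layout (goal cell,
    restart cell, reward sizes) is a hypothetical stand-in for the
    grid-world in the exercise.
    """
    goal, restart = (0, 1), (size - 1, 1)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    v = {(r, c): 0.0 for r in range(size) for c in range(size)}
    for _ in range(sweeps):
        new_v = {}
        for (r, c) in v:
            total = 0.0
            for dr, dc in moves:
                if (r, c) == goal:
                    # Any action from the goal pays +10 and jumps to restart.
                    reward, nxt = 10.0, restart
                elif 0 <= r + dr < size and 0 <= c + dc < size:
                    # Ordinary move inside the grid: zero reward.
                    reward, nxt = 0.0, (r + dr, c + dc)
                else:
                    # Running into the edge: -1 and stay in place.
                    reward, nxt = -1.0, (r, c)
                total += 0.25 * (reward + offset + gamma * v[nxt])
            new_v[(r, c)] = total
        v = new_v
    return v


base = evaluate(offset=0.0)
shifted = evaluate(offset=5.0)

# Per-state change in value caused by adding 5 to every reward.
diffs = [shifted[s] - base[s] for s in base]
print(min(diffs), max(diffs))  # both should be close to 5 / (1 - 0.9) = 50
```

In this continuing, discounted setup every state value moves by the same amount, `offset / (1 - gamma)`, so the ordering of states and the differences between them are unchanged. The sketch says nothing about episodic tasks, where a constant shift can change how long the agent prefers to stay in the episode.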