My gradient-descent SARSA keeps increasing its weights exponentially. By step 17 of episode 4, the value is already nan:
Exception: Qa is nan
For example, here is the log from one step shortly before that:
6) Qa:
Qa = -2.00890180632e+303
7) NEXT Qa:
Next Qa with west = -2.28577776413e+303
8) THETA:
1.78032402991e+303 <= -0.1 + (0.1 * -2.28577776413e+303) - -2.00890180632e+303
9) WEIGHTS (sample)
5.18266630725e+302 <= -1.58305782482e+301 + (0.3 * 1.78032402991e+303 * 1)
I don't know where to look for the mistake I made. Here is some of the code, FWIW:
    def getTheta(self, reward, Qa, QaNext):
        """ let t = r + yQw(s',a') - Qw(s,a) """
        theta = reward + (self.gamma * QaNext) - Qa
        return theta

    def updateWeights(self, Fsa, theta):
        """ wi <- wi + alpha * theta * Fi(s,a) """
        # Fsa[i] is the i-th binary feature of (s, a), so only active features move.
        for i in range(len(self.weights)):
            self.weights[i] += (self.alpha * theta * Fsa[i])
I have about 183 binary features.
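
For reference, here is a minimal standalone sketch that reproduces steps 8 and 9 of the log above, so the arithmetic can be checked outside my agent. The values gamma = 0.1, alpha = 0.3 and reward = -0.1 are read off the log lines; the names td_error and updated_weight are hypothetical stand-ins for getTheta and updateWeights, not the actual class methods.

```python
import math


def td_error(reward, gamma, q_sa, q_next):
    """t = r + gamma * Q(s', a') - Q(s, a), as in getTheta."""
    return reward + (gamma * q_next) - q_sa


def updated_weight(w, alpha, theta, f):
    """w <- w + alpha * theta * f, as in updateWeights, for a single weight."""
    return w + (alpha * theta * f)


# Constants read off the log lines (assumed to match self.gamma and self.alpha).
reward, gamma, alpha = -0.1, 0.1, 0.3

qa = -2.00890180632e+303       # step 6: Qa
qa_next = -2.28577776413e+303  # step 7: Next Qa with west

theta = td_error(reward, gamma, qa, qa_next)                   # step 8
w_new = updated_weight(-1.58305782482e+301, alpha, theta, 1)   # step 9

print("theta =", theta)
print("w_new =", w_new)
# Both are still finite here, but only a few orders of magnitude below overflow.
print("finite:", math.isfinite(theta), math.isfinite(w_new))
```

The printed values agree with the logged ones to the precision shown, so the update itself seems to be computing what its docstrings describe; the growth would then have to come from how these updates compound across steps.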