python - Prioritized replay buffer integration in DQN seems to be wrong

Asked 2021-10-06T16:08:40.510

Viewed 49 times

I am trying to update the keras-rl DQN algorithm to use a prioritized replay buffer. Please see this code.

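In the snippet, I changed the backward pass of my DQNAgent to use the prioritized replay buffer, but my implementation seems to be wrong in how it uses the importance weights and how it updates the priorities.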

I think the problem is in the following snippet:

metrics = self.trainable_model.train_on_batch(ins, [y_true_a, y_true_b, mask_a, mask_b], sample_weight=weights)

# Update priorities in PER
td_errors = np.sum([np.abs(q_batch - targets[range(self.batch_size), actions]) for q_batch, targets, actions in zip(all_q_batch, all_targets, all_actions)])
self.memory.update_priorities(batch_idxes, td_errors + self.prioritized_replay_eps)
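
One thing that can be checked in isolation is the shape of `td_errors`. The sketch below uses made-up arrays in place of `all_q_batch`, `all_targets` and `all_actions` (their shapes are assumptions: one Q-value per sample, and one row of per-action targets per sample). Under those assumptions, `np.sum` without an `axis` argument collapses everything to a single scalar, whereas summing over `axis=0` keeps one error per sampled transition, which is what a per-index priority update needs.

```python
import numpy as np

batch_size, n_actions, n_heads = 4, 3, 2
rng = np.random.default_rng(0)

# Hypothetical stand-ins for all_q_batch, all_targets, all_actions.
# Assumed shapes: q_batch (batch_size,), targets (batch_size, n_actions),
# actions (batch_size,) with integer action indices.
all_q_batch = [rng.normal(size=batch_size) for _ in range(n_heads)]
all_targets = [rng.normal(size=(batch_size, n_actions)) for _ in range(n_heads)]
all_actions = [rng.integers(0, n_actions, size=batch_size) for _ in range(n_heads)]

# Absolute TD error per sample, computed separately for each head.
per_head = [
    np.abs(q_batch - targets[np.arange(batch_size), actions])
    for q_batch, targets, actions in zip(all_q_batch, all_targets, all_actions)
]

# No axis: sums over heads AND samples, leaving a single number.
scalar_error = np.sum(per_head)

# axis=0: sums over heads only, leaving one error per sample.
per_sample_errors = np.sum(per_head, axis=0)

prioritized_replay_eps = 1e-6  # small constant so no priority is zero
new_priorities = per_sample_errors + prioritized_replay_eps

print(np.ndim(scalar_error), per_sample_errors.shape)
```

Whether the heads should be summed, averaged, or reduced with a maximum is a design choice that the question does not settle; the sum here only mirrors the original line.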

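In the snippet above, I am using the prioritized replay buffer from OpenAI Baselines. Its implementation and its integration with DQN can be found at the following links: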

https://github.com/openai/baselines/blob/master/baselines/deepq/replay_buffer.py
https://github.com/openai/baselines/blob/master/baselines/deepq/deepq.py#L292,L303
https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/deepq/build_graph.py#L317,L449

In the code above, I pass the weights returned by the Prioritized Replay Buffer as sample_weight, and I use td_errors to update the priorities. If this is wrong, please help me correct my code.
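
For reference, the sample / weight / update cycle described here can be sketched without Keras or Baselines. `TinyPrioritizedBuffer` below is a hypothetical, list-based stand-in written for illustration; it is not the Baselines class (which uses segment trees), and the `alpha`, `beta` and epsilon values are arbitrary assumptions. It only shows the intended data flow: sampling returns one importance weight and one index per transition, the weights scale the per-sample loss, and the priorities are then updated with one positive value per index.

```python
import numpy as np


class TinyPrioritizedBuffer:
    """Hypothetical list-based prioritized buffer, for illustration only."""

    def __init__(self, alpha=0.6):
        self.alpha = alpha      # how strongly priorities skew sampling
        self.items = []
        self.priorities = []

    def add(self, item):
        # New transitions get the current maximum priority (1.0 when empty).
        self.items.append(item)
        self.priorities.append(max(self.priorities, default=1.0))

    def sample(self, batch_size, beta, rng):
        scaled = np.asarray(self.priorities) ** self.alpha
        probs = scaled / scaled.sum()
        idxes = rng.choice(len(self.items), size=batch_size, p=probs)
        # Importance-sampling weights, normalised by the largest possible weight.
        n = len(self.items)
        weights = (n * probs[idxes]) ** (-beta)
        weights /= (n * probs.min()) ** (-beta)
        return [self.items[i] for i in idxes], weights, idxes

    def update_priorities(self, idxes, priorities):
        # One strictly positive priority per sampled index.
        assert len(idxes) == len(priorities)
        for i, priority in zip(idxes, priorities):
            assert priority > 0
            self.priorities[i] = float(priority)


rng = np.random.default_rng(0)
buf = TinyPrioritizedBuffer()
for transition in range(8):
    buf.add(transition)

batch, weights, idxes = buf.sample(batch_size=4, beta=0.4, rng=rng)
td_errors = np.abs(rng.normal(size=4))           # placeholder per-sample TD errors
weighted_loss = np.mean(weights * td_errors**2)  # weights scale each sample's loss
buf.update_priorities(idxes, td_errors + 1e-6)
```

In the agent itself, the placeholder `td_errors` would be replaced by the per-sample errors computed from the network outputs, and the weighting would be done through `sample_weight` rather than by hand.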
