python - How to build a DQN that outputs 1 discrete and 1 continuous value as a pair?

Question

I am building a DQN for an OpenAI Gym environment. My observation space is a single discrete value, but my actions are:

self.action_space = (Discrete(3), Box(-100, 100, (1,)))

ex: [1,56], [0,24], [2,-78]...

My current neural network is:

model = Sequential()
model.add(Dense(24, activation='relu', input_shape=states)) # (1,)
model.add(Dense(24, activation='relu'))
model.add(Dense(2, activation='linear'))

(I copied it from a tutorial that only outputs 1 discrete value in the range [0,1].)

I understand that I need to change the last layer of my neural network but what would it be in my case?

My guess is that the last layer should have 3 binary outputs and 1 continuous output but I don't know if it is possible to have different natures of outputs within the same layer.

score 0 · Accepted Answer

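As you already pointed out in your comment, DQN is not compatible with continuous action spaces because of how it works: it takes the argmax over a of Q(s,a), and when a is continuous it is impossible to evaluate Q(s,a) for every a.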
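A minimal sketch of that greedy step, using made-up Q-values rather than a trained network. A DQN head emits one Q-value per action, so the argmax is a scan over a finite list; for a Box(-100, 100) component there is no such list to scan. The function name `greedy_action` and the numbers are illustrative assumptions.

```python
def greedy_action(q_values):
    """Return the index of the largest Q-value, i.e. argmax over actions."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q(s, a) for one state, one entry per action of Discrete(3).
q_values = [0.2, 1.5, -0.3]
best = greedy_action(q_values)

# This works only because the action set is finite and enumerable.
# A continuous action in [-100, 100] has infinitely many candidates,
# so the network cannot output one Q-value per action.
print(best)
```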

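That said, you will run into the same issue from your question when you apply this to policy gradient methods (which are compatible with continuous action spaces), because with policy gradients you need a probability for each action you take. Something like this could work: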

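The actor (a neural network in this case) gives 4 outputs.
The first 3 outputs are the probabilities of each discrete value (one per value of Discrete(3)).
The 4th output is your continuous value.

Take the softmax of the first 3 outputs and pick your discrete value from the resulting probabilities, then take the 4th output, which is continuous; together these give you your action. You then need to derive the probability of that action, which is given by the combined probability of all the outputs.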
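The recipe above could be sketched roughly as follows, in plain Python with no deep learning library. Several things here are assumptions that go beyond the answer: the discrete head has three logits to match Discrete(3) from the question, the continuous output is treated as the mean of a Gaussian with a fixed standard deviation (`std=10.0` is arbitrary) so that it has a density at all, and the two parts are treated as independent so their log-probabilities add. The names `sample_hybrid_action` and `gaussian_log_prob` are hypothetical helpers.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_log_prob(x, mean, std):
    """Log-density of x under a normal distribution N(mean, std^2)."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


def sample_hybrid_action(actor_out, n_discrete=3, std=10.0,
                         low=-100.0, high=100.0, rng=random):
    """Sample a (discrete, continuous) action and its joint log-probability.

    actor_out: raw actor outputs, n_discrete logits followed by one mean.
    """
    probs = softmax(actor_out[:n_discrete])
    mean = actor_out[n_discrete]

    # Discrete part: sample an index according to the softmax probabilities.
    k = rng.choices(range(n_discrete), weights=probs)[0]
    # Continuous part: sample around the predicted mean (assumed Gaussian).
    x = rng.gauss(mean, std)

    # Joint log-probability, assuming the two parts are independent.
    log_prob = math.log(probs[k]) + gaussian_log_prob(x, mean, std)

    # Clip only the value sent to the environment; log_prob refers to the
    # unclipped sample, which is a simplification.
    x_env = min(max(x, low), high)
    return (k, x_env), log_prob


# Hypothetical actor output: three logits and one continuous mean.
action, log_prob = sample_hybrid_action([0.1, 2.0, -1.0, 25.0],
                                        rng=random.Random(0))
print(action, log_prob)
```

In a policy gradient update, `log_prob` is the quantity that gets multiplied by the return or advantage, which is why both parts of the action need a probability.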
