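I am training a DDPG agent in a custom environment written with OpenAI Gym, and I get an error while training the model.
When I searched online for a solution, I found that some people who had hit a similar problem were able to solve it by initializing the variables.
For example, by using:
tf.global_variables_initializer()
But I am using TensorFlow 2.5.0, where `tf.global_variables_initializer()` no longer exists at the top level of the API. That suggests there should be some other way to fix this error, but I have not been able to find one.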
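For reference, the TF1-style initializer still appears to be reachable in TensorFlow 2.x under the `tf.compat.v1` namespace. The sketch below only illustrates calling it in graph mode. The helper name `run_tf1_style_init` and the variable name `demo_var` are made up for this example, and I have not confirmed that doing this resolves the error with keras-rl2.

```python
def run_tf1_style_init():
    """Create one variable in a fresh graph and initialize it the TF1 way.

    Returns the variable's value as a list, or None if TensorFlow
    cannot be imported in the current environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    graph = tf.Graph()
    with graph.as_default():
        # Hypothetical variable, only here so there is something to initialize.
        v = tf.Variable([0.0, 0.0], name="demo_var")
        # TF1-style op that initializes every global variable in this graph.
        init_op = tf.compat.v1.global_variables_initializer()
        with tf.compat.v1.Session(graph=graph) as sess:
            sess.run(init_op)
            return sess.run(v).tolist()


if __name__ == "__main__":
    print(run_tf1_style_init())
```

Skipping the `sess.run(init_op)` line in a graph-mode session is the classic way to get a `FailedPreconditionError` in TF1-style code, which is why this initializer comes up in the answers I found.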
These are the libraries I am using, along with their versions:
tensorflow: 2.5.0
gym: 0.18.3
numpy: 1.19.5
keras: 2.4.3
keras-rl2: 1.0.5 (the DDPG agent comes from this library)
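In case it helps anyone reproduce the setup, the version list above can be read from the active environment with the standard library. This is a small sketch that assumes Python 3.8 or newer; the helper name `installed_versions` is my own.

```python
from importlib import metadata

# Distribution names as they are listed above.
PACKAGES = ["tensorflow", "gym", "numpy", "keras", "keras-rl2"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```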
Error / stack trace:
Training for 1000 steps ...
Interval 1 (0 steps performed)
17/10000 [..............................] - ETA: 1:04 - reward: 256251545.0121
C:\Users\vchou\anaconda3\envs\AdSpendProblem\lib\site-packages\keras\engine\training.py:2401: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
100/10000 [..............................] - ETA: 1:03 - reward: 272267266.5754
C:\Users\vchou\anaconda3\envs\AdSpendProblem\lib\site-packages\tensorflow\python\keras\engine\training.py:2426: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
---------------------------------------------------------------------------
FailedPreconditionError Traceback (most recent call last)
<ipython-input-17-0938aa6056e8> in <module>
1 # Training
----> 2 ddpgAgent.fit(env, 1000, verbose=1, nb_max_episode_steps = 100)
~\anaconda3\envs\AdSpendProblem\lib\site-packages\rl\core.py in fit(self, env, nb_steps, action_repetition, callbacks, verbose, visualize, nb_max_start_steps, start_step_policy, log_interval, nb_max_episode_steps)
191 # Force a terminal state.
192 done = True
--> 193 metrics = self.backward(reward, terminal=done)
194 episode_reward += reward
195
~\anaconda3\envs\AdSpendProblem\lib\site-packages\rl\agents\ddpg.py in backward(self, reward, terminal)
279 state0_batch_with_action = [state0_batch]
280 state0_batch_with_action.insert(self.critic_action_input_idx, action_batch)
--> 281 metrics = self.critic.train_on_batch(state0_batch_with_action, targets)
282 if self.processor is not None:
283 metrics += self.processor.metrics
~\anaconda3\envs\AdSpendProblem\lib\site-packages\keras\engine\training_v1.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
1075 self._update_sample_weight_modes(sample_weights=sample_weights)
1076 self._make_train_function()
-> 1077 outputs = self.train_function(ins) # pylint: disable=not-callable
1078
1079 if reset_metrics:
~\anaconda3\envs\AdSpendProblem\lib\site-packages\keras\backend.py in __call__(self, inputs)
4017 self._make_callable(feed_arrays, feed_symbols, symbol_vals, session)
4018
-> 4019 fetched = self._callable_fn(*array_vals,
4020 run_metadata=self.run_metadata)
4021 self._call_fetch_callbacks(fetched[-len(self._fetches):])
~\anaconda3\envs\AdSpendProblem\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args, **kwargs)
1478 try:
1479 run_metadata_ptr = tf_session.TF_NewBuffer() if run_metadata else None
-> 1480 ret = tf_session.TF_SessionRunCallable(self._session._session,
1481 self._handle, args,
1482 run_metadata_ptr)
FailedPreconditionError: Could not find variable dense_5_1/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Resource localhost/dense_5_1/kernel/class tensorflow::Var does not exist.
[[{{node ReadVariableOp_21}}]]
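One thing I noticed in the trace above is that some frames come from the standalone `keras` package while one warning comes from `tensorflow\python\keras`. I do not know whether that is related to the error, but the sketch below shows how such a mix can be detected in a pasted trace. The function name `keras_roots` is hypothetical, and the sample lines are shortened copies of lines from the trace.

```python
import re

# Shortened copies of three lines from the trace above.
SAMPLE = r"""
C:\Users\vchou\anaconda3\envs\AdSpendProblem\lib\site-packages\tensorflow\python\keras\engine\training.py:2426: UserWarning
~\anaconda3\envs\AdSpendProblem\lib\site-packages\keras\engine\training_v1.py in train_on_batch
~\anaconda3\envs\AdSpendProblem\lib\site-packages\tensorflow\python\client\session.py in __call__
"""


def keras_roots(traceback_text):
    """Return which Keras code bases appear in the file paths of a trace."""
    roots = set()
    for line in traceback_text.splitlines():
        match = re.search(r"site-packages[\\/](.+?\.py)", line)
        if not match:
            continue
        parts = re.split(r"[\\/]", match.group(1))
        if parts[0] == "keras":
            roots.add("keras")
        elif parts[:3] == ["tensorflow", "python", "keras"]:
            roots.add("tensorflow.python.keras")
    return roots


if __name__ == "__main__":
    print(sorted(keras_roots(SAMPLE)))  # ['keras', 'tensorflow.python.keras']
```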
The actor and critic networks are as follows:
ACTOR NETWORK
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 10)                0
_________________________________________________________________
dense (Dense)                (None, 32)                352
_________________________________________________________________
activation (Activation)      (None, 32)                0
_________________________________________________________________
dense_1 (Dense)              (None, 32)                1056
_________________________________________________________________
activation_1 (Activation)    (None, 32)                0
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056
_________________________________________________________________
activation_2 (Activation)    (None, 32)                0
_________________________________________________________________
dense_3 (Dense)              (None, 10)                330
_________________________________________________________________
activation_3 (Activation)    (None, 10)                0
=================================================================
Total params: 2,794
Trainable params: 2,794
Non-trainable params: 0
_________________________________________________________________
None
CRITIC NETWORK
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
observation_input (InputLayer)  [(None, 1, 10)]      0
__________________________________________________________________________________________________
action_input (InputLayer)       [(None, 10)]         0
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 10)           0           observation_input[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 20)           0           action_input[0][0]
                                                                 flatten_1[0][0]
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 32)           672         concatenate[0][0]
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 32)           0           dense_4[0][0]
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 32)           1056        activation_4[0][0]
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 32)           0           dense_5[0][0]
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 32)           1056        activation_5[0][0]
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 32)           0           dense_6[0][0]
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 1)            33          activation_6[0][0]
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 1)            0           dense_7[0][0]
==================================================================================================
Total params: 2,817
Trainable params: 2,817
Non-trainable params: 0
__________________________________________________________________________________________________
None
This is the code for the DDPG agent:
# Create DDPG agent
ddpgAgent = DDPGAgent(
    nb_actions=nb_actions,
    actor=actor,
    critic=critic,
    critic_action_input=action_input,
    memory=memory,
    nb_steps_warmup_critic=100,
    nb_steps_warmup_actor=100,
    random_process=random_process,
    gamma=0.99,
    target_model_update=1e-3,
)
ddpgAgent.compile(Adam(learning_rate=0.001, clipnorm=1.0), metrics=['mae'])