I am confused about how Flow is meant to ensure that the policy network output stays within the action space limits.
In the version of rllib/ray I installed following the installation guidelines, I see two different ways of dealing with this:
1) squash_to_range option of model: this is available in the code, but it is marked as deprecated:
"The squash_to_range option is deprecated. See the clip_actions agent option instead."
2) clip_actions: This is available, but the flow examples include the following line:
config['clip_actions'] = False # FIXME(ev) temporary ray bug
Looking at the current version of rllib/ray, I see a new option:
3) no_final_linear
# Whether to skip the final linear layer used to resize the
# hidden layer outputs to size `num_outputs`. If True, then the last
# hidden layer should already match num_outputs.
"no_final_linear": False,
The example code provided within flow clips the fcnet output to the box limits of the action space. When I inspect the raw network output while debugging, it contains values far outside the action space range.
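To make clear what I mean by clipping, here is a minimal sketch using plain NumPy. The bounds and the raw output values are made up for illustration; they stand in for the Box limits and the fcnet output, and this is not the actual flow code.

```python
import numpy as np

# Hypothetical bounds standing in for the Box action space limits.
low = np.array([-1.0, -1.0])
high = np.array([1.0, 1.0])

# Made-up raw network output; the first entry is far outside the range.
raw_action = np.array([3.7, -0.4])

# Clip each component to its [low, high] interval.
clipped = np.clip(raw_action, low, high)
print(clipped)
```

Any value outside the range is replaced by the nearest bound, while values inside the range pass through unchanged.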
Did I miss something when implementing my own experiments? I thought defining the Box for the action space was meant to be sufficient.
Simply clipping the actions seems like a bad idea to me; I would prefer to use the no_final_linear option.
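To illustrate my concern with clipping, here is a small sketch comparing it with a tanh-based squashing. The `squash` helper and the sample values are my own assumptions for illustration; this is neither what rllib's squash_to_range did internally nor what no_final_linear does.

```python
import numpy as np


def squash(raw, low, high):
    """Map unbounded values into [low, high] with tanh (hypothetical helper)."""
    return low + (np.tanh(raw) + 1.0) * 0.5 * (high - low)


low, high = -1.0, 1.0

# Made-up raw outputs, all above the upper bound.
raw = np.array([1.5, 3.0, 10.0])

clipped = np.clip(raw, low, high)    # every entry collapses to the bound
squashed = squash(raw, low, high)    # entries stay distinct and ordered

print(clipped)
print(squashed)
```

With clipping, all three outputs end up as the same action, so the environment cannot distinguish between them; with squashing they remain ordered but are compressed toward the bound.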
What is your take on this?
What are the implications of switching to the current version of ray? (Mine is ray 0.6.1, as in the conda installation environment file.)
What are your plans to fix the "temporary ray bug" issue?
Thanks for any hints.