flow-project - How is FLOW currently dealing with clipping actions, squash_to_range, no_final_linear

Question

I am confused about the way FLOW is meant to make sure the policy network output is matched to the action space limits.

In the version of rllib/ray I installed following the installation guidelines, I see two different ways of dealing with this:

1) squash_to_range option of the model: this is available in the code, but the code states:

"The squash_to_range option is deprecated. See the clip_actions agent option instead."

2) clip_actions: This is available, but the flow examples include the following line:

config['clip_actions'] = False  # FIXME(ev) temporary ray bug

Looking at the current version of rllib/ray, I see a new option:

3) no_final_linear

# Whether to skip the final linear layer used to resize the
# hidden layer outputs to size `num_outputs`. If True, then the last
# hidden layer should already match num_outputs.
"no_final_linear": False,

The example code provided within flow clips the fcnet output to the box limits of the action space. When I debug the raw network output, I see values far outside the action space range.

Did I miss something when implementing my own experiments? I thought defining the Box for the action space was meant to be sufficient.

Simply clipping the actions seems like a bad idea to me; I would prefer to use the no_final_linear option.

What is your take on this?

What are the implications of switching to the current version of ray? (Mine is ray 0.6.1, as in the conda installation environment file.)

What are your plans to fix the "temporary ray bug" issue?

Thanks for any hints

score 0 · Accepted Answer

This is a good point. In Ray 0.6.1 (the version we currently use on master), there was a bug where Ray's clip_actions did not properly carry the results of the clipping into the loss. This is resolved in more recent versions of Ray. As you mentioned, the network outputs can therefore fall outside the acceptable range; to deal with this, we implemented clipping within the step method of Flow's base_env.
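For illustration, here is a minimal sketch of what clipping a raw policy output to the Box limits amounts to. It is not the actual base_env code: the helper name clip_to_action_space and the bounds of -1 and 1 are assumptions made for the example, and only NumPy is used.

```python
import numpy as np


def clip_to_action_space(raw_action, low, high):
    """Clip each action dimension to its [low, high] bounds.

    Hypothetical helper for illustration only; `low` and `high` stand in
    for the bounds of the Box action space.
    """
    raw_action = np.asarray(raw_action, dtype=np.float64)
    return np.clip(raw_action, low, high)


# Assumed bounds for a two-dimensional action space.
low = np.array([-1.0, -1.0])
high = np.array([1.0, 1.0])

# A raw network output where the first value lies outside the range.
raw = np.array([3.7, -0.2])
clipped = clip_to_action_space(raw, low, high)
```

Here the out-of-range first component is pulled back to the upper bound, while the second component, already inside the range, passes through unchanged.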

One way to resolve this is to upgrade to the most recent version of Ray (though I've only tested compatibility up to Ray 0.7.2). You can update to the newest version of Ray with relatively few issues, though you will have to slightly modify the runner script to make it compatible. If you run into issues with upgrading the version of Ray, please ping us!

Our plan to fix it is to upgrade to the newest version of Ray for the release planned in late August (possibly earlier, but we don't want to make any promises we can't keep).
