Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions. We present the Modified-Action Markov Decision Process, an extension of the MDP model that allows the executed actions to differ from those selected by the policy. We analyze the asymptotic behaviour of common reinforcement learning algorithms in this setting and show that they adapt in different ways: some completely ignore modifications, while others go to various lengths in trying to avoid action modifications that decrease reward. By choosing the right algorithm, developers can prevent their agents from learning to circumvent interruptions or constraints, and better control agent responses to other kinds of action modification, such as self-damage.
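As a rough illustration of the model described above, here is a minimal Python sketch of an environment in which the executed action may differ from the one the policy selects. All names, dynamics, and the override rule are illustrative assumptions, not taken from the paper:

```python
import random


class ModifiedActionMDP:
    """Toy sketch of a Modified-Action MDP (illustrative, not the paper's
    formal definition): the action actually executed may differ from the
    one the policy selects, e.g. because a supervisor overrides it."""

    def __init__(self, modify_prob=0.2, override_action=0, seed=0):
        self.modify_prob = modify_prob        # chance the supervisor intervenes
        self.override_action = override_action  # "safe" action substituted in
        self.rng = random.Random(seed)
        self.state = 0

    def modify(self, policy_action):
        """Action-modification function: with probability modify_prob,
        replace the policy's action with the override action."""
        if self.rng.random() < self.modify_prob:
            return self.override_action
        return policy_action

    def step(self, policy_action):
        """One transition. Dynamics and reward are trivial placeholders:
        executing action 1 yields reward 1, the safe action 0 yields 0."""
        executed = self.modify(policy_action)
        reward = 1.0 if executed == 1 else 0.0
        self.state += 1
        return self.state, reward, executed
```

The point of the sketch is that an agent observing `executed` alongside its own `policy_action` can, depending on the learning algorithm, either treat overrides as part of the environment or learn to steer away from states where overrides occur.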

Authors: Eric D. Langlois, Tom Everitt


Code:

https://github.com/mtrazzi/two-step-task



