The goal of inverse reinforcement learning (IRL) is to infer a reward function that explains the behavior of an agent performing a task. In many real-world scenarios, however, examples of truly optimal behavior are scarce, and it is desirable to effectively leverage sets of demonstrations of suboptimal or heterogeneous performance, which are easier to obtain. We propose an algorithm that learns a reward function from such demonstrations, together with additional information such as a distribution over rewards collected during the demonstrations. The algorithm fits the reward function, modeled as a neural network, by essentially minimizing the Wasserstein distance between the corresponding induced reward distribution and the given distribution over rewards. We show that our method is capable of learning reward functions such that policies trained to optimize them outperform the demonstrations used for fitting the reward functions.
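The fitting step described above, minimizing a Wasserstein distance between a reward distribution induced by a parametric reward model and a given reference distribution over rewards, can be illustrated in one dimension, where the Wasserstein-1 distance between two equal-size empirical samples is simply the mean absolute difference of the sorted samples. The following sketch is an assumption-laden toy, not the authors' implementation: it uses a linear reward model and finite-difference gradients in place of a neural network and backpropagation.

```python
import numpy as np

def empirical_w1(samples_a, samples_b):
    """1-D Wasserstein-1 distance between two equal-size empirical samples:
    sort both and average the absolute differences (the optimal monotone coupling)."""
    a = np.sort(np.asarray(samples_a, dtype=float))
    b = np.sort(np.asarray(samples_b, dtype=float))
    assert a.shape == b.shape
    return float(np.mean(np.abs(a - b)))

# Toy illustration (all names hypothetical): a linear "reward model"
# r(s) = w * s is scored on stand-in demonstration states, and w is
# nudged by central finite differences to shrink the squared W1 gap
# to a reference distribution over rewards.
rng = np.random.default_rng(0)
states = rng.normal(size=256)       # stand-in demonstration states
target_rewards = 2.0 * states       # reference reward distribution

def loss(w):
    return empirical_w1(w * states, target_rewards) ** 2

w, lr, eps = 0.5, 0.5, 1e-4
for _ in range(200):
    grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)
    w -= lr * grad                  # w drifts toward 2.0

print(round(w, 2))
```

The sorted-sample formula is exact only in one dimension; for richer induced distributions (e.g. over state-action features), the paper's setting would call for a neural reward model and a differentiable Wasserstein estimate.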

Author(s) : Luis Haug, Ivan Ovinnikon, Eugene Bykovets

