The proposed method can be utilized to train agents in environments with fairly complex state and action spaces. The main challenge of using such a reward function is the high sparsity of positive reward signals. To address this problem, we use a simple prediction-based exploration strategy (called Curious Exploration) along with a Return-based Memory Restoration (RMR) technique, which tends to remember more valuable memories. The proposed method can be used in complex environments such as the Half Field Offense domain, where the proposed system converges easily to a nearly optimal behaviour, and it can be used to teach agents to learn and perform in a complex soccer domain. A video presenting the performance of our trained agent is available at http://bit.ly/HFO_Binary_Reward. The latest version of this article, with more details of our proposed method, is available for download at www.hFO_
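The abstract above does not give implementation details of Return-based Memory Restoration, so the following is only a minimal illustrative sketch of one plausible reading: a replay buffer that protects transitions from high-return episodes from the usual FIFO eviction, so that sparse positive-reward experiences are "remembered" longer. All class, method, and parameter names here are assumptions for illustration, not the authors' API.

```python
import random
from collections import deque


class RMRReplayBuffer:
    """Hypothetical sketch of a return-based memory restoration buffer.

    Assumption: episodes with positive return are stored in a protected
    pool that is evicted by lowest return first, while zero-return
    episodes live in an ordinary FIFO queue. This is an illustrative
    interpretation, not the paper's actual implementation.
    """

    def __init__(self, capacity=10000, protected_fraction=0.2):
        self.capacity = capacity
        # Portion of the buffer reserved for high-return episodes.
        self.protected_budget = int(capacity * protected_fraction)
        self.regular_budget = capacity - self.protected_budget
        self.regular = deque()     # FIFO-evicted transitions
        self.protected = []        # (return, transitions), sorted ascending by return

    def add_episode(self, transitions, episode_return):
        if episode_return > 0:  # sparse positive reward: worth protecting
            self.protected.append((episode_return, transitions))
            self.protected.sort(key=lambda item: item[0])
            # Evict lowest-return protected episodes beyond the budget.
            while sum(len(t) for _, t in self.protected) > self.protected_budget:
                self.protected.pop(0)
        else:
            self.regular.extend(transitions)
            while len(self.regular) > self.regular_budget:
                self.regular.popleft()

    def sample(self, batch_size):
        # Uniform sampling over both pools; prioritization could be added.
        pool = list(self.regular) + [t for _, ts in self.protected for t in ts]
        return random.sample(pool, min(batch_size, len(pool)))
```

Under this sketch, a rare scoring episode stays available for replay even after many unsuccessful episodes have flushed the FIFO portion of the buffer.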

Author(s) : Saeed Tafazzol, Erfan Fathi, Mahdi Rezaei, Ehsan Asali

