We introduce an algorithm that efficiently learns policies in non-stationary environments. It analyzes a possibly infinite stream of data and computes, in real time, high-confidence change-point detection statistics. We show that (i) this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses, and (ii) it bounds the rate of false alarms, which is important in order to minimize regret. We evaluate our algorithm on high-dimensional continuous reinforcement learning problems and show that it outperforms state-of-the-art (model-free and model-based) RL algorithms, as well as state-of-the-art meta-learning methods specially designed to deal with non-stationarity. We conclude that this algorithm is an effective and reliable approach to non-stationary learning problems.
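To make the idea of an online change-point statistic concrete, here is a minimal sketch of the classical one-sided CUSUM test on a scalar stream. It is an illustration only and is not taken from the paper: the function name `cusum_detect`, the Gaussian observation model, the known pre- and post-change means, and the threshold value are all assumptions chosen for the example.

```python
def cusum_detect(stream, mean_before, mean_after, sigma, threshold):
    """Return the index at which a change is flagged, or None if no alarm fires.

    Hypothetical helper: one-sided CUSUM on the Gaussian log-likelihood
    ratio between an assumed pre-change mean and an assumed post-change mean.
    """
    stat = 0.0
    for t, x in enumerate(stream):
        # Log-likelihood ratio of N(mean_after, sigma^2) against N(mean_before, sigma^2).
        llr = (mean_after - mean_before) * (x - 0.5 * (mean_before + mean_after)) / sigma**2
        # Clamp at zero so evidence gathered before the change does not accumulate.
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return t
    return None


# Toy stream (assumed data): the mean shifts from 0 to 2 at index 50.
stream = [0.0] * 50 + [2.0] * 50
alarm_index = cusum_detect(stream, mean_before=0.0, mean_after=2.0, sigma=1.0, threshold=5.0)
```

The threshold controls the trade-off mentioned in the abstract: raising it makes false alarms rarer but lengthens the delay between the true change and the alarm.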

Author(s) : Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva

Links : PDF - Abstract

Code :

Keywords : algorithm - high - learning - show - online
