Value iteration is a well-known method of solving Markov Decision Processes (MDPs). However, the computational cost of value iteration quickly becomes infeasible as the size of the state space increases. In this paper, we propose an intuitive algorithm for solving MDPs that reduces the cost of updates by dynamically grouping together states with similar cost-to-go values. We also prove that our algorithm converges almost surely to within \(2\varepsilon / (1 - \gamma)\) of the true optimal value in the \(\ell^\infty\) norm, where \(\gamma\) is the discount factor and aggregated states differ by at most \(\varepsilon\). Numerical experiments on a variety of simulated environments confirm the strength of our algorithm.
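To make the aggregation idea concrete, the following Python snippet is a minimal, hypothetical sketch and not the authors' exact algorithm: at each iteration it groups states whose current value estimates fall in the same \(\varepsilon\)-wide bin, performs a single Bellman backup per group, and shares that value across the group, so the per-iteration cost scales with the number of groups rather than the number of states. The function name, tensor layout, and the binning rule are all assumptions made for illustration.

```python
import numpy as np

def aggregated_value_iteration(P, R, gamma=0.9, epsilon=0.1, n_iters=500):
    """Illustrative sketch of value iteration with dynamic state aggregation.

    NOTE: hypothetical sketch, not the paper's algorithm as published.

    P: transition tensor, shape (A, S, S), with P[a, s, s'] = Pr(s' | s, a)
    R: reward matrix, shape (S, A)
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)

    for _ in range(n_iters):
        # Dynamically group states whose current values differ by at most
        # (roughly) epsilon, by binning the value estimates.
        bins = np.floor(V / epsilon).astype(int)

        V_new = V.copy()
        for b in np.unique(bins):
            members = np.where(bins == b)[0]
            rep = members[0]                         # one representative per group
            # Bellman backup for the representative state only.
            q_rep = R[rep] + gamma * P[:, rep, :] @ V   # shape (A,)
            V_new[members] = q_rep.max()                # shared value for the group
        V = V_new

    return V
```

As a usage example, one could build a small random MDP (e.g., `P = np.random.dirichlet(np.ones(S), size=(A, S))` and `R = np.random.rand(S, A)`) and compare the output against exact value iteration; per the abstract's guarantee, the aggregated values should lie within \(2\varepsilon/(1-\gamma)\) of the optimum in the \(\ell^\infty\) norm.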

Authors: Guanting Chen, Johann Demetrio Gaebler, Matt Peng, Chunlin Sun, Yinyu Ye



Keywords: algorithm, cost, iteration, state, solving
