Value iteration is a well-known method of solving Markov Decision Processes (MDPs). However, the computational cost of value iteration quickly becomes infeasible as the size of the state space increases. In this paper, we propose an intuitive algorithm for solving MDPs that reduces the cost of updates by dynamically grouping together states with similar cost-to-go values. We also prove that our algorithm converges almost surely to within \(2\varepsilon / (1 - \gamma)\) of the true optimal value in the \(\ell^\infty\) norm, where \(\gamma\) is the discount factor and aggregated states differ by at most \(\varepsilon\). Numerical experiments on a variety of simulated environments confirm the strength of our algorithm.
Author(s) : Guanting Chen, Johann Demetrio Gaebler, Matt Peng, Chunlin Sun, Yinyu Ye
Links : PDF - Abstract
Keywords : algorithm - cost - iteration - state - solving
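Below is a minimal sketch of the general idea described in the abstract: value iteration that groups states whose current value estimates lie within \(\varepsilon\) of one another and shares a single Bellman backup across each group, rather than backing up every state individually. This is not the authors' algorithm; the function names, the random tabular MDP, and the representative-state update rule are illustrative assumptions.

```python
# Illustrative sketch (not the paper's algorithm): value iteration with
# epsilon-based state aggregation on a small random tabular MDP.
import numpy as np


def aggregate(V, eps):
    """Group states whose current value estimates differ by at most eps.

    Returns a list of index arrays (simple 1-D bucketing over sorted values).
    """
    order = np.argsort(V)
    groups, current = [], [order[0]]
    for s in order[1:]:
        if V[s] - V[current[0]] <= eps:
            current.append(s)
        else:
            groups.append(np.array(current))
            current = [s]
    groups.append(np.array(current))
    return groups


def aggregated_value_iteration(P, R, gamma=0.95, eps=0.1, iters=500):
    """P: (A, S, S) transition tensor, R: (S, A) reward matrix.

    Each sweep performs one Bellman backup per group of aggregated states
    and shares the result across the group, reducing the per-sweep cost
    when many states have similar cost-to-go values.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        for group in aggregate(V, eps):
            rep = group[0]  # representative state for the whole group
            q = np.array([R[rep, a] + gamma * P[a, rep] @ V for a in range(A)])
            V[group] = q.max()
    return V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 20, 3
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)  # normalize rows into transition probabilities
    R = rng.random((S, A))
    print(aggregated_value_iteration(P, R)[:5])
```

In this sketch the groups are recomputed every sweep from the current value estimates, so the aggregation adapts as the values converge; the bound quoted in the abstract suggests the resulting fixed point can differ from the true optimal value by at most \(2\varepsilon / (1 - \gamma)\) in the \(\ell^\infty\) norm.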