The finite-time convergence of off-policy TD learning has been studied recently, but this type of convergence has not been well established for off-policy TD learning in the multi-agent setting. This work develops two decentralized TD with correction (TDC) algorithms for multi-agent off-policy TD learning under Markovian sampling. The communication complexity of our algorithms is in the order of $\mathcal{O}(\ln\epsilon^{-1})$, which is significantly lower than the communication complexities $\mathcal{O}(\epsilon^{-1}\ln\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-1})$ of existing decentralized TD(0) algorithms.
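Since this page only summarizes the abstract, the authors' exact update and communication rules are not reproduced here. As a rough orientation only, the sketch below shows a generic off-policy TDC step with linear function approximation followed by a gossip-style consensus round, which is the general pattern decentralized TD algorithms follow; the function names, step sizes `alpha`/`beta`, and the mixing matrix are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def local_tdc_update(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One generic off-policy TDC step with linear features (sketch).

    theta: main value parameters; w: auxiliary correction parameters.
    rho: importance-sampling ratio pi(a|s) / b(a|s) for off-policy data.
    """
    delta = reward + gamma * (phi_next @ theta) - phi @ theta  # TD error
    # Main update: gradient-corrected TD step (the "correction" in TDC).
    theta = theta + alpha * rho * (delta * phi - gamma * (phi @ w) * phi_next)
    # Auxiliary update: tracks the projection of the TD error onto features.
    w = w + beta * (rho * delta - phi @ w) * phi
    return theta, w

def consensus_round(params, mixing):
    """Gossip averaging over agents: `mixing` is a doubly stochastic
    M x M matrix respecting the communication graph; params is M x d."""
    return mixing @ params

# Toy usage: 3 agents, 4-dimensional features, uniform averaging.
# (Random synthetic samples; a real run would draw Markovian samples
# from a shared MDP.)
rng = np.random.default_rng(0)
M, d = 3, 4
thetas, ws = np.zeros((M, d)), np.zeros((M, d))
mixing = np.full((M, M), 1.0 / M)
for _ in range(100):
    for i in range(M):
        phi, phi_next = rng.normal(size=d), rng.normal(size=d)
        thetas[i], ws[i] = local_tdc_update(
            thetas[i], ws[i], phi, phi_next,
            reward=rng.normal(), rho=1.0, gamma=0.95, alpha=0.05, beta=0.05)
    thetas = consensus_round(thetas, mixing)
    ws = consensus_round(ws, mixing)
```

Invoking `consensus_round` less often (e.g., once per mini-batch of local samples rather than once per sample) is the kind of mechanism that lowers communication complexity relative to per-step averaging.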

Author(s) : Ziyi Chen, Yi Zhou, Rongrong Chen

Links : PDF - Abstract


Keywords : complexity - learning - td - epsilon - communication
