The finite-time convergence of off-policy TD learning has been studied recently, but such convergence has not been well established in the multi-agent setting. This work develops two decentralized TD with correction (TDC) algorithms for multi-agent off-policy TD learning under Markovian sampling. The communication complexity of our algorithms is of order $\mathcal{O}(\ln\epsilon^{-1})$, which is significantly lower than the communication complexity $\mathcal{O}(\epsilon^{-1}\ln\epsilon^{-1})$ of the existing decentralized TD(0).

Authors: Ziyi Chen, Yi Zhou, Rongrong Chen
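
To make the abstract's subject concrete, here is a minimal single-agent sketch of the off-policy TDC update (TD with gradient correction, using importance-sampling ratios) that the paper decentralizes. This is not the paper's algorithm: the multi-agent variants additionally exchange and average parameters over a communication network, which is omitted here. The toy two-state MDP, policies, and step sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
phi = np.eye(2)            # one-hot features for states {0, 1} (assumed)
pi = np.array([0.9, 0.1])  # target policy: probability of each action (assumed)
b = np.array([0.5, 0.5])   # behavior policy: uniform over actions (assumed)

alpha, beta = 0.005, 0.01  # two-time-scale step sizes (assumed)
theta = np.zeros(2)        # value-function weights
w = np.zeros(2)            # TDC correction weights

s = 0
for _ in range(200_000):
    a = rng.choice(2, p=b)            # act with the behavior policy
    s_next = a                        # action a deterministically leads to state a
    r = 1.0 if s_next == 0 else 0.0
    rho = pi[a] / b[a]                # importance-sampling ratio

    f, f_next = phi[s], phi[s_next]
    delta = r + gamma * theta @ f_next - theta @ f
    # TDC: importance-weighted TD(0) step plus a gradient-correction term
    # built from the auxiliary weights w (updated on the faster time scale).
    theta += alpha * rho * (delta * f - gamma * (w @ f) * f_next)
    w += beta * rho * (delta - w @ f) * f
    s = s_next

# With one-hot features, theta should approach the true target-policy values
# V(0) = V(1) = 0.9 / (1 - gamma) = 9 for this toy chain.
print(theta)
```

The correction term driven by `w` is what distinguishes TDC from plain off-policy TD(0): it keeps the update stable under off-policy sampling with function approximation, where uncorrected TD(0) can diverge.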