MuZero, a model-based reinforcement learning algorithm that uses a valueequivalent dynamics model, achieved state-of-the-art performance in Chess,Shogi and the game of Go . In contrast to standard forward dynamics models that predict a full next state, value equivalent models are trained to predict afuture value, thereby emphasizing value relevant information in therepresentations . We find that actiontrajectories may diverge between observation embeddings and internal statetransition dynamics, which could lead to instability during planning . Based on this insight, we propose two regularization techniques to stabilize MuZero’sperformance . We provide an open-source implementation of MuZeroalong with an interactive visualizer of learned representations .

Author(s) : Joery A. de Vries, Ken S. Voskuil, Thomas M. Moerland, Aske Plaat

Links : PDF - Abstract

Code :


Keywords : models - muzero - dynamics - model - based -

Leave a Reply

Your email address will not be published. Required fields are marked *