Transient Non-stationarity and Generalisation in Deep Reinforcement Learning

Non-stationarity can arise in reinforcement learning (RL) even when the environment itself is stationary. We propose Iterated Relearning (ITER) to improve the generalisation of deep RL agents. ITER augments standard RL training by repeatedly transferring the knowledge of the current policy into a freshly initialised network. Experimentally, we show that ITER improves performance on the challenging generalisation benchmarks ProcGen and Multiroom.
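As a rough illustration of what repeated knowledge transfer into a freshly initialised network could look like, the following is a minimal sketch under strong simplifying assumptions: the policy is a tabular softmax over logits rather than a deep network, the transfer step is modelled as minimising the KL divergence from the current policy to the new one (an assumption, since the text above does not specify the transfer loss), and the RL objective and environment interaction are omitted entirely. The names `fresh_policy`, `distil`, and `iterated_relearning` are hypothetical and chosen only for this example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kl(teacher, student):
    """Average KL(teacher || student) over all states."""
    pairs = zip(teacher, student)
    return sum(kl(softmax(t), softmax(s)) for t, s in pairs) / len(teacher)


def fresh_policy(n_states, n_actions, rng):
    """Freshly initialised policy: small random logits for every state."""
    return [[rng.gauss(0.0, 0.01) for _ in range(n_actions)]
            for _ in range(n_states)]


def distil(teacher, student, steps=2000, lr=1.0):
    """Fit the student to the teacher by gradient descent on KL(teacher || student).

    For softmax policies the gradient with respect to the student logits
    is simply (student_probs - teacher_probs).
    """
    for _ in range(steps):
        for s, t_logits in enumerate(teacher):
            p_t = softmax(t_logits)
            p_s = softmax(student[s])
            for a in range(len(p_s)):
                student[s][a] -= lr * (p_s[a] - p_t[a])
    return student


def iterated_relearning(policy, generations, rng):
    """Repeatedly transfer the current policy into a new, freshly initialised one.

    Assumption: a full method would also keep training with RL between and
    during transfers; this sketch shows only the transfer loop.
    """
    for _ in range(generations):
        student = fresh_policy(len(policy), len(policy[0]), rng)
        policy = distil(policy, student)  # the student becomes the current policy
    return policy


rng = random.Random(0)
# Stand-in for a policy obtained from RL training: 4 states, 3 actions.
teacher = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
final = iterated_relearning(teacher, generations=3, rng=rng)
print("mean KL(original || final):", mean_kl(teacher, final))
```

In this toy setting each new policy starts from scratch and sees only the behaviour of its predecessor, so it carries over what the predecessor does but none of the parameter history that produced it.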

