Reinforcement learning (RL) can enable task-oriented dialogue systems to steer the conversation towards successful task completion. In an end-to-end setting, a response can be constructed in a word-level sequential decision-making process with the entire system vocabulary as action space. However, current approaches use an uninformed prior for training and optimize the latent distribution solely on the context. In this paper, we explore three ways of leveraging an auxiliary task to shape the latent variable distribution: via pre-training, to obtain an informed prior, and via multitask learning. Our approach yields a more action-characterized latent representation which supports end-to-end dialogue policy optimization and achieves state-of-the-art success rates. These results warrant a more widespread use of RL in end-to-end dialogue models.
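To illustrate the contrast the abstract draws between an uninformed and an informed prior, the sketch below (a hypothetical NumPy example, not the authors' implementation) computes the KL regularizer of a latent-variable model under both choices. The posterior and prior parameters here are made-up stand-ins; in the paper's setting an informed prior would come from pre-training on an auxiliary task such as action prediction.

```python
import numpy as np

def kl_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    var_q = np.exp(logvar_q)
    var_p = np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

# Hypothetical posterior q(z|context) produced by a latent dialogue model.
mu_q = np.array([0.8, -0.3])
logvar_q = np.array([-1.0, -1.2])

# Uninformed prior: a standard normal N(0, I).
kl_uninformed = kl_gaussians(mu_q, logvar_q, np.zeros(2), np.zeros(2))

# Informed prior (illustrative parameters only): shaped by an auxiliary
# task so that it sits closer to action-characterizing regions of the
# latent space, yielding a weaker pull away from the posterior.
kl_informed = kl_gaussians(
    mu_q, logvar_q, np.array([0.7, -0.2]), np.array([-0.8, -1.0])
)
```

With these illustrative numbers, the KL penalty toward the informed prior is much smaller than toward the standard normal, showing how an informed prior lets the regularizer preserve action-relevant structure rather than pulling all posteriors toward the origin.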

Author(s) : Nurul Lubis, Christian Geishauser, Michael Heck, Hsien-chin Lin, Marco Moresi, Carel van Niekerk, Milica Gašić

Links : PDF - Abstract

Code :


Keywords : latent - dialogue - task - action - learning
