In recent years, there has been great interest, as well as significant challenges, in applying reinforcement learning (RL) to recommendation systems. In this paper, we summarize three key practical challenges of large-scale RL-based recommender systems: massive state and action spaces, a high-variance environment, and the unspecific reward setting in recommendation. We develop a model-based reinforcement learning framework called GoalRec. Inspired by the ideas of the world model (model-based), value function estimation (model-free), and goal-based RL, we propose a novel disentangled universal value function designed for item recommendation. It can generalize to the various goals that the recommender may have, and accordingly disentangle the stochastic environmental dynamics from the reward signals. As part of the value function, a high-capacity, reward-independent world model, free from the reward function, is trained to simulate complex environmental dynamics.
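The disentanglement the abstract describes can be illustrated with a minimal sketch: a reward-independent module predicts future-state features from (state, action), and a goal vector turns those features into a value, so swapping goals never requires retraining the dynamics model. All names, dimensions, and the linear form below are illustrative assumptions, not the paper's actual GoalRec architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative only).
STATE_DIM, ACTION_DIM, FEAT_DIM = 8, 4, 6

# Reward-independent "world model": maps (state, action) to predicted
# future-state features psi(s, a). A random linear map stands in for a
# learned high-capacity network.
W = rng.normal(size=(STATE_DIM + ACTION_DIM, FEAT_DIM))

def predicted_features(state, action):
    """psi(s, a): environment-dynamics prediction, no reward involved."""
    return np.concatenate([state, action]) @ W

def goal_value(state, action, goal):
    """Disentangled value: V(s, a, g) = psi(s, a) . g.
    Changing the goal vector g re-targets the recommender while the
    world model stays fixed."""
    return predicted_features(state, action) @ goal

state = rng.normal(size=STATE_DIM)
action = rng.normal(size=ACTION_DIM)
click_goal = rng.normal(size=FEAT_DIM)   # e.g. a goal emphasizing clicks
dwell_goal = rng.normal(size=FEAT_DIM)   # e.g. a goal emphasizing dwell time

# The same dynamics prediction serves both goals.
v_click = goal_value(state, action, click_goal)
v_dwell = goal_value(state, action, dwell_goal)
```

Because the value is linear in the goal vector here, composite objectives (e.g. clicks plus dwell time) come for free by adding goal vectors, which mirrors the generalization-across-goals claim in the abstract.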

Author(s) : Kai Wang, Zhene Zou, Qilin Deng, Runze Wu, Jianrong Tao, Changjie Fan, Liang Chen, Peng Cui

Links : PDF - Abstract

Code :

Keywords : model - recommendation - reward - function - based
