Reinforcement learning based recommender systems (RL-based RS) aim to learn a good policy from a batch of collected data. However, current RL-based RS benchmarks commonly have a large reality gap, because they involve artificial RL datasets or semi-simulated RS datasets. In real-world situations, not all recommendation problems are suitable to be transformed into reinforcement learning problems. In this paper, we introduce the RL4RS benchmark, a new resource fully collected from industrial applications to train and evaluate RL algorithms. It contains two datasets, tuned simulation environments, related advanced RL baselines, data understanding tools, and counterfactual policy evaluation algorithms. In addition to contributing to research in reinforcement learning and neural combinatorial optimization, we expect the resource will also contribute to

Author(s) : Kai Wang, Zhene Zou, Qilin Deng, Yue Shang, Minghao Zhao, Runze Wu, Xudong Shen, Tangjie Lyu, Changjie Fan

Keywords : rl - rs - resource - learning - based
