Mean field game facilitates analyzing the multi-armed bandit (MAB) problem for a large number of agents by approximating their interactions with an average effect. To accommodate the continuous reward function, we encode the learned reward into an agent state, which is mapped to its stochastic arm-playing policy. We show that the state evolution is upper semi-continuous, based on which the existence of a mean field equilibrium (MFE) is obtained. On this basis, we characterize a contraction mapping for the ODE of the state evolution to ensure a unique MFE for the bandit game. Extensive evaluations validate our MFE characterization and exhibit tight empirical regret of the MAB problem.
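To make the setup concrete, below is a minimal simulation sketch of a mean field MAB population in Python. It assumes a congestion-style continuous reward (an arm's payoff decreases with the fraction of agents playing it), a softmax arm-playing policy over each agent's learned-reward state, and illustrative parameter values; none of these specifics come from the paper, which should be consulted for the actual model.

```python
import numpy as np

# Hypothetical sketch of a mean field bandit population (assumed model,
# not the paper's exact construction).

rng = np.random.default_rng(0)
N, K, T = 1000, 3, 500           # agents, arms, rounds (illustrative values)
tau = 0.1                        # softmax temperature (assumed policy form)

def mean_reward(arm, frac):
    """Continuous reward of an arm given the fraction of agents playing it.
    A congestion effect is assumed: more crowding lowers the reward."""
    base = np.array([0.9, 0.7, 0.5])
    return base[arm] - 0.6 * frac

states = np.zeros((N, K))        # each agent's state: learned reward estimates
counts = np.ones((N, K))         # play counts for incremental averaging

for t in range(T):
    # Stochastic arm-playing policy: softmax over each agent's state.
    logits = states / tau
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    arms = np.array([rng.choice(K, p=p) for p in probs])

    # Mean field term: the empirical distribution of arm plays.
    frac = np.bincount(arms, minlength=K) / N

    # Noisy continuous rewards and incremental state updates.
    rewards = mean_reward(arms, frac[arms]) + 0.05 * rng.standard_normal(N)
    counts[np.arange(N), arms] += 1
    states[np.arange(N), arms] += (
        rewards - states[np.arange(N), arms]
    ) / counts[np.arange(N), arms]

print("final arm distribution (approximate fixed point):", frac.round(3))
```

If the population distribution stops changing across rounds, the final `frac` is a numerical stand-in for an MFE of this assumed game; the paper's contraction-mapping condition is what guarantees such a fixed point is unique in their model.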

Author(s) : Xiong Wang, Riheng Jia

Links : PDF - Abstract

Keywords : game - bandit - mab - regret - mapping
