We consider the problem of pure exploration with subset-wise preference feedback, which contains $N$ arms with features. The learner is allowed to query subsets of size $K$ and receives feedback in the form of a noisy winner. This setting is relevant in various online decision-making scenarios involving human feedback, such as online retailing, streaming services, news feeds, and online advertising, since it is easier and more reliable for people to choose a preferred item from a subset than to assign a likability score to an item in isolation. To the best of our knowledge, this is the first work that considers the subset-wise preference feedback model in a structured setting, which allows for a potentially infinite set of arms. We also derive an instance-dependent lower bound of $\Omega\left(\frac{d}{\Delta^2}\right)$, which matches our upper bound on a worst-case instance. We run extensive experiments to corroborate our theoretical findings, and observe that our adaptive algorithm stops and requires up to 12x
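The abstract does not say how the noisy winner of a queried subset is generated. One common choice in the subset-wise preference literature is a multinomial logit (Plackett-Luce) choice model with linear utilities, where arm $i$ with feature vector $x_i$ wins with probability proportional to $\exp(\theta^\top x_i)$. The sketch below simulates repeated queries of one size-$K$ subset under that assumption. The MNL link, the parameter `theta`, and the helper names `noisy_winner` and `empirical_win_rates` are illustrative assumptions, not the authors' model or algorithm.

```python
import numpy as np


def noisy_winner(theta, features, subset, rng):
    """Sample the winner of a queried subset under an ASSUMED MNL choice model.

    Assumption (not from the abstract): arm i has utility theta @ x_i and wins
    with probability proportional to exp(utility).
    """
    utilities = features[subset] @ theta
    # Subtract the max before exponentiating for numerical stability.
    weights = np.exp(utilities - utilities.max())
    probs = weights / weights.sum()
    return int(subset[rng.choice(len(subset), p=probs)])


def empirical_win_rates(theta, features, subset, num_queries, seed=0):
    """Query the same subset repeatedly and return each arm's fraction of wins."""
    rng = np.random.default_rng(seed)
    wins = {int(i): 0 for i in subset}
    for _ in range(num_queries):
        wins[noisy_winner(theta, features, subset, rng)] += 1
    return {i: c / num_queries for i, c in wins.items()}


if __name__ == "__main__":
    d, N, K = 3, 10, 4
    rng = np.random.default_rng(1)
    features = rng.normal(size=(N, d))  # N arms, each with a d-dimensional feature vector
    theta = np.array([1.0, -0.5, 2.0])  # hypothetical unknown utility parameter
    subset = np.arange(K)               # query the first K arms
    print(empirical_win_rates(theta, features, subset, num_queries=5000))
```

A pure-exploration algorithm in this setting would choose which subsets to query adaptively and stop once it can identify the best arm with high confidence; the sketch only models the feedback a single query returns.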

Author(s): Shubham Gupta, Aadirupa Saha, Sumeet Katariya



