This paper studies bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards. We show that there exists an $O(\log T)$ regret bandit algorithm, specifically the classical UCB, that requires $\Omega(\log T)$ amount of contamination to suffer regret $\Omega(T)$. We then propose a novel algorithm, Secure-UCB, which uses limited verification to access a limited number of uncontaminated rewards. With $O(\log T)$ expected verifications, Secure-UCB can restore the order-optimal $O(\log T)$ regret. We can then conclude that Secure-UCB is order-optimal in terms of both the expected regret and the expected number of verifications, and can save stochastic bandits from any data poisoning attack.
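
For readers unfamiliar with the baseline, the following is a minimal sketch of the classical UCB (UCB1) index rule on Bernoulli arms, with an optional hook through which an attacker can alter the observed reward. It is not the paper's Secure-UCB, and the attacker shown is not the paper's construction: the function names `ucb1` and `zero_out_best`, the `attack` parameter, the arm means, and the horizon are all assumptions made for illustration.

```python
import math
import random


def ucb1(means, horizon, attack=None, seed=0):
    """Run UCB1 on Bernoulli arms; return (pull counts, pseudo-regret).

    `attack` is a hypothetical hook: a function (arm, reward) -> reward
    that may alter the reward before the learner sees it.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative *observed* reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise
        else:
            # Empirical mean plus the UCB1 exploration bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0  # bounded in [0, 1]
        if attack is not None:
            reward = attack(arm, reward)  # learner only sees the altered value
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # regret is measured on the true means
    return counts, regret


def zero_out_best(arm, reward):
    """Illustrative attacker (an assumption): hide all reward from arm 1."""
    return 0.0 if arm == 1 else reward


clean_counts, clean_regret = ucb1([0.2, 0.8], 5000)
pois_counts, pois_regret = ucb1([0.2, 0.8], 5000, attack=zero_out_best)
print("clean:   ", clean_counts, round(clean_regret, 1))
print("poisoned:", pois_counts, round(pois_regret, 1))
```

In the clean run the learner concentrates its pulls on the better arm, while in the poisoned run it is steered toward the worse arm and accumulates regret on most rounds. The attacker here only needs to alter rewards on the rounds where arm 1 is actually pulled, which gives some intuition for why a small amount of contamination can be enough to mislead an unprotected algorithm.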

Author(s) : Anshuka Rangi, Long Tran-Thanh, Haifeng Xu, Massimo Franceschetti

Links : PDF - Abstract


Keywords : secure - data - poisoning - ucb - regret
