Seminar

The Multi-Armed Bandit Problem with Covariates

Vianney Perchet (Université Paris Diderot)

29 March 2013, 13:45–15:00

Toulouse

Salle MF 323

Decision Mathematics Seminar

Abstract

We consider the optimization framework known as the "multi-armed bandit", where a finite number of actions (or arms/decisions) yield random rewards whose expectations depend on the action. The objective of a decision maker is to sequentially select actions in order to find the optimal action as quickly as possible (or, equivalently, to minimize the regret). We introduce and analyze a natural policy, called Successive Elimination, that achieves optimal bounds. We will then consider the case where the noisy reward realizations depend on an observable covariate; this setting allows for dynamically changing rewards that better describe applications where side information is available. To maximize the expected cumulative reward, we introduce a policy called Adaptively Binned Successive Elimination, which adaptively decomposes the global problem into suitably "localized" static bandit problems. (Joint work with Philippe Rigollet, Princeton University.)
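To make the elimination idea concrete, below is a minimal Python sketch of a Successive Elimination policy for rewards in [0, 1]. The function name, the `delta` parameter, and the Hoeffding-style confidence radius are illustrative assumptions, not the exact policy or constants analyzed in the talk.

import numpy as np

def successive_elimination(arms, horizon, delta=0.05):
    """Illustrative Successive Elimination sketch (assumed form).

    `arms` is a list of callables, each returning a noisy reward in [0, 1].
    Arms whose empirical mean falls below the confidence interval of the
    best active arm are eliminated; sampling continues on the survivors.
    """
    k = len(arms)
    active = list(range(k))
    means = np.zeros(k)
    pulls = np.zeros(k)
    t = 0
    while t < horizon and len(active) > 1:
        # Pull every active arm once per round, updating running means.
        for i in active:
            r = arms[i]()
            pulls[i] += 1
            means[i] += (r - means[i]) / pulls[i]
            t += 1
        # Hoeffding-style confidence radius (all active arms share the
        # same pull count, since each is pulled once per round).
        radius = np.sqrt(np.log(k * horizon / delta) / (2 * pulls[active[0]]))
        best = max(means[i] for i in active)
        # Eliminate arms that are provably suboptimal at this confidence level.
        active = [i for i in active if means[i] + radius >= best - radius]
    return active  # surviving candidate(s) for the optimal arm

For example, with three Bernoulli arms, `successive_elimination([lambda: float(np.random.rand() < p) for p in (0.3, 0.5, 0.7)], horizon=10000)` will typically return the index of the 0.7 arm. The covariate setting discussed in the talk localizes this procedure to bins of the covariate space; that adaptive binning is not shown here.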
