September 23, 2021, 11:00–12:15
Toulouse
Room: Auditorium 5
MAD-Stat. Seminar
Abstract
This talk will be on the topic of sequential learning, and on some specific instances of the stochastic bandit problem. In the first part of the talk, I will give an introduction to the stochastic bandit problem and present some core ideas and algorithms. In the second part, I will discuss in detail the thresholding bandit problem, i.e. a sequential learning setting where the learner sequentially samples from K unknown distributions over T rounds, and aims at outputting, at the end, the set of distributions whose means \mu_k are above a threshold \tau. We will study this problem under four structural assumptions, i.e. shape constraints: that the sequence of means is monotone, unimodal, concave, or unstructured (the vanilla case). In each case, we will provide minimax results on the performance of any strategy, as well as matching algorithms. This will highlight the fact that, even more than in batch learning, structural assumptions have a huge impact in sequential learning. This work is based on three joint works with James Cheshire, Pierre Menard, Andrea Locatelli and Maurilio Gutzeit (http://proceedings.mlr.press/v125/cheshire20a.html, http://proceedings.mlr.press/v139/cheshire21a.html and http://proceedings.mlr.press/v48/locatelli16.html). In the introduction, I will discuss bandit theory in more detail; see e.g. the book of Tor Lattimore and Csaba Szepesvari (https://tor-lattimore.com/downloads/book/book.pdf) for an excellent and very complete reference on this topic.
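To make the problem setup concrete, here is a minimal sketch of the thresholding bandit setting with a naive uniform-allocation strategy: each of K arms is sampled roughly T/K times, and the output is the set of arms whose empirical mean exceeds \tau. This is only an illustration of the setting, not one of the adaptive algorithms from the referenced papers; the Gaussian noise model, the function name, and the toy instance are assumptions made for the example.

```python
import numpy as np

def thresholding_bandit_uniform(means, T, tau, rng=None):
    """Toy thresholding bandit run with uniform (round-robin) sampling.

    means : true arm means (unknown to a real learner; used here to simulate rewards)
    T     : total sampling budget
    tau   : threshold; the goal is to recover {k : means[k] >= tau}
    """
    rng = np.random.default_rng(rng)
    K = len(means)
    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)
    for t in range(T):
        k = t % K                       # round-robin: sample each arm ~T/K times
        x = rng.normal(means[k], 1.0)   # one noisy observation of arm k (Gaussian noise assumed)
        counts[k] += 1
        sums[k] += x
    mu_hat = sums / np.maximum(counts, 1)
    return {k for k in range(K) if mu_hat[k] >= tau}

# Toy instance with a monotone sequence of means and threshold tau = 0.5
true_means = np.array([0.1, 0.3, 0.45, 0.55, 0.7, 0.9])
estimate = thresholding_bandit_uniform(true_means, T=3000, tau=0.5, rng=0)
print("arms estimated above tau:", sorted(estimate))
print("arms truly above tau:    ", [k for k, m in enumerate(true_means) if m >= 0.5])
```

The hard arms are those with means close to \tau (here 0.45 and 0.55); adaptive strategies, and the shape constraints discussed in the talk, are precisely what allow the sampling budget to be concentrated on such arms rather than spread uniformly.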
