Seminar

Existence of pure optimal uniform strategies in Partially Observable Markov Decision Processes

Bruno Ziliotto (Toulouse School of Economics)

December 5, 2014, 14:00–15:15

Toulouse

Room MF 323

Decision Mathematics Seminar

Abstract

(Joint work with Xavier Venel.) In a Partially Observable Markov Decision Process (POMDP), at each stage the decision-maker chooses an action and receives a reward that depends on the current state of the world. A new state is then randomly drawn from a distribution that depends on the action and on the previous state. The decision-maker is not informed of the new state, but receives a signal about it. The POMDP then enters the next stage. As in Renault and Venel (2012), given two integers m and n, in the n-stage POMDP starting at time m the decision-maker wants to maximize the expected mean of the stage rewards between time m and time m+n. We prove that the decision-maker has an almost optimal pure strategy that does not depend on m and n. Moreover, with high probability, the random mean of the stage rewards is close to the optimal reward. Unlike most of the previous literature, we do not assume that the transitions of the POMDP satisfy any ergodicity condition.
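
For readers new to the model, the stage dynamics described in the abstract can be sketched in a few lines of Python. This is only an illustration, not the construction used in the talk: the function name, the toy matrices, the choice to let the reward depend on both state and action, and the convention that the averaging window covers stages m+1 to m+n are all assumptions made here.

```python
import random


def simulate_mean_reward(transition, signal, reward, strategy,
                         initial_state, m, n, rng):
    """Simulate one play; return the mean stage reward over stages m+1..m+n.

    transition[s][a] : distribution over next states (assumed layout)
    signal[s]        : distribution over signals emitted on entering state s
    reward[s][a]     : stage reward (dependence on the action is an assumption)
    strategy         : pure strategy, a deterministic function of past signals
    """
    state = initial_state
    history = []  # signals seen so far; the state itself stays hidden
    total = 0.0
    for stage in range(1, m + n + 1):
        action = strategy(history)
        if stage > m:  # only stages m+1..m+n count toward the mean
            total += reward[state][action]
        # New state drawn from a distribution depending on (state, action).
        probs = transition[state][action]
        state = rng.choices(range(len(probs)), weights=probs)[0]
        # The decision-maker only receives a signal on the new state.
        sig = signal[state]
        history.append(rng.choices(range(len(sig)), weights=sig)[0])
    return total / n


def follow_last_signal(history):
    """Hypothetical pure strategy: play the last signal, or 0 at the start."""
    return history[-1] if history else 0


# Toy two-state, two-action example with invented numbers.
toy_transition = [[[0.9, 0.1], [0.5, 0.5]],
                  [[0.5, 0.5], [0.1, 0.9]]]
toy_signal = [[0.8, 0.2], [0.2, 0.8]]   # noisy observation of the state
toy_reward = [[1.0, 0.0], [0.0, 1.0]]   # reward 1 when the action matches the state

mean = simulate_mean_reward(toy_transition, toy_signal, toy_reward,
                            follow_last_signal, 0, m=10, n=100,
                            rng=random.Random(0))
print(mean)
```

A uniform strategy in the sense of the abstract would have to perform well for every pair (m, n) at once; the sketch only evaluates one fixed strategy on one window.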