“The literature on the RMABP, whether on its theoretical, algorithmic, or application aspects, is currently vast to the point where it is virtually infeasible for researchers to keep up to date with the latest advances in the field.” True.
Markovian Restless Bandits and Index Policies: A Review | José Niño-Mora | Mathematics, 2023
The review is organized as follows. Section 2 surveys the antecedents to the RMABP, in particular, the classic MABP and the Gittins index policy. Section 3 formulates the RMABP and outlines the Whittle index policy. Section 4 reviews works on the complexity, approximation, and relaxations of the RMABP. Section 5 focuses on indexability, that is, the existence of the Whittle index and extensions. Section 6 discusses works on means of computing the Whittle index. Section 7 considers works that establish the optimality of the myopic index policy for the RMABP, whereas Section 8 reviews the asymptotic optimality of index policies. Section 9 surveys multi-action restless bandits. Section 10 considers policies that are different from Whittle’s and based on Lagrangian and fluid relaxations. Section 11 addresses reinforcement learning and, in particular, Q-learning solution approaches. Section 12 surveys works on the RMABP from the perspective of online learning. Sections 13 and 14 are devoted to works on applications of the RMABP in diverse settings. The former section focuses on MDP models and the latter on partially observable MDP (POMDP) models. Finally, Section 15 concludes this paper.
...