“The literature on the RMABP, whether on its theoretical, algorithmic, or application aspects, is currently vast to the point where it is virtually infeasible for researchers to keep up to date with the latest advances in the field.” True.

Markovian Restless Bandits and Index Policies: A Review

José Niño-Mora | Mathematics, 2023

The review is organized as follows. Section 2 surveys the antecedents to the RMABP, in particular, the classic MABP and the Gittins index policy. Section 3 formulates the RMABP and outlines the Whittle index policy. Section 4 reviews works on the complexity, approximation, and relaxations of the RMABP. Section 5 focuses on indexability, that is, the existence of the Whittle index and extensions. Section 6 discusses works on means of computing the Whittle index. Section 7 considers works that establish the optimality of the myopic index policy for the RMABP, whereas Section 8 reviews the asymptotic optimality of index policies. Section 9 surveys multi-action restless bandits. Section 10 considers policies that are different from Whittle’s and based on Lagrangian and fluid relaxations. Section 11 addresses reinforcement learning and, in particular, Q-learning solution approaches. Section 12 surveys works on the RMABP from the perspective of online learning. Sections 13 and 14 are devoted to works on applications of the RMABP in diverse settings. The former section focuses on MDP models and the latter on partially observable MDP (POMDP) models. Finally, Section 15 concludes this paper.

This is a really nice survey of RMABs. I recommend it to anyone who wants a crash course on recent progress in RMABs, especially on the theoretical side. (I can't even write a blog post that summarizes it well, so I suggest reading it directly.) The sections follow a well-organized logical sequence and connect closely to one another, giving a brilliant overview of the theoretical results on RMABs: what index policies can and cannot achieve in RMAB models.

One interesting thing to notice is that there isn't a uniformly accepted RMAB instance generation process for testing all these algorithms. This would actually be another interesting topic to discuss, although the survey does not mention it.
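As a rough illustration of what such a generation process might involve, here is a minimal Python sketch built on my own assumptions, not on anything in the survey. Each arm has `n_states` states and two actions (passive = 0, active = 1). Transition rows are drawn uniformly from the probability simplex, rewards are uniform on [0, 1), and `budget` arms may be activated per period. The names `RMABInstance` and `generate_instance` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class RMABInstance:
    # P[arm][action][s][s_next]: transition probability (action 0 = passive, 1 = active)
    P: list
    # R[arm][action][s]: one-step reward for taking `action` in state s
    R: list
    # Number of arms that may be activated in each period (assumed fixed)
    budget: int


def random_stochastic_row(n_states, rng):
    # Normalized i.i.d. exponential draws give a uniform (flat Dirichlet) point on the simplex
    w = [rng.expovariate(1.0) for _ in range(n_states)]
    total = sum(w)
    return [x / total for x in w]


def generate_instance(n_arms, n_states, budget, seed=None):
    """Hypothetical generator: unstructured random two-action restless arms."""
    if not 0 < budget <= n_arms:
        raise ValueError("budget must be in 1..n_arms")
    rng = random.Random(seed)  # a private RNG keeps the instance reproducible from `seed`
    P, R = [], []
    for _ in range(n_arms):
        # One transition matrix per action, one random row per current state
        P.append([[random_stochastic_row(n_states, rng) for _ in range(n_states)]
                  for _ in range(2)])
        # One reward vector per action
        R.append([[rng.random() for _ in range(n_states)] for _ in range(2)])
    return RMABInstance(P=P, R=R, budget=budget)


if __name__ == "__main__":
    inst = generate_instance(n_arms=10, n_states=3, budget=2, seed=0)
    print(len(inst.P), "arms; active-action matrix of arm 0:", inst.P[0][1])
```

Fixing the seed makes an instance reproducible, which is the main thing a shared generation process would buy. Instances drawn this way carry no structure, so they need not be indexable. A benchmark aimed at Whittle index policies would probably have to impose extra structure or check indexability explicitly.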