Restless Multi-Armed Bandits | Primal, Dual and Opportunity Costs
The textbook Multi-armed Bandit Allocation Indices by John Gittins, Kevin Glazebrook, and Richard Weber has a charming Chapter 6, titled “Restless bandits and Lagrangian relaxation”. But the authors skipped a step in the proof of how to go from the LP to its dual and then to the Whittle index policy.

Prelims: An RMAB (Restless Multi-Armed Bandit) instance is defined by the tuple

$$ \langle S_i, A_i, r_{i}, \mathcal{P}_{i} \rangle_{i \in [N]} $$

where $S_i$ is the state space, $A_i = \lbrace 0, 1 \rbrace$ the action space, and $r_{i}$ and $\mathcal{P}_{i}$ denote the reward function and transition kernel of arm $i$....
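To make the tuple concrete, here is a minimal sketch of how one might represent a tabular RMAB instance in code. This is not from the book; the class name `RMABArm`, the array layouts, and the toy numbers are all illustrative assumptions for finite state spaces with the binary action set $\lbrace 0, 1 \rbrace$.

```python
# Minimal sketch of an RMAB instance with tabular arms (assumed representation).
from dataclasses import dataclass
import numpy as np


@dataclass
class RMABArm:
    """One arm i: reward r_i[s, a] and transition kernel P_i[a, s, s']."""
    rewards: np.ndarray      # shape (|S_i|, 2): reward for a in {0 (passive), 1 (active)}
    transitions: np.ndarray  # shape (2, |S_i|, |S_i|): row-stochastic kernel per action

    def __post_init__(self):
        n_states = self.rewards.shape[0]
        assert self.rewards.shape == (n_states, 2)
        assert self.transitions.shape == (2, n_states, n_states)
        # each P_i(. | s, a) must be a probability distribution over next states
        assert np.allclose(self.transitions.sum(axis=-1), 1.0)


# An RMAB instance is just the collection of N such arms.
# Toy example: two identical two-state arms; acting in state 1 pays reward 1.
arm = RMABArm(
    rewards=np.array([[0.0, 0.0],
                      [0.0, 1.0]]),
    transitions=np.array([[[0.9, 0.1],   # passive (a = 0)
                           [0.5, 0.5]],
                          [[0.2, 0.8],   # active (a = 1)
                           [0.1, 0.9]]]),
)
instance = [arm, arm]  # N = 2 arms
```

The point of the "restless" qualifier is visible in this layout: the passive kernel `transitions[0]` is not the identity, so an arm keeps evolving even when it is not played.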