Bandit Superprocess Model and Whittle Condition
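In the standard discounted multi‑armed bandit (MAB), where each arm is either played or rested at every step, the Gittins index gives a provably optimal per‑arm priority rule: compute an index for each arm from that arm's own state alone, then play the arm with the largest index.

A bandit superprocess generalizes this model. An arm that is played can also be operated under one of several control actions, so each arm is itself a Markov decision process. For a general bandit superprocess, an index-based policy is no longer optimal (see Section 3.2 of Q. Zhao (2021) for a counterexample), and in many cases the index is not even well defined. However, if the problem is “indexable”, that is, it satisfies the following condition, an index policy is still optimal ...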
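To make the index rule concrete for the standard play/rest model, here is a minimal Python sketch. It assumes finite-state arms, each described by a known reward vector `r` and transition matrix `P`, where a played arm earns `r[s]` and moves according to `P` while rested arms stay frozen. It computes the Gittins index through the retirement (calibration) formulation: for a lump-sum retirement reward `M`, solve `V(s) = max(M, r[s] + beta * sum_j P[s][j] V(j))` by value iteration, bisect on `M` for the smallest value at which retiring in state `s` is optimal, and report the index as `(1 - beta) * M` so that it is in per-period reward units. The function names (`continuation_values`, `retirement_values`, `gittins_index`, `index_policy`) are hypothetical, and this is one standard way to compute the index, not necessarily the method used in Zhao (2021). It covers only the binary play/rest model, not a superprocess.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def continuation_values(r: Sequence[float], P: Matrix, beta: float,
                        V: Sequence[float]) -> List[float]:
    """Value of playing the arm once in each state, then continuing with V."""
    n = len(r)
    return [r[s] + beta * sum(P[s][j] * V[j] for j in range(n)) for s in range(n)]


def retirement_values(r: Sequence[float], P: Matrix, beta: float, M: float,
                      tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Value iteration for one arm that may retire at any time for a lump sum M:
    V(s) = max(M, r[s] + beta * sum_j P[s][j] * V(j))."""
    V = [M] * len(r)
    for _ in range(max_iter):
        V_new = [max(M, c) for c in continuation_values(r, P, beta, V)]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


def gittins_index(r: Sequence[float], P: Matrix, beta: float, s: int,
                  iters: int = 60) -> float:
    """Gittins index of state s, scaled to a per-period reward.

    Bisects on the retirement reward M for the smallest M at which
    retiring in state s is optimal; the index is (1 - beta) * M.
    """
    lo = min(r) / (1 - beta)  # playing is (weakly) optimal at this M
    hi = max(r) / (1 - beta)  # retiring is optimal at this M
    for _ in range(iters):
        M = (lo + hi) / 2
        V = retirement_values(r, P, beta, M)
        if continuation_values(r, P, beta, V)[s] > M:
            lo = M  # still strictly better to keep playing
        else:
            hi = M  # retiring is (weakly) optimal
    return (1 - beta) * hi


def index_policy(arms: Sequence[Tuple[Sequence[float], Matrix]],
                 states: Sequence[int], beta: float) -> int:
    """Return the arm whose current state has the largest Gittins index."""
    indices = [gittins_index(r, P, beta, s) for (r, P), s in zip(arms, states)]
    return max(range(len(arms)), key=lambda i: indices[i])


if __name__ == "__main__":
    # Arm A: reward 1 now, then absorbed in a state paying 0 forever.
    arm_a = ([1.0, 0.0], [[0.0, 1.0], [0.0, 1.0]])
    # Arm B: reward 0 now, then absorbed in a state paying 1 forever.
    arm_b = ([0.0, 1.0], [[0.0, 1.0], [0.0, 1.0]])
    print(index_policy([arm_a, arm_b], states=[0, 0], beta=0.5))  # prints 0
```

With `beta = 0.5`, arm A's starting state has index 1 while arm B's has index 0.5, so the rule plays A first even though B has the larger total discounted reward. Bisection is valid here because `V_M(s) - M` is nonincreasing in `M`, so the retirement rewards at which retiring is optimal form an interval `[M*, ∞)`.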