Bandit Superprocesses: Relaxation and an Approximation Algorithm
Our Beyond-Bayesian-Bandits reading group covered this paper today: *Multitasking: Efficient Optimal Planning for Bandit Superprocesses* by Dylan Hadfield-Menell and Stuart Russell ([link to paper](https://people.csail.mit.edu/dhm/files/bsp_bbvi.pdf)) and its supplementary materials. A bandit superprocess is a decision problem composed of multiple independent Markov decision processes (MDPs), coupled only by the constraint that, at each time step, the agent may act in only one of the MDPs. Multitasking problems of this kind are ubiquitous in the real world, yet very little is known about them from a computational viewpoint, beyond the observation that optimal policies for the superprocess may prescribe actions that would be suboptimal for an MDP considered in isolation....
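To make the coupling concrete, here is a minimal sketch of the structure described above: several independent component MDPs, where each superprocess action selects exactly one component to advance while the others stay frozen. The class names and the toy deterministic reward sequences are illustrative assumptions, not taken from the paper.

```python
class MDP:
    """A toy component MDP: the state is a step counter, and acting
    yields a reward from a fixed (hypothetical) reward sequence."""

    def __init__(self, rewards):
        self.state = 0
        self.rewards = rewards  # reward earned at each successive state

    def step(self):
        reward = self.rewards[self.state % len(self.rewards)]
        self.state += 1
        return reward


class BanditSuperprocess:
    """The coupling constraint: at each time step the agent acts in
    exactly one component MDP; all other components' states are frozen."""

    def __init__(self, mdps):
        self.mdps = mdps

    def act(self, i):
        # Only component i advances; the rest are untouched this step.
        return self.mdps[i].step()


# Two components; choosing one leaves the other's state unchanged.
bsp = BanditSuperprocess([MDP([1, 0]), MDP([5])])
r0 = bsp.act(0)  # advances component 0 only
r1 = bsp.act(1)  # advances component 1 only
```

This sketch only captures the structural constraint, not the planning problem itself; the paper's point is that choosing *which* component to act in can make the globally optimal policy differ from what each component's isolated optimal policy would suggest.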