Bandits - background
Consider a decision problem where you have a collection $\mathcal L$ of papers to read before next week's meeting with your advisor. For each paper $l \in \mathcal L$, you can estimate its relevance and quality, represented as a potential intellectual reward $r_l$ drawn from a probability distribution $F_l(\cdot)$. The realized value of $r_l$ is learned only after you thoroughly read the paper. Additionally, each paper comes with an uncertain reading time $t_l$, drawn from another distribution $G_l(\cdot)$.
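This setup can be sketched in a few lines of code. The distributions below (Gaussian rewards, exponential reading times) and all numeric parameters are illustrative assumptions, not part of the model above; the key point is that both $r_l$ and $t_l$ are realized only when a paper is actually read.

```python
import random

class Paper:
    """One arm of the paper-reading bandit: reward ~ F_l, time ~ G_l."""

    def __init__(self, mean_reward, mean_time):
        self.mean_reward = mean_reward  # assumed parameter of F_l
        self.mean_time = mean_time      # assumed parameter of G_l

    def read(self, rng):
        # The reward r_l and reading time t_l are only observed
        # after the paper is thoroughly read.
        r = rng.gauss(self.mean_reward, 1.0)         # r_l ~ F_l (assumed Gaussian)
        t = rng.expovariate(1.0 / self.mean_time)    # t_l ~ G_l (assumed exponential)
        return r, t

rng = random.Random(0)
papers = [Paper(mean_reward=m, mean_time=t)
          for m, t in [(1.0, 2.0), (2.5, 5.0), (0.5, 1.0)]]

# Reading every paper reveals each reward and consumes each reading time.
total_reward, total_time = 0.0, 0.0
for paper in papers:
    r, t = paper.read(rng)
    total_reward += r
    total_time += t
```

Since reading time is limited before the meeting, the interesting question is which papers to read, which is exactly the kind of trade-off bandit formulations address.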