Paper Reading Note | Strategic Experimentation with Exponential Bandits II
This is the second reading note on the paper “Strategic Experimentation with Exponential Bandits”. It follows the previous post, which describes the basic single-agent model: a continuous-time two-armed bandit with one risky (R) arm and one safe (S) arm.

Model: When There Is More Than One Agent

Each player has a replica of the two-armed bandit. Players share the same prior belief (about whether the risky arm is good or bad) and the same discount rate, and all information is public. The risky arms’ time thresholds are i.i.d. across agents. In other words, the risky arm’s ‘good’/‘bad’ nature is the same for all agents, but each agent’s threshold is a separate draw from the common distribution, so the realized values may differ. ...
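
The multi-agent setup described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's formal construction: it treats each agent's "time threshold" as an exponentially distributed draw with a hypothetical rate `rate` when the arm is good, and as infinite when the arm is bad. The function name `simulate_thresholds` and all parameter values are made up for illustration.

```python
import math
import random


def simulate_thresholds(n_agents, prior_good, rate, rng):
    """Draw one arm type shared by all agents, then one threshold per agent.

    Assumptions (not taken from the note itself):
    - a good arm gives each agent an exponential(rate) threshold;
    - a bad arm gives every agent an infinite threshold.
    """
    # The good/bad nature is drawn once and is common to every agent.
    is_good = rng.random() < prior_good
    if is_good:
        # Thresholds are independent draws from the same distribution,
        # so the realized values generally differ across agents.
        thresholds = [rng.expovariate(rate) for _ in range(n_agents)]
    else:
        thresholds = [math.inf] * n_agents
    return is_good, thresholds


rng = random.Random(0)  # fixed seed so the run is reproducible
is_good, thresholds = simulate_thresholds(n_agents=3, prior_good=0.5, rate=1.0, rng=rng)
print(is_good, thresholds)
```

The point of the sketch is the separation between the two sources of randomness: the arm type is a single shared draw, while the thresholds are drawn separately for each agent.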