I had the great fortune to listen to Nina Balcan give a talk on some of her latest work, Online learning in Stackelberg Security Games:

ABSTRACT

In a Stackelberg Security Game, a defender commits to a randomized deployment of security resources, and an attacker best responds by attacking a target that maximizes their utility. While algorithms for computing an optimal strategy for the defender to commit to have been used in several real-world applications, deployed applications require knowledge about the utility function of the potential attacker. In this talk I will describe an online learning approach for addressing this problem. We consider algorithms that prescribe a randomized strategy for the defender at each step against an adversarially chosen sequence of attackers and obtain feedback on their choices. I will discuss online algorithms whose regret (when compared to the best fixed strategy in hindsight) is sublinear in the number of time steps. I will also consider an extension that handles auxiliary contextual information that is often readily available to each player (e.g., traffic patterns or weather conditions) and discuss what no-regret guarantees are possible in this even more realistic scenario.
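To build intuition for the no-regret guarantee mentioned in the abstract, here is a toy sketch (my own construction, not from the talk) of the classic Hedge (exponential weights) algorithm run over a small menu of candidate defender coverage strategies, against a stream of attackers who best respond. This is a full-information simplification: the coverage vectors, attacker valuations, and utility model below are all illustrative assumptions, and the actual setting in the talk involves more limited feedback.

```python
import math
import random

# Toy sketch: Hedge over a menu of defender mixed strategies in a
# simple security game. Strategies, valuations, and utilities are
# illustrative assumptions, not the model from the talk.

T = 2000        # number of rounds
TARGETS = 3
# Candidate defender strategies: coverage probability per target.
STRATEGIES = [
    (0.8, 0.1, 0.1),
    (0.1, 0.8, 0.1),
    (0.1, 0.1, 0.8),
    (1/3, 1/3, 1/3),
]
K = len(STRATEGIES)
eta = math.sqrt(math.log(K) / T)   # standard Hedge learning rate

def attacker_best_response(coverage, values):
    # The attacker collects values[t] if target t is uncovered, 0 if
    # covered, so they attack the target maximizing expected payoff.
    return max(range(TARGETS), key=lambda t: (1 - coverage[t]) * values[t])

def defender_utility(coverage, attacked):
    # Defender utility = probability the attacked target is covered.
    return coverage[attacked]

rng = random.Random(0)
weights = [1.0] * K
alg_total = 0.0
fixed_totals = [0.0] * K   # cumulative utility of each fixed strategy

for _ in range(T):
    # A fresh attacker each round (random valuations stand in for an
    # adversarially chosen sequence in this sketch).
    vals = [rng.random() for _ in range(TARGETS)]
    utils = [defender_utility(s, attacker_best_response(s, vals))
             for s in STRATEGIES]
    z = sum(weights)
    probs = [w / z for w in weights]
    alg_total += sum(p * u for p, u in zip(probs, utils))
    for k in range(K):
        fixed_totals[k] += utils[k]
        weights[k] *= math.exp(eta * utils[k])   # reward-based update

# Regret against the best fixed strategy in hindsight; Hedge keeps
# this O(sqrt(T log K)), i.e. sublinear in T.
regret = max(fixed_totals) - alg_total
print(f"average regret per round: {regret / T:.4f}")
```

With these numbers the average per-round regret comes out small (on the order of sqrt(log K / T)), illustrating the "sublinear in the number of time steps" guarantee; the interesting part of the talk is what replaces Hedge when the defender only observes limited feedback about each attacker.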

The beauty is in how regret is defined in the context of Stackelberg Security Games: it evolves dynamically, of course. I shall write about it tomorrow. A few other interesting topics are coming up as well. Stay tuned!