One of the biggest success stories of computational game theory is the deployment of game theoretic algorithms for physical security purposes. These algorithms compute an optimal strategy for the defender to commit to in a Stackelberg game, where the attacker observes the defender's randomized strategy and best-responds. One of the challenges in building this game model is dealing with unknown attacker utilities. In deployed applications the payoffs are often estimated by consulting risk analysts and using available historical data. However, inaccuracy in this estimation can lead to significant inefficiencies in optimizing the strategy of the defender. We address this challenge through a learning theoretic approach. We consider two scenarios. First, for the case where there is a single attacker with unknown utilities, we design an algorithm that optimizes the defender's strategy with no prior information, by observing the attacker's responses to randomized deployments of resources and learning his priorities. In contrast to previous work, our algorithm requires a number of queries that is polynomial in the representation of the game. Our algorithm then learns the optimal strategy to commit to. Second, we consider the case where the defender is faced with an unknown incoming sequence of attackers chosen from a known set of types. We provide online no-regret learning algorithms, which effectively compete with the optimal-in-hindsight strategy, such that the regret is only polynomial in the number of types and targets.
This talk is based on joint work with Nina Balcan, Avrim Blum, and Ariel Procaccia. Part of this work will appear at NIPS '14.