Abstract: Classifiers often rely on features like the background that may be spuriously correlated with the label. In practice, this results in poor test-time accuracy as the classifier may be deployed in an environment where these spurious correlations no longer hold.
While many algorithms have been developed to heuristically tackle this challenge of out-of-distribution generalization, in this work, we take a step back to ask: why do classifiers rely on spurious correlations in the first place?
While the answer to this might seem straightforward, I'll begin by explaining why existing theoretical models of spurious correlations do not capture the fundamental reasons why classifiers rely on them. I'll then propose an alternative theoretical model that helps uncover those reasons. In particular, by studying linear classifiers in this model, we'll look at two failure modes: one that is "geometric" in nature and another that is "statistical" in nature. These modes shed light on the exact biases in gradient descent, and the exact properties of real-world data, that incentivize classifiers to use spurious correlations. Finally, I'll discuss experiments on neural networks that validate these insights in more practical scenarios.
Hopefully, with knowledge of these failure modes, algorithm designers can be better informed about how to fix them, and OoD research can be built upon a more rigorous foundation.
This talk is based on work done in collaboration with Anders Andreassen and Behnam Neyshabur at Google. Published at ICLR 2021.
Bio: Vaishnavh Nagarajan is a final-year Computer Science PhD student at Carnegie Mellon University (CMU), advised by Zico Kolter. Vaishnavh is broadly interested in the theoretical foundations of machine learning, involving problems at the intersection of learning theory and optimization. He is particularly interested in theoretically understanding when and why modern machine learning algorithms work (or do not work) in practice. His work has received an Outstanding New Directions Paper Award at NeurIPS'19, an oral presentation at NeurIPS'17 and three workshop spotlight talks. Prior to CMU, Vaishnavh completed his undergraduate studies in Computer Science and Engineering at the Indian Institute of Technology, Madras.