One of the most potent and underutilized tools for understanding the development and structure of communication systems, and for producing synthetic language data, comes from the field of emergent communication. In multi-agent reinforcement learning simulations, communication systems can emerge with only limited inductive biases. These communication systems, sometimes called emergent languages (ELs), provide a controlled laboratory in which the emergence of human language, its evolution over time, and its typological variation can be studied and understood.

Just as importantly, emergent communication provides a source of robust communication protocols and a potential well of language-like data that could be used to pretrain large models, reducing the need for compromised human-generated data.

This is joint work with Brendon Boldt.