Relevant Paper(s): N/A
Abstract:
The current benchmarking paradigm in AI has many issues: benchmarks saturate quickly, are susceptible to overfitting, contain exploitable annotator artifacts, have unclear or imperfect evaluation metrics, and do not necessarily measure what we really care about. I will talk about our work on rethinking how we do benchmarking in AI, specifically in natural language processing, focusing mostly on the Dynabench platform (dynabench.org).
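The core idea behind Dynabench is dynamic, human-and-model-in-the-loop data collection: annotators try to write examples that fool the current model, and the fooling examples form the next round of the benchmark (and can be used to train a stronger model for the following round). Below is a minimal Python sketch of that loop under stated assumptions; all names here (Round, collect_round, the stand-in predict function) are hypothetical illustrations, not the actual Dynabench API, and the real platform also has other annotators validate the fooling examples.

```python
# A minimal sketch of dynamic adversarial data collection, the idea behind
# Dynabench. All identifiers are hypothetical, not the Dynabench API.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input text, gold label)

@dataclass
class Round:
    number: int
    examples: List[Example] = field(default_factory=list)

def collect_round(
    predict: Callable[[str], str],    # current model-in-the-loop
    annotator_stream: List[Example],  # examples written by annotators
    round_number: int,
) -> Round:
    """Keep only the examples whose gold label the model gets wrong."""
    fooled = [(x, y) for (x, y) in annotator_stream if predict(x) != y]
    return Round(number=round_number, examples=fooled)

if __name__ == "__main__":
    # Stand-in model that always predicts "positive"; a real setup would
    # retrain on each round's fooling examples before the next round.
    weak_predict = lambda text: "positive"
    stream = [("great movie", "positive"), ("dreadfully dull", "negative")]
    round1 = collect_round(weak_predict, stream, round_number=1)
    print(f"Round 1 kept {len(round1.examples)} fooling example(s)")
```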
Bio: Douwe Kiela is the Head of Research at Hugging Face. Previously, he was a Research Scientist at Facebook AI Research. His current research interests lie in developing better models for (grounded, multi-agent) language understanding and better tools for evaluation and benchmarking.