publications | Amanda Bertsch

2025

preprint

Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention

Emily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, and Amanda Bertsch

In [under submission], 2025

preprint

Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions

Emmy Liu, Amanda Bertsch, Lintang Sutawika, Lindia Tjuatja, Patrick Fernandes, Lara Marinov, and 6 more authors

In [under submission], 2025

ICLR

Better Instruction-Following Through Minimum Bayes Risk

Ian Wu, Patrick Fernandes, Amanda Bertsch, Seungone Kim, Sina Pakazad, and Graham Neubig

In International Conference on Learning Representations (ICLR), 2025

NAACL

In-context learning with long-context models: An in-depth exploration

Amanda Bertsch, Maor Ivgi, Uri Alon, Jonathan Berant, Matthew R Gormley, and Graham Neubig

In 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, 2025


2024

CONDA

A Taxonomy for Data Contamination in Large Language Models

Medha Palavalli, Amanda Bertsch, and Matthew R Gormley

In The 1st Workshop on Data Contamination (CONDA), 2024

TMLR

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, and 2 more authors

In Transactions on Machine Learning Research, 2024


2023

EMNLP

To Build Our Future, We Must Know Our Past: Contextualizing Paradigm Shifts in Natural Language Processing

Sireesh Gururaja, Amanda Bertsch, Clara Na, David Gray Widder, and Emma Strubell

In Empirical Methods in Natural Language Processing, 2023

Big Picture

It’s MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk

Amanda Bertsch, Alex Xie, Graham Neubig, and Matthew R. Gormley

In Proceedings of the First Big Picture Workshop, 2023

NeurIPS

Unlimiformer: Long-Range Transformers with Unlimited Length Input

Amanda Bertsch, Uri Alon, Graham Neubig, and Matthew R. Gormley

In Conference on Neural Information Processing Systems, 2023

TACL

Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation

Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, Amanda Bertsch, and 5 more authors

In Transactions of the Association for Computational Linguistics, 2023

EMNLP Demo

Prompt2Model: Generating Deployable Models from Natural Language Instructions

Vijay Viswanathan, Chenyang Zhao, Amanda Bertsch, Tongshuang Wu, and Graham Neubig

In Empirical Methods in Natural Language Processing: Demo Track, 2023

Preprint

LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs

Tongshuang Wu, Haiyi Zhu, Maya Albayrak, Alexis Axon, Amanda Bertsch, Wenxing Deng, and 18 more authors

In arXiv, 2023

ClinicalNLP

SummQA at MEDIQA-Chat 2023: In-Context Learning with GPT-4 for Medical Summarization

Yash Mathur, Sanketh Rangreji, Raghav Kapoor, Medha Palavalli, Amanda Bertsch, and Matthew Gormley

In Proceedings of the 5th Clinical Natural Language Processing Workshop, Jul 2023

Medical dialogue summarization is challenging due to the unstructured nature of medical conversations, the use of medical terminology in gold summaries, and the need to identify key information across multiple symptom sets. We present a novel system for the Dialogue2Note Medical Summarization tasks in the MEDIQA 2023 Shared Task. Our approach for sectionwise summarization (Task A) is a two-stage process of selecting semantically similar dialogues and using the top-k similar dialogues as in-context examples for GPT-4. For full-note summarization (Task B), we use a similar solution with k=1. We achieved 3rd place in Task A (2nd among all teams), 4th place in Task B Division Wise Summarization (2nd among all teams), 15th place in Task A Section Header Classification (9th among all teams), and 8th place among all teams in Task B. Our results highlight the effectiveness of few-shot prompting for this task, though we also identify several weaknesses of prompting-based approaches. We compare GPT-4 performance with several finetuned baselines. We find that GPT-4 summaries are more abstractive and shorter. We make our code publicly available.

2022

Findings

He Said, She Said: Style Transfer for Shifting the Perspective of Dialogues

Amanda Bertsch, Graham Neubig, and Matthew R. Gormley

In Findings of the Association for Computational Linguistics: EMNLP 2022, Jul 2022

In this work, we define a new style transfer task: perspective shift, which reframes a dialogue from informal first person to a formal third person rephrasing of the text. This task requires challenging coreference resolution, emotion attribution, and interpretation of informal text. We explore several baseline approaches and discuss further directions on this task when applied to short dialogues. As a sample application, we demonstrate that applying perspective shifting to a dialogue summarization dataset (SAMSum) substantially improves the zero-shot performance of extractive news summarization models on this data. Additionally, supervised extractive models perform better when trained on perspective shifted data than on the original dialogues. We release our code publicly.
GeBNLP

Evaluating Gender Bias Transfer from Film Data

Amanda Bertsch, Ashley Oh, Sanika Natu, Swetha Gangu, Alan W. Black, and Emma Strubell

In Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), Jul 2022

Films are a rich source of data for natural language processing. OpenSubtitles (Lison and Tiedemann, 2016) is a popular movie script dataset, used for training models for tasks such as machine translation and dialogue generation. However, movies often contain biases that reflect society at the time, and these biases may be introduced during pre-training and influence downstream models. We perform sentiment analysis on template infilling (Kurita et al., 2019) and the Sentence Embedding Association Test (May et al., 2019) to measure how BERT-based language models change after continued pre-training on OpenSubtitles. We consider gender bias as a primary motivating case for this analysis, while also measuring other social biases such as disability. We show that sentiment analysis on template infilling is not an effective measure of bias due to the rarity of disability and gender identifying tokens in the movie dialogue. We extend our analysis to a longitudinal study of bias in film dialogue over the last 110 years and find that continued pre-training on OpenSubtitles encodes additional bias into BERT. We show that BERT learns associations that reflect the biases and representation of each film era, suggesting that additional care must be taken when using historical data.

2021

W-NUT

Detection of Puffery on the English Wikipedia

Amanda Bertsch and Steven Bethard

In Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), Nov 2021

On Wikipedia, an online crowdsourced encyclopedia, volunteers enforce the encyclopedia’s editorial policies. Wikipedia’s policy on maintaining a neutral point of view has inspired recent research on bias detection, including “weasel words” and “hedges”. Yet to date, little work has been done on identifying “puffery,” phrases that are overly positive without a verifiable source. We demonstrate that collecting training data for this task requires some care, and construct a dataset by combining Wikipedia editorial annotations and information retrieval techniques. We compare several approaches to predicting puffery, and achieve 0.963 f1 score by incorporating citation features into a RoBERTa model. Finally, we demonstrate how to integrate our model with Wikipedia’s public infrastructure to give back to the Wikipedia editor community.