recent publications
Publications by year in reverse chronological order.
2024
- Brendon Boldt and David R. Mortensen. 2024. A Review of the Applications of Deep Learning-Based Emergent Communication. Transactions on Machine Learning Research.
- Shijia Zhou, Leonie Weissweiler, Taiqi He, Hinrich Schütze, David R. Mortensen, and Lori Levin. 2024. Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3804–3811, Torino, Italia, May. ELRA and ICCL.
- Liang Lu, Jingzhi Wang, and David R. Mortensen. 2024. Improved Neural Protoform Reconstruction via Reflex Prediction. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 8683–8707, Torino, Italia, May. ELRA and ICCL.
- Ryan Soh-Eun Shim, Kalvin Chang, and David R. Mortensen. 2024. Phonotactic Complexity across Dialects. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 12734–12748, Torino, Italia, May. ELRA and ICCL.
- Vilém Zouhar, Kalvin Chang, Chenxuan Cui, Nate B. Carlson, Nathaniel Romney Robinson, Mrinmaya Sachan, and David R. Mortensen. 2024. PWESuite: Phonetic Word Embeddings and Tasks They Facilitate. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13344–13355, Torino, Italia, May. ELRA and ICCL.
- David R. Mortensen, Valentina Izrailevitch, Yunze Xiao, Hinrich Schütze, and Leonie Weissweiler. 2024. Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17359–17364, Torino, Italia, May. ELRA and ICCL.
- Brendon Boldt and David Mortensen. 2024. XferBench: a Data-Driven Benchmark for Emergent Language. In Kevin Duh, Helena Gomez, and Steven Bethard, editors, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1475–1489, Mexico City, Mexico, June. Association for Computational Linguistics.
2023
- David R Mortensen. 2023. Kuki-Chin Phonology: An Overview. Himalayan Linguistics, 22(1).
- Nathaniel Romney Robinson, Matthew Dean Stutzman, Stephen D. Richardson, and David R Mortensen. 2023. African Substrates Rather Than European Lexifiers to Augment African-diaspora Creole Translation. In 4th Workshop on African Natural Language Processing.
- Kalvin Chang, Nathaniel Robinson, Anna Cai, Ting Chen, Annie Zhang, and David Mortensen. 2023. Automating Sound Change Prediction for Phylogenetic Inference: A Tukanoan Case Study. In Nina Tahmasebi, Syrielle Montariol, Haim Dubossarsky, Andrey Kutuzov, Simon Hengchen, David Alfter, Francesco Periti, and Pierluigi Cassotti, editors, Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change, pages 129–142, Singapore, December. Association for Computational Linguistics.
- Nathaniel Robinson, Perez Ogayo, David R. Mortensen, and Graham Neubig. 2023. ChatGPT MT: Competitive for High- (but Not Low-) Resource Languages. In Philipp Koehn, Barry Haddow, Tom Kocmi, and Christof Monz, editors, Proceedings of the Eighth Conference on Machine Translation, pages 392–418, Singapore, December. Association for Computational Linguistics.
- Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David Mortensen, Noah Smith, and Yulia Tsvetkov. 2023. Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9904–9923, Singapore, December. Association for Computational Linguistics.
- Leonie Weissweiler, Valentin Hofmann, Anjali Kantharuban, Anna Cai, Ritam Dutt, Amey Hengle, Anubha Kabra, Atharva Kulkarni, Abhishek Vijayakumar, Haofei Yu, Hinrich Schuetze, Kemal Oflazer, and David Mortensen. 2023. Counting the Bugs in ChatGPT’s Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6508–6524, Singapore, December. Association for Computational Linguistics.
- Yanlin Feng, Adithya Pratapa, and David Mortensen. 2023. Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine Entity Typing. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 15550–15560, Singapore, December. Association for Computational Linguistics.
- Leonie Weissweiler, Taiqi He, Naoki Otani, David R. Mortensen, Lori Levin, and Hinrich Schütze. 2023. Construction Grammar Provides Unique Insight into Neural Language Models. In Claire Bonial and Harish Tayyar Madabushi, editors, Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023), pages 85–95, Washington, D.C., March. Association for Computational Linguistics.
- Young Min Kim, Kalvin Chang, Chenxuan Cui, and David R. Mortensen. 2023. Transformed Protoform Reconstruction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 24–38, Toronto, Canada, July. Association for Computational Linguistics.
- David R. Mortensen, Ela Gulsen, Taiqi He, Nathaniel Robinson, Jonathan Amith, Lindia Tjuatja, and Lori Levin. 2023. Generalized Glossing Guidelines: An Explicit, Human- and Machine-Readable, Item-and-Process Convention for Morphological Annotation. In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 58–67, Toronto, Canada, July. Association for Computational Linguistics.
- Georgios Karakasidis, Nathaniel Robinson, Yaroslav Getman, Atieno Ogayo, Ragheb Al-Ghezi, Ananya Ayasi, Shinji Watanabe, David R Mortensen, and Mikko Kurimo. 2023. Multilingual TTS Accent Impressions for Accented ASR. In International Conference on Text, Speech, and Dialogue, pages 317–327. Springer.
- Taiqi He, Lindia Tjuatja, Nathaniel Robinson, Shinji Watanabe, David R. Mortensen, Graham Neubig, and Lori Levin. 2023. SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing. In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 209–216, Toronto, Canada, July. Association for Computational Linguistics.
2022
- Nathaniel Robinson, Cameron Hogan, Nancy Fulda, and David R. Mortensen. 2022. Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican. In Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022), pages 35–42, Gyeongju, Republic of Korea, October. Association for Computational Linguistics.
- Kalvin Chang, Chenxuan Cui, Youngmin Kim, and David R. Mortensen. 2022. WikiHan: A New Comparative Dataset for Chinese Languages. In COLING 2022.
- Nathaniel Romney Robinson, Perez Ogayo, Swetha R. Gangu, David R. Mortensen, and Shinji Watanabe. 2022. When Is TTS Augmentation Through a Pivot Language Useful? In Interspeech 2022.
- Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, and Shinji Watanabe. 2022. Speech Recognition for Around 2000 Languages without Audio. In Interspeech 2022.
- David R. Mortensen, Xinyu Zhang, Chenxuan Cui, and Katherine J. Zhang. 2022. A Hmong Corpus with Elaborate Expression Annotations. In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022).
- Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, and Shinji Watanabe. 2022. Phone Inventories and Recognition for Every Language. In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022).
- Clayton Marr and David Mortensen. 2022. Large-Scale Computerized Forward Reconstruction Yields New Perspectives in French Diachronic Phonology. Diachronica.
- Xinjian Li, Florian Metze, David Mortensen, Shinji Watanabe, and Alan Black. 2022. Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2106–2115, Dublin, Ireland, May. Association for Computational Linguistics.
- Chenxuan Cui, Katherine Zhang, and David Mortensen. 2022. Learning the Ordering of Coordinate Compounds and Elaborate Expressions in Hmong, Lahu, and Chinese. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3656–3669, Seattle, United States, July. Association for Computational Linguistics.
- David R. Mortensen. 2022. Wfst4Str. Rust/Python library for working with strings using weighted finite state transducers.
2021
- 2021. ASR2K: Speech Recognition for Around 2000 Languages without Audio. In Interspeech 2022.
- David Francis, Ella Rabinovich, Farhan Samir, David Mortensen, and Suzanne Stevenson. 2021. Quantifying Cognitive Factors in Lexical Decline. Transactions of the Association for Computational Linguistics, 9:1529–1545, December.
- Xinjian Li, David R Mortensen, Florian Metze, and Alan W. Black. 2021. Multilingual phonetic dataset for low resource speech recognition. In ICASSP 2021.
- David R. Mortensen and Jordan Picone. 2021. East Tusom: A phonetic and phonological sketch of a largely undocumented Tangkhulic language. Linguistics of the Tibeto-Burman Area, 44(2):168–196.
- David R. Mortensen, Jordan Picone, Xinjian Li, and Kathleen Siminyu. 2021. Tusom2021: A Phonetically Transcribed Speech Dataset from an Endangered Language for Universal Phone Recognition Experiments. In Proc. Interspeech 2021, pages 3660–3664.
- Adithya Pratapa, Antonios Anastasopoulos, Shruti Rijhwani, Aditi Chaudhary, David R. Mortensen, Graham Neubig, and Yulia Tsvetkov. 2021. Evaluating the Morphosyntactic Well-formedness of Generated Texts. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7131–7150, Online and Punta Cana, Dominican Republic, November. Association for Computational Linguistics.
- Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David R. Mortensen, Michael R. Marlo, and Graham Neubig. 2021. Phoneme Recognition Through Fine Tuning of Phonetic Representations: A Case Study on Luhya Language Varieties. In Proc. Interspeech 2021, pages 271–275.
- Jimin Sun, Hwijeen Ahn, Chan Young Park, Yulia Tsvetkov, and David R. Mortensen. 2021. Ranking Transfer Languages with Pragmatically-Motivated Features for Multilingual Sentiment Analysis. In EACL 2021.
- Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, and Shinji Watanabe. 2021. Differentiable Allophone Graphs for Language-Universal Speech Recognition. In Proc. Interspeech 2021, pages 2471–2475.
2020
- Aditi Chaudhary, Antonios Anastasopoulos, Adithya Pratapa, David R. Mortensen, Zaid Sheikh, Yulia Tsvetkov, and Graham Neubig. 2020. Automatic Extraction of Rules Governing Morphological Agreement. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing.
- Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan Black, and Florian Metze. 2020. Towards Zero-shot Learning for Automatic Phonemic Transcription. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence.
- Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W Black, and Florian Metze. 2020. Universal Phone Recognition with a Multilingual Allophone System. In ICASSP 2020.
- Clayton Marr and David R. Mortensen. 2020. Computerized Forward Reconstruction for Analysis in Diachronic Phonology, and Latin to French Reflex Prediction. In 1st Workshop on Language Technologies for Historical and Ancient LAnguages (LT4HALA).
- Shahan Ali Memon, Aman Tyagi, David R. Mortensen, and Kathleen M Carley. 2020. Characterizing Sociolinguistic Variation in the Competing Vaccination Communities. In Halil Bisgin, Ayaz Hyder, Chris Dancy, and Robert Thomson, editors, Proceedings of the International Conference SBP-BRiMS 2020, Washington DC. Springer.
- David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, and Graham Neubig. 2020. AlloVera: A Multilingual Allophone Database. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020).
- Maria Ryskina, Ella Rabinovich, Taylor Berg-Kirkpatrick, David R. Mortensen, and Yulia Tsvetkov. 2020. Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods. In Proceedings of the Society for Computation in Linguistics, volume 3.
2019
- Aditi Chaudhary, Elizabeth Salesky, Gayatri Bhat, David R. Mortensen, Jaime Carbonell, and Yulia Tsvetkov. 2019. CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology. In Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 57–70, Florence, Italy, August. Association for Computational Linguistics.
- David R. Mortensen. 2019. Hmong (Mong Leng). In Alice Vittrant and Justin Watkins, editors, The Mainland Southeast Asia Linguistic Area. De Gruyter Mouton, Berlin, Boston.
- David R. Mortensen and Jong Hyuk Park. 2019. IndoMorph. Collection of Foma FST morphological analyzers for languages of the Indian subcontinent.
2018
- Aditi Chaudhary, Chunting Zhou, Lori Levin, Graham Neubig, David R. Mortensen, and Jaime Carbonell. 2018. Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3285–3295, Brussels, Belgium, October. Association for Computational Linguistics.
- Patrick Littell, Tian Tian, Ruochen Xu, Zaid Sheikh, David Mortensen, Lori Levin, Francis Tyers, Hiroaki Hayashi, Graham Horwood, Steve Sloto, Emily Tagtow, Alan Black, Yiming Yang, Teruko Mitamura, and Eduard Hovy. 2018. The ARIEL-CMU situation frame detection pipeline for LoReHLT16: a model translation approach. Machine Translation, 32(1–2):105–126.
- Patrick Littell, Tom McCoy, Na-Rae Han, Shruti Rijhwani, Zaid Sheikh, David Mortensen, Teruko Mitamura, and Lori Levin. 2018. Parser combinators for Tigrinya and Oromo morphology. In Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki, Japan, May. European Language Resource Association.
- David R. Mortensen, Siddharth Dalmia, and Patrick Littell. 2018. Epitran: Precision G2P for Many Languages. In Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki, Japan, May. European Language Resource Association.
- David R. Mortensen. 2018. Epitran. Precision orthography-to-IPA conversion for 65 languages.
- David R. Mortensen. 2018. MStem. Python multilingual morphological stemming framework and stemmer collection.
2017
- Patrick Littell, David R. Mortensen, Ke Lin, Katherine Kairis, Carlisle Turner, and Lori Levin. 2017. URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 8–14, Valencia, Spain, April. Association for Computational Linguistics.
- David R. Mortensen. 2017. Hmong-Mien Languages. In Mark Aronoff, editor, Oxford Research Encyclopedia of Linguistics. Oxford University Press, New York, May.
- David R. Mortensen. 2017. Hmong-Mien Languages. In Mark Aronoff, editor, Oxford Bibliographies in Linguistics. Oxford University Press, New York.
2016
- Akash Bharadwaj, David Mortensen, Chris Dyer, and Jaime Carbonell. 2016. Phonologically Aware Neural Model for Named Entity Recognition in Low Resource Transfer Settings. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1462–1472, Austin, Texas, November. Association for Computational Linguistics.
- Patrick Littell, David R. Mortensen, Kartik Goyal, Chris Dyer, and Lori Levin. 2016. Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 3318–3324, Portorož, Slovenia, May. European Language Resources Association (ELRA).
- Patrick Littell, Kartik Goyal, David R. Mortensen, Alexa Little, Chris Dyer, and Lori Levin. 2016. Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 998–1006, Osaka, Japan, December. The COLING 2016 Organizing Committee.
- David R. Mortensen, Patrick Littell, Akash Bharadwaj, Kartik Goyal, Chris Dyer, and Lori Levin. 2016. PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3475–3484, Osaka, Japan, December. The COLING 2016 Organizing Committee.
- David R. Mortensen. 2016. PanPhon. Articulatory feature extractor and library.
- Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W Black, Lori Levin, and Chris Dyer. 2016. Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1357–1366, San Diego, California, June. Association for Computational Linguistics.