Marylee Williams | Tuesday, December 2, 2025

The Breakdown
* Children with speech disorders, like a lisp, often have trouble being understood by family, teachers and friends, which can make school situations and everyday communication harder.
* To compound the problem, there aren't enough speech-language pathologists nationwide to keep up with demand from kids who need help. And most speech-reconstruction tools, which can help correct how someone talks, are built for adults.
* Researchers from Carnegie Mellon University's School of Computer Science have created an AI tool that could help fix a child's speech while preserving their identity and personality by letting them hear the reconstructed speech in their own voice.
Children's Reconstructed Speech for Speech Sound Disorders (ChiReSSD) combines machine learning with human speech to generate audio clips of corrected speech that sound like the child. For example, if a child struggles with pronouncing double-r words, like "curry," the tool can generate an audio clip of that child saying the word correctly using only a clip of the child talking and a text input.
"The potential clinical applications are really significant to me," said David Mortensen, an assistant research professor in the Language Technologies Institute (LTI). "The idea that a child could hear how they would say something in their voice, except with the sound of the disordered pronunciation removed, could be really transformative."
Mortensen's interest in creating technology to assist children with speech disorders started with his daughter. He said the speech-language pathologist at her school was so overloaded that his daughter was seen only once or twice. Mortensen knew that his daughter would have benefited from technologies that could help speech-language pathologists treat children more efficiently.
Professor Carlos Busso and Ph.D. student Karen Rosero, both in the LTI, see ChiReSSD as a critical step to developing both audio and video tools that can address children's speech disorders. While ChiReSSD focuses on audio generation, Rosero and Busso developed video-based AI tools in previous work to analyze speech articulation after cleft lip and palate repair surgery.
"The big idea we are working toward is to generate speech that sounds like the kids and generate facial images that look like the kids," Busso said. "These audio and video clips can be combined to compare and contrast disordered and reconstructed speech. Then, we can localize the errors the children are making and create more targeted interventions, like particular words that address the specific speech issue."
ChiReSSD needs only an audio clip of the child to generate reconstructed speech, and the clip can be of the child saying anything. The tool separates the child's voice identity (their pitch and other acoustic patterns) from the phonetic content of their speech (what they're saying). The model learns a representation of the child's vocal identity, then identifies and corrects mispronunciations in the phonetic content. Finally, combining that vocal identity with a text input, like the words "chicken curry" or "rabbit," ChiReSSD generates a corrected audio clip that sounds like the child saying the target words.
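In rough outline, that pipeline resembles a voice-preserving text-to-speech system. The sketch below is a minimal, hypothetical illustration of the flow in Python; every name in it (extract_speaker_embedding, correct_phonemes, synthesize, the /w/-for-/r/ correction table) is an assumed stand-in for illustration, not the actual ChiReSSD code or API.

```python
# Hypothetical sketch of a ChiReSSD-style pipeline, as described above:
# 1) extract the child's vocal identity, 2) correct the phonetic content,
# 3) resynthesize target words conditioned on that identity.
from dataclasses import dataclass


@dataclass
class SpeakerEmbedding:
    """Stands in for a learned vector capturing the child's vocal identity
    (pitch and other acoustic patterns), kept separate from what they say."""
    vector: list[float]


def extract_speaker_embedding(audio_clip: list[float]) -> SpeakerEmbedding:
    # A real system would use a neural encoder; here we fake an identity
    # vector with two simple summary statistics of the waveform.
    mean = sum(audio_clip) / len(audio_clip)
    energy = sum(x * x for x in audio_clip) / len(audio_clip)
    return SpeakerEmbedding(vector=[mean, energy])


# Toy correction table: maps a disordered phoneme to its intended target,
# e.g. a child realizing /r/ as /w/ ("wabbit" for "rabbit").
CORRECTIONS = {"w": "r"}


def correct_phonemes(phonemes: list[str]) -> list[str]:
    """Replace mispronounced phonemes with their targets."""
    return [CORRECTIONS.get(p, p) for p in phonemes]


def synthesize(text: str, speaker: SpeakerEmbedding) -> list[float]:
    """Placeholder TTS: a real model would condition generation on the
    speaker embedding so the corrected audio still sounds like the child."""
    return [0.0] * (len(text) * 80)  # dummy waveform samples


if __name__ == "__main__":
    child_audio = [0.01, -0.02, 0.03, 0.015]            # any clip of the child talking
    identity = extract_speaker_embedding(child_audio)   # vocal identity, not content
    heard = ["w", "ae", "b", "ih", "t"]                 # child says "wabbit"
    target = correct_phonemes(heard)                    # -> phonemes for "rabbit"
    corrected_clip = synthesize("rabbit", identity)
    print(target, len(corrected_clip))
```

In the system the article describes, the identity encoder and synthesizer would be learned models trained on speech, and the correction step would operate on learned phonetic representations rather than a lookup table.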
"Psychological studies demonstrate that having the same voice as a reference benefits the patient," Rosero said. "For children, if the text-to-speech tool provides an adult or a standard plain voice, it may not be as beneficial as having their own voice as a reference for what to target in pronunciation."
Busso said this work makes significant strides in audio speech correction. The team's next step will be to focus on making the same impact in video.
Along with the LTI researchers, the team included Eunjung Yeo, a former visiting scholar in SCS; Courtney Van'T Slot, a speech-language pathologist; and Rami Hallac, an associate professor at the University of Texas Southwestern Medical Center.
Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu