Full Professor at Karlsruhe Institute of Technology
In June 2007 I started a new lab in Karlsruhe, Germany.
Please refer to the webpage of the Cognitive Systems Lab, which describes the research I direct in the area of human-centered technologies and applications based on biosignals, i.e., capturing, recognizing, and interpreting signals such as speech, muscle, and brain activity.
Cognitive Systems Lab
Institute of Anthropomatics
Faculty of Computer Science
Karlsruhe Institute of Technology (KIT)
Adenauerring 4
76131 Karlsruhe
Germany
Phone: +49 721 608 6300
Fax: +49 721 608 6116
E-mail: tanja@ira.uka.de
Administrative Assistant:
Helga Scherer scherer@ira.uka.de
Phone: +49 721 608 6312
Research Assistant Professor
Language Technologies Institute,
School of Computer Science
Carnegie Mellon University
407 South Craig Street
Office 201
Pittsburgh, PA 15213-3891
Phone: +1 412 268-4278
Fax: +1 412 268-5578
E-mail: tanja@cs.cmu.edu
Administrative Assistant:
Lisa Mauti lmauti@cs.cmu.edu
Phone: +1 412 268-5480
Associate Director
InterACT: International Center for Advanced Communication Technologies
Research
Publications
Curriculum Vitae (CV)
Projects
Teaching
Students
Click here to search and download (co-)advised theses.
Professional Activities
Other Activities
It is my belief that automatic speech recognition systems are the most natural front-end for applications that allow human communication across language and culture barriers. Part of my research therefore focuses on developing techniques and algorithms for building human-human communication and human-machine interaction applications that function robustly in multilingual environments. This includes the rapid deployment of speech recognizers for new tasks and languages: a massive reduction of development time and cost is an essential prerequisite for making speech-driven applications attractive and available to the public, and for extending them to speakers of languages for which few or no resources are available. My research along these lines includes Language Independent and Language Adaptive Acoustic Modeling and the development of Multilingual large vocabulary continuous speech recognition (LVCSR) systems. Ongoing projects in which I pursue this research topic are [SPICE], [GlobalPhone], [GALE], and [TRANSTAC].
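To illustrate the language-adaptive idea, here is a minimal sketch in which language-specific phones are mapped onto a shared, IPA-like global unit inventory so that models pooled from resource-rich languages can seed a recognizer for a new language. All names, mappings, and data structures below are hypothetical simplifications, not the actual modeling used in these projects.

```python
# Minimal sketch: seeding acoustic models for a new language from a
# shared multilingual phone inventory. Mappings are toy examples; real
# systems use full phone sets and data-driven mappings.

# Language-specific phones mapped to shared, IPA-like global units.
PHONE_MAP = {
    "german":  {"a": "a", "sch": "S", "t": "t"},
    "spanish": {"a": "a", "s": "s", "t": "t"},
}

# Stand-in for trained acoustic models, keyed by global unit; each
# entry records which language/phone pairs contributed training data.
global_models = {}

def train_multilingual(languages):
    """Pool 'training data' from several languages per global unit."""
    for lang in languages:
        for phone, unit in PHONE_MAP[lang].items():
            global_models.setdefault(unit, []).append((lang, phone))

def bootstrap_new_language(phone_map):
    """Initialize a new language's models from the global inventory.

    Phones whose global unit is already covered inherit the pooled
    models (to be adapted with a little target-language data); the
    rest must be trained from scratch.
    """
    return {phone: global_models.get(unit, [])
            for phone, unit in phone_map.items()}

train_multilingual(["german", "spanish"])
# Hypothetical phone inventory of a new language reusing shared units:
print(bootstrap_new_language({"a": "a", "s": "s", "t": "t"}))
```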
My research efforts in the area of speech translation include domain-unlimited speech translation [STR-DUST] and small-footprint systems on mobile devices [TRANSTAC], [LASER ACTD].
The predecessor of TransTac was the DARPA program Babylon, which targeted the development of a two-way, natural-language speech translation interface. See also the group-restricted resources page.
Other significant speech translation programs I was involved in include the long-term project VERBMOBIL, funded by the German Federal Ministry of Education, Science, Research and Technology (BMBF). Verbmobil was the largest project on speech-to-speech translation, focusing on translation systems for spontaneous speech in face-to-face dialog situations. Its long-term goal was to investigate language technologies as well as their economic application, and to prepare Germany for the next millennium as a country with a critical mass of speech and language specialists in industry and science. The program ran from 1996 to 2000. Two other projects were NESPOLE!, the first NSF-EU cooperation to build speech-based e-commerce services (travel arrangements) over the telephone in four languages, and an Arabic derivative, Egypt SR, addressing Egyptian Arabic speech recognition. Further projects addressed tourist assistance, including navigation, sightseeing information, and speech translation, e.g. LingWear, a wearable language support assistant. Also check out C-STAR, the Consortium for Speech Translation Advanced Research, founded by Alex Waibel.
Research in automatic Speaker Identification (SID) has so far focused mainly on recognizing people speaking on the phone. The developed systems and technologies deliver very high performance when the test situation matches the training conditions and the noise level is low. Key problems such as recognizing speakers in the presence of noise, under mismatched test and training conditions, and in the far field have not been addressed yet. When speech is captured by microphones more than a few inches away from the speaker, room acoustics, noise sources, reverberation, and multi-path propagation become key concerns. The project FarSID aims to study these far-field effects on current state-of-the-art SID systems and to investigate strategies to improve SID performance in far-field scenarios. For more information, please visit FarSID.
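A standard way to study such far-field effects offline is to convolve clean close-talking speech with a room impulse response and add noise at a controlled signal-to-noise ratio. The sketch below illustrates this with NumPy/SciPy on a synthetic signal and a toy impulse response; it is not the FarSID experimental setup.

```python
# Sketch: simulating far-field recording conditions from clean speech.
# The signal, impulse response, and noise are synthetic placeholders;
# real studies use measured room impulse responses and recorded noise.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
sr = 16000
clean = rng.standard_normal(sr * 2)      # stand-in for 2 s of speech

# Toy room impulse response: direct path plus decaying reflections.
rir = np.zeros(sr // 4)
rir[0] = 1.0
taps = rng.integers(100, len(rir), size=50)
rir[taps] = rng.uniform(-0.3, 0.3, size=50) * np.exp(-taps / 4000)

reverberant = fftconvolve(clean, rir)[: len(clean)]

def add_noise(signal, snr_db):
    """Add white noise at the requested signal-to-noise ratio."""
    noise = rng.standard_normal(len(signal))
    scale = np.sqrt(np.mean(signal ** 2) /
                    (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return signal + scale * noise

far_field = add_noise(reverberant, snr_db=10)   # feed this to a SID system
```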
This work aims to detect, identify, and annotate cues such as language, accent, or speaker identity for the purpose of improving automatic speech recognition as well as speech translation. The most recent funded project in which these aspects of my research are pursued is [GALE].
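As a toy illustration of cue detection, the sketch below guesses the language of a text snippet from character bigram statistics. Real systems rely on phonotactic or acoustic models; the training texts and the scoring rule here are placeholders.

```python
# Toy language-ID: score each language by how many of the snippet's
# character bigrams its training text shares. Purely illustrative.
from collections import Counter

def bigrams(text):
    """Count overlapping character bigrams in a string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

TRAIN = {  # placeholder training texts
    "en": "the quick brown fox jumps over the lazy dog",
    "de": "der schnelle braune fuchs springt ueber den faulen hund",
}
MODELS = {lang: bigrams(text) for lang, text in TRAIN.items()}

def identify(snippet):
    """Return the language whose bigram counts best cover the snippet."""
    grams = bigrams(snippet)
    return max(MODELS, key=lambda lang:
               sum(min(c, MODELS[lang][g]) for g, c in grams.items()))

print(identify("the brown dog"))   # -> 'en'
```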
Further work covers EMG-based speech input methods, research on whispered speech using microphones other than air-conduction ones, and EEG-based input methods for user mode detection. Check out my publication page and the list of useful links to research on Brain Computer Interfaces (BCI).
I am collaborating with Dr. Denise Herzing, founder of the Wild Dolphin Project, to build an underwater signal processing system that can play, record, and interpret dolphin sounds. This project [Dolphins] is joint work with Alan W Black and Bob Frederking from the Language Technologies Institute.
Click HERE to search and download my publications or go directly by years:
The complete archived magazine, June 2003
Speech technology potentially allows everyone to participate in today's information revolution and can bridge the language barrier gap. Unfortunately, constructing speech processing systems requires significant resources. With some 4500-6000 languages in the world, speech processing has traditionally been prohibitively expensive for all but the most economically viable languages. In spite of recent improvements in speech processing, supporting a new language remains a skilled job requiring significant effort from trained individuals. This project aims to overcome both limitations by providing innovative methods and tools that enable naive users to develop speech processing models, collect appropriate data to build these models, and evaluate the results, allowing iterative improvement (a sketch of this iterative loop follows below). By integrating speech recognition and synthesis technologies into an interactive language creation and evaluation toolkit usable by unskilled users, speech system generation will be revolutionized. Data and components for new languages will become available to everybody, improving mutual understanding and the educational and cultural exchange between the U.S. and other countries.
Press Release about Spice and Speech Translation at InterACT, Carnegie Mellon's Tartan Online News September 2005
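The collect-train-evaluate loop such a toolkit supports can be pictured with a minimal sketch; every function below is a hypothetical stand-in, and the "word error rate" curve is simulated, not measured.

```python
# Sketch of the iterative bootstrap loop behind SPICE-style tools:
# collect a little data, rebuild the models, measure, and repeat
# until the error rate is acceptable. All numbers are simulated.

def collect_data(hours):
    """Stand-in for web-based prompt recording by the user."""
    return hours

def train_and_evaluate(total_hours):
    """Stand-in: pretend the error rate falls as data accumulates."""
    return 100.0 / (1.0 + total_hours)

total_hours, wer, target_wer = 0.0, 100.0, 20.0
while wer > target_wer:
    total_hours += collect_data(hours=1.0)
    wer = train_and_evaluate(total_hours)
    print(f"{total_hours:4.1f} h collected -> simulated WER {wer:5.1f}%")
```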
Techniques to create multilingual acoustic models require multilingual speech and text databases that cover many languages and are uniform across languages. GlobalPhone is an ongoing database collection that provides transcribed speech data for the development and evaluation of large speech processing systems in the most widespread languages of the world. GlobalPhone is designed to be uniform across languages with respect to the amount of text and audio data per language, the audio quality (microphone, noise, channel), the collection scenario (task, setup, speaking style, etc.), and the transcription conventions. It supplies an excellent basis for research in the areas of (1) multilingual speech recognition, (2) rapid deployment of speech processing systems to new languages, (3) language and speaker identification tasks, (4) monolingual speech recognition in a large variety of languages, as well as (5) comparisons across major languages based on text and speech data. To date, the GlobalPhone corpus covers 18 languages: Arabic, Bulgarian, Chinese (Mandarin and Shanghainese), Croatian, Czech, French, German, Japanese, Korean, Portuguese, Polish, Russian, Spanish, Swedish, Tamil, Thai, and Turkish. In each language, about 100 adult speakers were recorded with close-speaking microphones while reading about 100 sentences each. The entire corpus contains over 300 hours of speech from more than 1500 native adult speakers.
Press Release about GlobalPhone, Byte.com Magazine, October 1997
The goal of GALE is to develop and apply computer software technologies to absorb, analyze and interpret huge volumes of speech and text in multiple languages, eliminating the need for linguists and analysts and automatically providing relevant, distilled actionable information to military command and personnel in a timely fashion. Automatic processing "engines" will convert and distill the data, delivering pertinent, consolidated information in easy-to-understand forms to military personnel and monolingual English-speaking analysts in response to direct or implicit requests (mission statement from the official webpage).
In this program we develop technologies that enable robust spontaneous two-way tactical speech communication between American warfighters and native speakers. In this context we investigate issues surrounding the rapid deployment of new languages, especially low-resource languages and colloquial dialects. Currently we are working on two-way translation between English and Iraqi Arabic. TransTac builds on our existing speech translation technology (a sketch of the underlying cascade architecture follows below). We collaborate with Mobile Technologies, LLC, who build small-footprint speech recognition and machine translation systems, and with Cepstral, LLC, who build small-footprint speech synthesis. To bring the whole system onto a ruggedized device, we work with Marine Acoustics and its division VoxTec, which recently launched the P2 handheld PDA, the one-way translation device Phraselator.
Press Release about Wearable Devices, Carnegie Mellon's Focus Magazine, June 2003 (pdf)
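Architecturally, two-way systems of this kind are cascades of speech recognition, machine translation, and speech synthesis running in both directions. Below is a minimal sketch of such a cascade; every component is a hypothetical placeholder, not the actual TransTac modules.

```python
# Sketch of a two-way cascade speech translation turn (ASR -> MT -> TTS).
# All three components are placeholders returning descriptive strings.

def recognize(audio, lang):
    """Placeholder ASR: audio in, text hypothesis out."""
    return f"<{lang} transcript of {audio}>"

def translate(text, src, tgt):
    """Placeholder MT: source-language text to target-language text."""
    return f"<{tgt} translation of {text}>"

def synthesize(text, lang):
    """Placeholder TTS: text to audio."""
    return f"<{lang} audio for {text}>"

def translate_turn(audio, src, tgt):
    """One conversational turn through the cascade."""
    return synthesize(translate(recognize(audio, src), src, tgt), tgt)

# One English -> Iraqi Arabic turn and the reply back:
print(translate_turn("utterance1.wav", "en", "ar-iq"))
print(translate_turn("utterance2.wav", "ar-iq", "en"))
```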
The project FarSID studies far-field effects on current state-of-the-art speaker identification (SID) systems and investigates strategies to improve SID performance in far-field scenarios; see the full description in the research section above. For more information, please visit FarSID.
In the first phase of this project we developed a Thai Speech Translation System for Coalition Conversations in Military Field Applications and Civil Applications. In the second phase we focus on the mobility and robustness of the system. The target is a two-way speech translation system on mobile platforms with fixed-point, low-power processors and limited memory that operates robustly in medical dialogs between American doctors and Thai patients.
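Running on fixed-point processors means model parameters and arithmetic must be quantized. The sketch below shows 16-bit fixed-point quantization and multiplication; the Q3.12 format is an illustrative choice, not necessarily what the project uses.

```python
# Sketch: quantizing floating-point parameters to 16-bit fixed point,
# the kind of conversion needed for fixed-point mobile hardware.
import numpy as np

FRAC_BITS = 12                 # Q3.12: 1 sign, 3 integer, 12 fraction bits
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Quantize floats to int16 fixed point, saturating at the range."""
    q = np.round(np.asarray(x, dtype=float) * SCALE)
    return np.clip(q, -32768, 32767).astype(np.int16)

def fixed_mul(a, b):
    """Multiply two fixed-point arrays, rescaling back to Q3.12."""
    prod = a.astype(np.int32) * b.astype(np.int32)
    return (prod >> FRAC_BITS).astype(np.int16)

w = to_fixed([0.5, -1.25, 3.9])
x = to_fixed([0.1, 0.2, 0.3])
print(fixed_mul(w, x) / SCALE)   # close to the float products
```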
We recently started a cooperation with Dr. Denise Herzing, who founded the Wild Dolphin Project in 1985 and has directed it since. The project is an ambitious long-term scientific study of the Atlantic spotted dolphins that live 40 miles off the coast of the Bahamas. Our goal is to build a system that records and plays sounds underwater and interprets the dolphins' responses.
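A first step toward interpreting tonal whistles is tracking the dominant spectrogram frequency over time. The sketch below, assuming SciPy, does this for a synthetic frequency sweep; real underwater recordings need far more careful processing.

```python
# Sketch: tracing the frequency contour of a tonal 'whistle' via the
# spectrogram peak per frame. The signal is a synthetic 8-16 kHz sweep.
import numpy as np
from scipy.signal import spectrogram

sr = 96000
t = np.linspace(0, 1, sr, endpoint=False)
whistle = np.sin(2 * np.pi * (8000 * t + 4000 * t ** 2))  # 8 -> 16 kHz

freqs, times, power = spectrogram(whistle, fs=sr, nperseg=1024)
contour = freqs[np.argmax(power, axis=0)]   # dominant frequency per frame
print(contour[0], contour[-1])              # should rise from ~8 to ~16 kHz
```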
ELLS was a research effort jointly funded by the US Department of Education and the Chinese Ministry of Education between 2001 and 2003. I was involved in this project as a Technical Work Group Leader.
Maintainer: tanja@cs.cmu.edu.