Tanja Schultz
Full Professor at Karlsruhe Institute of Technology
In June 2007 I started a new lab in Karlsruhe, Germany.
Please refer to the webpage of the Cognitive Systems Lab, which describes the research I direct in the area of human-centered technologies and applications based on biosignals: capturing, recognizing, and interpreting signals such as speech, muscle, and brain activity.
Cognitive Systems Lab
Institute of Anthropomatics
Faculty of Computer Science
Karlsruhe Institute of Technology (KIT)
Adenauerring 4
76131 Karlsruhe
Germany
Phone: +49 721 608 6300
Fax: +49 721 608 6116
E-mail: tanja@ira.uka.de
Administrative Assistant:
Helga Scherer scherer@ira.uka.de
Phone: +49 721 608 6312
Research Assistant Professor
Language Technologies Institute,
School of Computer Science
Carnegie Mellon University
407 South Craig Street
Office 201
Pittsburgh, PA 15213-3891
Phone: +1 412 268-4278
Fax: +1 412 268-5578
E-mail: tanja@cs.cmu.edu
Administrative Assistant:
Lisa Mauti lmauti@cs.cmu.edu
Phone: +1 412 268-5480
Associate Director
InterACT: International Center for Advanced Communication Technologies
Directions to 407 S Craig St, Pittsburgh, PA 15213-3708, US
Picture of the Lab
[Research]
[Publications, CV]
[Projects]
[Teaching]
[Students]
[Activities]
[Other]
Research
- Rapid Deployment of Speech Processing Systems to new Languages and Domains
I believe that automatic speech recognition systems are the most natural front end for applications that allow human communication across language and culture barriers. Part of my research therefore focuses on developing techniques and algorithms for building human-human communication and human-machine interaction applications that function robustly in multilingual environments. This includes the rapid deployment of speech recognizers to new tasks and languages. A massive reduction of development effort, in both time and cost, is necessary to speed up this deployment; I believe it is an essential prerequisite for making speech-driven applications attractive and available to the public, including speakers of languages for which few or no resources are available. My research along these lines includes language-independent and language-adaptive acoustic modeling and the development of multilingual large vocabulary continuous speech recognition (LVCSR) systems. Ongoing projects in which I am pursuing this research topic are [SPICE], [GlobalPhone], [GALE], and [TRANSTAC].
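The idea behind language-independent acoustic modeling can be illustrated with a toy sketch: language-specific phones are mapped onto a shared, IPA-like global unit inventory so that acoustic training data can be pooled across languages. The phone labels and mappings below are invented for illustration and do not reflect any actual GlobalPhone inventory.

```python
# Toy sketch of data pooling via a shared global phone inventory.
# Phones that are acoustically similar across languages map to the
# same global unit; their training samples are then pooled.
GLOBAL_MAP = {
    ("german", "SH"): "voiceless_postalveolar_fricative",
    ("english", "SH"): "voiceless_postalveolar_fricative",
    ("spanish", "a"): "open_central_vowel",
    ("english", "AA"): "open_central_vowel",
}

def pool_training_samples(samples):
    """Group (language, phone, features) samples by shared global unit."""
    pooled = {}
    for language, phone, features in samples:
        unit = GLOBAL_MAP.get((language, phone))
        if unit is None:
            continue  # no cross-lingual equivalent; model language-specifically
        pooled.setdefault(unit, []).append(features)
    return pooled

samples = [
    ("german", "SH", [0.1, 0.2]),
    ("english", "SH", [0.1, 0.3]),
    ("spanish", "a", [0.5, 0.1]),
]
pooled = pool_training_samples(samples)
# Both SH tokens now contribute to one shared acoustic model's training pool.
```

A language-adaptive system would then adapt these pooled models toward a new target language with whatever small amount of in-language data is available.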
- Speech Translation
My research efforts in the area of speech translation include domain-unlimited speech translation [STR-DUST] and small-footprint systems on mobile devices [TRANSTAC], [LASER ACTD].
The predecessor of TransTac was the DARPA program Babylon, which targeted the development of a two-way, natural-language speech translation interface. See also the group-restricted resources page.
Other significant speech translation programs I was involved in include the long-term project VERBMOBIL, funded by the German Federal Ministry of Education, Science, Research and Technology (BMBF). Verbmobil was the largest project on speech-to-speech translation, focusing on translation systems for spontaneous speech in face-to-face dialog situations. Its long-term goal was to investigate language technologies as well as their economic application, and to prepare Germany for the next millennium as a country with a critical mass of speech and language specialists in industry and science. The program ran from 1996 to 2000. Two other projects were NESPOLE!, the first NSF-EU cooperation to build speech-based e-commerce services (travel arrangements) over the telephone in four languages, and an Arabic derivative working on speech recognition for Egyptian Arabic. Further projects in the area of tourist assistance covered navigation, sightseeing information, and speech translation, e.g. LingWear, a wearable language support assistant. Also check out C-STAR, the Consortium for Speech Translation Advanced Research, founded by Alex Waibel.
- Far-Field Speaker Identification
Research in automatic speaker identification (SID) has so far focused mainly on recognizing people speaking on the phone. The resulting systems and technologies deliver very high performance when the test situation matches the training conditions and the noise level is low. Key problems such as recognizing speakers in the presence of noise, under mismatched test and training conditions, and in the far field have not yet been addressed. When speech is captured by microphones more than a few inches away from the speaker, room acoustics, noise sources, reverberation, and multi-path propagation become key concerns. The FarSID project studies these far-field effects on current state-of-the-art SID systems and investigates strategies to improve SID performance in far-field scenarios. For more information, please visit the FarSID page.
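The far-field degradations described above are commonly simulated by convolving clean speech with a room impulse response (RIR) and adding noise at a controlled signal-to-noise ratio. A minimal sketch, using a synthetic tone and an invented exponentially decaying random impulse response in place of real recordings:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_far_field(clean, rir, noise, snr_db):
    """Convolve clean speech with a room impulse response, then add
    noise scaled to the requested SNR (in dB) relative to the
    reverberant signal."""
    reverberant = np.convolve(clean, rir)[: len(clean)]
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise[: len(reverberant)] ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + scale * noise[: len(reverberant)]

fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 220 * t)  # 1 s synthetic "speech"
# Crude stand-in for a measured RIR: decaying white noise.
rir = rng.standard_normal(2048) * np.exp(-np.arange(2048) / 400.0)
noise = rng.standard_normal(fs)
far = simulate_far_field(clean, rir, noise, snr_db=10)
```

Training or testing a SID system on such simulated far-field data is one way to study the mismatch between close-talking and distant-microphone conditions.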
- Detection and Annotation of Disfluencies and Non-verbal Cues
This work aims to detect, identify, and annotate cues such as language, accent, or speaker identity in order to improve automatic speech recognition as well as speech translation. The most recent funded project in which these aspects of my research are pursued is [GALE].
- Non-audible and Whispered Speech
This work covers EMG-based speech input methods, research on whispered speech using microphones other than air-conduction microphones, and EEG-based input methods for user-mode detection. Check out my publication page and the list of useful links to research on brain-computer interfaces (BCI).
- Towards Communication with Dolphins
I am collaborating with Dr. Denise Herzing, founder of the Wild Dolphin Project, to build an underwater signal processing system that can play, record, and interpret dolphin sounds. This project [Dolphins] is joint work with Alan W Black and Bob Frederking from the Language Technologies Institute.
Publications
Click here to search and download my publications, or go directly by year:
- 2006, 2005, 2004, 2003, 2002, 2001, 2000
- 1999, 1998, 1997, 1996, 1995
- Advised theses - PhD, MS, DA
Curriculum Vitae (CV)
Projects
- [Spice]
Speech Processing: Interactive Creation and Evaluation Toolkit
Speech technology potentially allows everyone to participate in today's information revolution and can help bridge language barriers. Unfortunately, constructing speech processing systems requires significant resources. With some 4,500-6,000 languages in the world, speech processing has traditionally been prohibitively expensive for all but the most economically viable languages. Despite recent improvements in speech processing, supporting a new language remains a skilled job requiring significant effort from trained individuals. This project aims to overcome both limitations by providing innovative methods and tools that let non-expert users develop speech processing models, collect appropriate data to build these models, and evaluate the results, allowing iterative improvement. By integrating speech recognition and synthesis technologies into an interactive language creation and evaluation toolkit usable by unskilled users, speech system generation will be revolutionized. Data and components for new languages will become available to everybody, improving mutual understanding as well as educational and cultural exchange between the U.S. and other countries.
Press Release about Spice and Speech Translation at InterACT, Carnegie Mellon's Tartan Online News September 2005
- [GlobalPhone]
A Multilingual LVCSR and Database Collection in 18 Languages
Techniques for creating multilingual acoustic models require multilingual speech and text databases that cover many languages and are uniform across them. GlobalPhone is an ongoing database collection that provides transcribed speech data for the development and evaluation of large speech processing systems in the most widespread languages of the world. GlobalPhone is designed to be uniform across languages with respect to the amount of text and audio data per language, the audio quality (microphone, noise, channel), the collection scenario (task, setup, speaking style, etc.), and the transcription conventions. It supplies an excellent basis for research in (1) multilingual speech recognition, (2) rapid deployment of speech processing systems to new languages, (3) language and speaker identification, (4) monolingual speech recognition in a large variety of languages, and (5) comparisons across major languages based on text and speech data. To date, the GlobalPhone corpus covers 18 languages: Arabic, Bulgarian, Chinese (Mandarin and Shanghainese), Croatian, Czech, French, German, Japanese, Korean, Portuguese, Polish, Russian, Spanish, Swedish, Tamil, Thai, and Turkish. In each language, about 100 adult speakers were recorded with close-talking microphones while reading about 100 sentences each. The entire corpus contains over 300 hours of speech from more than 1,500 native adult speakers.
Press Release about GlobalPhone, Byte.com Magazine, October 1997
- [Gale] Global Autonomous Language Exploitation
The goal of GALE is to develop and apply computer software technologies to absorb, analyze and interpret huge volumes of speech and text in multiple languages, eliminating the need for linguists and analysts and automatically providing relevant, distilled actionable information to military command and personnel in a timely fashion. Automatic processing "engines" will convert and distill the data, delivering pertinent, consolidated information in easy-to-understand forms to military personnel and monolingual English-speaking analysts in response to direct or implicit requests (mission statement from the official webpage).
- [TransTac] Spoken Language Communication and Translation System for Tactical Use
In this program we develop technologies that enable robust, spontaneous, two-way tactical speech communication between American warfighters and native speakers. In this context we investigate issues surrounding the rapid deployment of new languages, especially low-resource languages and colloquial dialects. Currently we are working on two-way translation between English and Iraqi Arabic. TransTac builds on our existing speech translation technology. We collaborate with Mobile Technologies, LLC, which builds small-footprint speech recognition and machine translation systems, and with Cepstral, LLC, which builds small-footprint speech synthesis. To bring the whole system onto a ruggedized device, we work with Marine Acoustics and its division VoxTec, which recently launched the Phraselator P2, a handheld one-way translation device.
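A two-way system of this kind is typically a cascade of speech recognition, machine translation, and speech synthesis, run once per direction. The sketch below shows only that cascade structure; every component is a toy stand-in (a one-word lexicon and dict-based stubs instead of real ASR, MT, and TTS engines), not the actual TransTac implementation.

```python
# Toy cascade: speech recognition -> machine translation -> speech synthesis.
TOY_LEXICON = {("en", "ar"): {"hello": "marhaba"},
               ("ar", "en"): {"marhaba": "hello"}}

def recognize(audio, lang):
    # Stand-in ASR: pretend the "audio" already carries its transcript.
    return audio["transcript"]

def translate(text, src, tgt):
    # Stand-in MT: word-by-word lookup, passing unknown words through.
    lexicon = TOY_LEXICON[(src, tgt)]
    return " ".join(lexicon.get(word, word) for word in text.split())

def synthesize(text, lang):
    # Stand-in TTS: return a synthesis request instead of a waveform.
    return {"lang": lang, "text": text}

def speech_to_speech(audio, src, tgt):
    """One direction of the two-way cascade."""
    return synthesize(translate(recognize(audio, src), src, tgt), tgt)

out = speech_to_speech({"transcript": "hello"}, "en", "ar")
# out == {"lang": "ar", "text": "marhaba"}
```

On a small-footprint device, each stage of such a cascade must additionally fit the platform's memory and fixed-point processing constraints.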
Press Release about Wearable Devices, Carnegie Mellon's Focus Magazine, June 2003 (pdf); the complete archived magazine, June 2003
- [FarSID] Far-Field Speaker Identification
The FarSID project studies the effects of far-field recording conditions, including room acoustics, noise, reverberation, and multi-path propagation, on current state-of-the-art speaker identification systems, and investigates strategies to improve their performance; see the Research section above for details.
- [Laser ACTD] Thai Speech-to-Speech Translation
In the first phase of this project we developed a Thai speech translation system for coalition conversations in military field applications and civil applications. In the second phase we focus on the mobility and robustness of the system. The target is a two-way speech translation system on mobile platforms with fixed-point arithmetic, low processing power, and limited memory that operates robustly in medical dialogs between American doctors and Thai patients.
- [Dolphins] Towards Communication with Dolphins
We recently started a cooperation with Dr. Denise Herzing, who founded the Wild Dolphin Project in 1985 and has led it since. The project is an ambitious long-term scientific study of the Atlantic spotted dolphins that live 40 miles off the coast of the Bahamas. Our goal is to build a system that records and plays sounds underwater and interprets the dolphins' responses.
- [ELLS] E-Language Learning System
ELLS was a research effort jointly funded by the US Department of Education and Chinese Ministry of Education between 2001 and 2003. I was involved in this project as a Technical Work Group Leader.
Teaching
Students
Click here to search and download (co-)advised theses.
- Current PostDoc Students
- Ian Lane (Fall 2006)
- Mark Fuhs (since Spring 2007)
- Qin Jin (since Spring 2007)
- Current PhD Students
- Szu-Chen (Stan) Jou (Co-advised with Alex Waibel) (since Fall 2002)
- Kornel Laskowski (since Fall 2002)
- Wilson Yik-Cheung Tam (since Fall 2004)
- Paisarn Charoenpornsawat (since Fall 2006)
- Roger W Hsiao (since Fall 2007)
- Mohamed Noamany
- Current PhDs - Thesis Committee
- John Kominek
- Yue Pan (2006, Nuance)
- Rong Zhang
- Current Masters Students
- James Sanders (since Summer 2006, 2nd year MLT, LOA)
- Sameer Badaskar (since Fall 2006)
- Udhyakumar Nallasamy (since Spring 2007)
- Current InterACT Masters Students (co-advised in Germany)
- Kristina Schaaff
- Michael Wand
- Graduated PhDs - Advisor
- Qin Jin (Co-advised with Alex Waibel) (Jan 2007, InterACT postdoc)
- Graduated PhDs - Thesis Committee
- Fei Huang (2006, IBM)
- Hua Yu (2005, Google, Inc.)
- Laura Tomokiyo Mayfield (2001, Cepstral, LLC)
- Graduated Masters
- Roger Hsiao (2007, LTI PhD)
- Sharath Rao (2007, Yahoo!)
- Yunghui Li (2006, LTI)
- Shirin Saleem (2005, BBN Technologies, Inc.)
- Sinaporn Suebvisai (2004)
- Zhirong Wang (2002)
- Graduated InterACT Masters Students (co-advised in Germany, since 2001)
- Jan Callies (2006, not graduated yet, UKA)
- Marek Wester (2006)
- Aneliya Mircheva (2006, UKA)
- Lena Maier-Hein (2005, DKFZ Heidelberg, Germany)
- Matthias Honal (2005, University Hospital Freiburg, Germany)
- Michael Katzenmaier (2004)
- Matthias Paulik (2004, InterACT)
- Mirjam Killer (2003)
- Sebastian Stüker (2003, InterACT)
- Jamal Abu-Alwan (2001)
Professional Activities
- Conference Organizations and Committee Memberships
- General Co-Chair, Interspeech 2006, Pittsburgh, Pennsylvania
- Program Committee for Workshop at HLT 2006 on Medical Speech Translation
- Program Committee for Workshop at ESSLI on Resource-Scarce Language Engineering, July 2006
- Program Co-Chair for IWSLT 2005
- Technical Committee, IJCAI 2005
- Session Chair for "General Topics in Automatic Speech Recognition", ICASSP 2005
- Technical Committee, ICASSP 2005
- Member of the permanent council of the Interspeech/ICSLP (PC-ICSLP, since 2004)
- Program Committee, ICSLP 2004
- Session Chair for "Topics in Speaker and Language Recognition", ICASSP 2004
- Organizer and Chair of the Special Session "Multilingual Speech Processing", ICSLP 2004
- Technical Committee, ICASSP 2004
- Session Chair, ICASSP 2003
- Program Committee, International Conference on Multimodal Interfaces ICMI 2002
- Technical Committee, Human Language Technologies HLT 2001
- Reviewing
- Editorial Board Member for Speech Communication, since 2004
- Associate Editor for the IEEE Transactions on Speech and Audio Processing (2002-2004)
- Journal of Artificial Intelligence Research, since 2005
- Computer, Speech and Language Journal, since 2004
- Iranian Journal of Electrical and Computer Engineering, since 2004
- South-African Computer Journal, since 2004
- Speech Communication, since 1999
- IEEE Transactions on Speech and Audio Processing, since 1999
- At Carnegie Mellon
- Member of Admissions Committee, 2001
Other Activities
JRTk IBIS documentation, available online or as a portable PDF
The InterACT Agenda
Last modified: Mon Nov 27 8:47:45 EDT 2006
Maintainer: tanja@cs.cmu.edu.