11-756 DESIGN AND IMPLEMENTATION OF SPEECH RECOGNITION SYSTEMS

Instructor: Bhiksha Raj, co-instructed by Rita Singh and Mosur Ravishankar

COURSE NUMBER

ECE: 18799D

LTI: 11756

LTI students can also register for this course as a lab course

Credits:	12
Timings:	4:30 p.m. -- 5:50 p.m.
Days:	Mondays and Wednesdays
Location:	GHC 4102

Prerequisites:

Mandatory: Linear Algebra. Basic Probability Theory.

Recommended: Signal Processing.

Coding Skills: This course will require significant programming from the students. Students must be able to program fluently in at least one language (C, C++, Java, Python, LISP, Matlab are all acceptable).

This is a project-based course.

PROJECTS PAGE

Speech recognition systems invoke concepts from a variety of fields including speech production, algebra, probability and statistics, information theory, linguistics, and various aspects of computer science. Speech recognition has therefore largely been viewed as an advanced science, typically meant for students and researchers who possess the requisite background and motivation.

In this course we take an alternative approach. We present speech recognition systems from the perspective of a novice. Beginning from the very simple problem of matching two strings, we present the algorithms and techniques as a series of intuitive and logical increments, until we arrive at a fully functional continuous speech recognition system.

Following the philosophy that the best way to understand a topic is to work on it, the course will be project oriented, combining formal lectures with required hands-on work. Students will be required to work on a series of projects of increasing complexity. Each project will build on the previous one, so that each step adds only a small, manageable amount of complexity. By the end of the course, merely by completing the series of projects, students will have built their own fully functional speech recognition systems.

Grading will be based on project completion and presentation.

The first class will be on Wednesday, 19 Jan 2011.

Class 1 19 Jan 2011 Introduction Slides

Class 2 24 Jan 2011 Data capture. Slides

Class 3 26 Jan 2011 Feature Computation Slides

Class 4 31 Jan 2011 Dynamic programming for string alignment. Slides Assignment 2

Class 5 2 Feb 2011 Project presentations: Data capture and feature computation

Class 6 7 Feb 2011 Dynamic programming for speech recognition Slides

Class 7 9 Feb 2011 From templates to HMMs Slides

Class 8 14 Feb 2011 HMMs Slides

Class 9 16 Feb 2011 Project presentations. Assignment 3

Class 10 21 Feb 2011 HMMs continued from class 8 Slides

Class 11 23 Feb 2011 No class

Class 12 28 Feb 2011 Continuous speech Slides

Class 13 2 Mar 2011 Project presentations. Assignment 4

Class 14 14 Mar 2011 Grammars Slides

Class 15 16 Mar 2011 Backpointer table. Training from continuous speech. Slides

Class 16 21 Mar 2011 Project presentations. Assignment 5

Class 17 23 Mar 2011 Ngram models. Slides

Class 18 28 Mar 2011 Ngram Models 2 Slides

Class 19 30 Mar 2011 Class cancelled

Class 20 4 Apr 2011 Project Presentations Assignment 6

Class 21 6 Apr 2011 Subword Units Slides

Class 22 11 Apr 2011 State tying Slides Assignment 7

Class 26 25 Apr 2011 Adaptation Assignment 8
