Course Information:

Course Description: Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous amounts of collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Borrows-Wheeler transform. Applications of these techniques in genomics will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases. No knowledge of biology is assumed; programming proficiency is required.

Pre-requisites:

Equivalent of 15-210 ("Parallel & Sequential Data Structures and Algorithms") or 02-513/02-713 ("Algorithms and Data Structures for Scientists").
Equivalent of 15-151 or 21-127.
Programming proficiency.

Computational Genomics (02-710) is not a pre-requisite. No prior knowledge of biology is assumed.

Instructor: Carl Kingsford
Professor, Computational Biology Department, School of Computer Science, Carnegie Mellon University.
carlk@cs.cmu.edu
Office: GHC 7719
Office Hours: TBD

TA: TBD, TBD@cs
Office Hours: TBD

Syllabus: Here is the syllabus.

Resources

Piazza for course discussions.

Gradescope for submitting written homeworks.

An optional LaTeX template for homeworks.

Additional resources will be linked as appropriate from the course schedule.

Useful textbooks:

Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein
Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing by Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, Alexandru I. Tomescu
Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology by Gusfield
Algorithms on Strings by Crochemore