02-[46]14: String Algorithms (Fall 2020)

Tentative Schedule:

(Topics and order subject to change.)
TopicTopicsLinks & Reading
1 Exact string matching (Z-algorithm, Knuth-Morris-Pratt, Boyer-Moore, Rabin-Karp)
2 Inexact matching (edit distance, alignment in linear space, Four-Russians' speedup). Example: BLAST.
3 Suffix trees and arrays and their applications; Ukkonen's suffix tree construction algorithm; Burrows-Wheeler transform, whole genome alignment. Example: MUMmer.
4 Multiple sequence alignment; motif finding; multiple patterns. Example application: influenza phylogenies.
5 Compression and other compressed indices.
6 Hashing / randomization techniques for big data. Example application: Mash and mashmap.
  • de Bruijn graphs and de Bruijn sequences
  • Locality sensitive hashing for strings
  • Nearest neighbor search with locality sensitive hashing
  • Random projection for motif finding
  • Bloom filters and Sequence Bloom Trees
7 State machines. Example application: gene finding.
  • Regular expressions and FSAs
  • Context-free grammars, parsers
  • HMMs
8 String graphs. Example application: read alignment to a population.
  • Variant graphs
  • BWT for labeld trees and DAGs
  • BWT for de Bruijn graphs
9 Current research in "Big Genomics".
  • TBD