CS294-3: Algorithms in the Real World (Guy Blelloch, Fall 97)

next up previous
Topic 8: Pattern Matching in Computational Biology

[ Topics | Scribe Notes | Readings | Text Books | Links ]


Topic Outline

  • Introduction
  • DNA, proteins, and sequencing
  • Global alignment
  • The longest common subsequence (LCS) and minimum edit distance problems
  • The general alignment problem
  • Cost matrices -- how derived in biology
  • Recursive solution with memoizing
  • Dynamic programming solution
  • O(n+m) space solution (Hirshberg)
  • O(n d) time solution (Ukkonen)
  • Other alignment problems
  • Local alignments
  • Alignment with gaps
  • Affine gap model and algorithm (Gotoh)
  • Multiple alignment
  • Sequence analysis tools and search engines
  • BLAST
  • FASTA
  • Applications of alignment
  • Protein sequencing
  • Evolutionary (phylogenic) trees
  • Scribe Notes

  • VLSI 3 + Pattern Matching 1 (draft) (Noah Treuhaft)
  • Pattern Matching 2 (draft) (Felix Wu)
  • Readings

  • Lecture notes 2, 3, 4, and 6 from the class Algorithms in Molecular Biology by Richard Karp, Larry Ruzzo, and Martin Tompa.
    We will not be covering the exact same material, but there will be significant overlap.
  • Recommended Text Books

  • Dan Gusfield. Algorithms on String, Trees, and Sequences. Cambridge University Press, 1997.
  • Arthur M. Lesk. Computational molecular biology: sources and methods for sequence analysis. Oxford University Press, 1988.
  • Graham A Stephen String searching algorithms World Scientific, 1994.
    The technical report this book is based on is available online.
  • M.S. Waterman. Introduction to Computational Biology: Maps, Sequences, Genomes. Chapman & Hall, 1995.
  • Further Readings and Links

  • General information
  • A guide to sequence searching.
  • Alignments, the basis for sequence comparison. A nice overview by Peter Rijk.
  • Protein Sequence Alignment and Database Scanning by Geoffere Barton. This is a nice overview paper on the subject (also available in postscript)
  • Combinatorial Methods for DNA Mapping and Sequencing. Forward from a special issue of the Journal of Computational Biology.
  • Exact String Matching Algorithms. An overview by Christian Charras and Thierry Lecroq. Has some nice applets.
  • The Biology Workbench. Access to a collection of biological sequence and structure analysis tools set up at NCSA. You need to register to use it.
  • Courses on computational biology
  • Algorithmic Fundamentals for Computational Molecular Biology
    (Gene Myers at U. Arizona).
  • Algorithms in Molecular Biology
    (Richard Karp, Larry Ruzzo, and Martin Tompa at U. Washington).
  • Introduction to Computational Molecular Biology
    (Bonnie Berger and Mona Singh at MIT)
  • 3-D Structure in Chemistry and Molecular Biology
    (Klara Kedem, Paul Chew, Jon Kleinberg, and Dan Huttenlocher at Cornell). This course mostly covers three dimensional matching of proteins, and other molecules.
  • Computational Biology and Chemistry
    (Sung-Hou Kim and John Canny at UC Berkeley).
  • Algorithms for Molecular Biology
    (Dalit Naor and Ron Shamir at Tel-Aviv University).
  • Bibliographies
  • A bibliography on Computational Gene recognition.
  • Implementations
  • Sequence analysis software
  • An overview and rating of various sequence matching tools (and other biomolecular research tools).
  • A FPGA based implementation of sequence matching. Claims to be 80x faster than a 300MHz Alpha.
  • Overview of the FASTA algorithm.
  • Human Genome Project
  • Home page
  • A Primer on Molecular Genetics
  • Links to otheruseful information.
  • Evolutionary (phylogenic) trees
  • Lectures 17, 18, and 19 from the course Algorithms in Molecular Biology at U. Washington. Notes are quite short but this is a good introduction.
  • Chapter 17 from D. Gusfield, Algorithms on String, Trees, and Sequences. Computer theory viewpoint.
  • Chapter 11 from D. Hillis, C. Moritz, and B. Mable (Eds.), Molecular systematics. Biology viewpoint.
  • A glossary on evolution from UBC.
  • A collection of programs for constructin evolutionary trees from U. Washington.
  • Longest Common Subsequence
  • A nice description of the LCS problem and algorithms from a course by David Eppstein.

  • Back to the Algorithms in the Real World home page.
    Guy Blelloch, guyb@cs.berkeley.edu.