15-853: Algorithms in the Real World (Guy Blelloch)
Topic 8: Pattern Matching in Computational Biology
[ Topics |
Scribe Notes |
Readings |
Text Books |
Links ]
VLSI 3 + Pattern Matching 1 (draft) (Noah Treuhaft)
Pattern Matching 2 (draft) (Felix Wu)
Lecture notes
2,
3,
4,
and
6
from the class Algorithms in Molecular Biology by
Richard Karp, Larry Ruzzo, and Martin Tompa.
We will not be covering the exact same material, but there will be significant
overlap.
Suggested
Dan Gusfield. Algorithms on
String, Trees, and Sequences. Cambridge University Press,
1997.
This is an excellent book and the definitive source for
combinatorial algorithms on strings. It does not go into much detail
on the biology side and is somewhat narrow in the overall
computational biology. For example it has little on Gene prediction,
finding signals in DNA, on proteomics. In general it has very little
on statistical based techniques.
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison.
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic
Acids.
Cambridge University Press, 1998.
This is a good book on probabilistic methods for sequence analysis, but
a little weak on the computer science side.
It is complamentary to Gusfield's book.
Pavel A. Pevzner. Computational
Molecular Biology (An Algorithmic Approach) MIT Press,
2000. This book is is broader than either the Gusfield or Durbin
book on their own, but not as deep in either discrete string matching
algorithms (compared to Gusfield), or statistical methods (compared to
Durbin). If you just want just one book, this might be the right one.
Other
M.S. Waterman.
Introduction to Computational Biology: Maps, Sequences, Genomes.
Chapman & Hall, 1995.
Joćo Meidanis & Joćo Carlos Setubal.
Introduction to Computational Molecular Biology.
PWS Publishing Company, Boston, 1996.
Arthur M. Lesk.
Computational molecular biology:
sources and methods for sequence analysis.
Oxford University Press, 1988.
Graham A Stephen
String searching algorithms
World Scientific, 1994.
The technical report this book is based on is available online.
General information
Primer on
Molecular Genetics from the U.S. Department of Energy.
A nice overview in the general topic of molecular genetics.
Alignments, the basis for sequence comparison. A nice overview by Peter Rijk.
Protein Sequence Alignment and Database Scanning by Geoffere Barton. Yet another good overview paper on the subject (also available in postscript)
Combinatorial Methods for DNA Mapping and Sequencing.
Forward from a special issue of the Journal of Computational Biology.
Exact String Matching Algorithms. An overview by
Christian Charras and Thierry Lecroq. Has some nice applets.
The Biology Workbench.
Access to a collection of biological sequence and structure analysis tools
set up at NCSA.
You need to register to use it.
A guide to sequence searching. Brief description of BLAST, FAST and similar techniques.
Yahoo: Home > Science > Biology > Molecular Biology > Computational Biology
Actually a pretty worthless page.
Courses on computational biology
Algorithms for Molecular Biology
(Dalit Naor and Ron Shamir at Tel-Aviv University).
Representations and Algorithms for Computational Molecular
Biology
(Russ Altman at Stanford). This has more of a biological bent---the
instructors are biologists.
Algorithmic Fundamentals for Computational Molecular Biology
(Gene Myers at U. Arizona).
Computational Biology
(Larry Ruzzo, and Martin Tompa at U. Washington).
Introduction to Computational Molecular Biology
(Bonnie Berger and Mona Singh at MIT)
3-D Structure in Chemistry and Molecular Biology
(Klara Kedem, Paul Chew, Jon Kleinberg, and Dan Huttenlocher at Cornell).
This course mostly covers three dimensional matching of proteins,
and other molecules.
3-D Structure in Chemistry and Molecular Biology
(Bruce Donald at Dartmouth).
This is similar to the previous course (obviously, by name).
Bruce Donald was at Cornell before Dartmouth.
Computational Biology and Chemistry
(Sung-Hou Kim and John Canny at UC Berkeley).
Principles
of Computational Biology
(Steven Salzberg at Johns Hopkins)
Bibliographies
A bibliography on Computational Gene recognition.
Implementations
Sequence analysis software
An overview and rating of various sequence matching tools (and other biomolecular research tools).
A FPGA based implementation of sequence matching.
Claims to be 80x faster than a 300MHz Alpha.
Overview of the FASTA algorithm.
Human Genome Project
Home page
A Primer
on Molecular Genetics
Links to otheruseful information.
Evolutionary (phylogenic) trees
Lectures
17,
18,
and
19
from the course
Algorithms in Molecular Biology at U. Washington.
Notes are quite short but this is a good introduction.
Chapter 17 from D. Gusfield,
Algorithms on String, Trees, and Sequences. Computer theory viewpoint.
Chapter 11 from D. Hillis, C. Moritz, and B. Mable (Eds.),
Molecular systematics. Biology viewpoint.
A
glossary on evolution from UBC.
A collection of programs for constructin evolutionary trees from U. Washington.
Longest Common Subsequence
A nice description of the LCS problem and algorithms from a course
by David Eppstein.
Back to the Algorithms in the Real World home page.
Guy Blelloch,
guyb@cs.cmu.edu.