Computational Molecular Biology and Genomics Syllabus - Fall 2006


>
CLASS
DATE
TOPICS
ASSIGNED READING
ADDITIONAL TOPICS
1.  Aug. 29 Introduction to computational biology and genomics I

   
    Review biology and algorithms background  
2.  Aug. 31 Introduction to computational biology and genomics I I    
3.  Sept. 5  Global pairwise sequence alignment
Lecture outline
Alignment examples

PS0 due.
  • Global sequence alignment notes,
      courtesy Dr. M. Singh, Princeton University
  • Setubal and Meidanis, 47-55, 89-92, 96-98; (electronic reserve)
  • Durbin, pp. 17-22 (electronic reserves)
  • Saving space: Setubal and Meidanis, 58-60; (physical reserve)
  • General gap penalty functions: Setubal and Meidanis, 60-64 (physical reserve)
  • 4.  Sept. 7
  • Introduction to sequencing,  [Slides]  D. Durand
  • Genome Assemblies and Interval Graphs  [Slides]  M. Farach-Colton, Rutgers Univ.
    You can view these lectures online in Quicktime format.

  •    
    5.   Sept. 12 Local pairwise sequence alignment.   Semiglobal alignment.
    Lecture outline
    Alignment examples

  • Local sequence alignment notes,
      courtesy Dr. M. Singh, Princeton University
  • Setubal and Meidanis, 55-57, 64-66; (electronic reserve)
  • Durbin, pp. 23-24, 29-30 (electronic reserves)
  •  
    6.   Sept. 14 Global Multiple Sequence Alignment
    PS1 due.
  • Setubal and Meidanis, 69-72 (electronic reserve)
  • Multiple sequence alignment Notes: I and II,  courtesy Dr. M. Singh, Princeton University
  • Durbin, 6.1 -- 6.4(electronic reserves)
  • On the Design of Optimization Criteria for MSA, Durand & Farach-Colton, In Biological Evolution and Statistical Physics, Laessig & Valleriani
  • Strategies for multiple sequence alignment, Nicholas HB Jr, Ropelewski AJ, Deerfield DW 2nd, Biotechniques 2002 Mar;32(3):572-4 (electronic reserve)
  • 7.  Sept. 19   Global MSA, Introduction to Phylogenetic Trees

       
       
    8.   Sept. 21 Trees, cont'd. Maximum parsimony
    Durbin, et al: (electronic reserves)
    7.1, 7.2:  Background on trees
    7.4:  Parsimony
    Phylogeny notes,pp 1-3 & 17-20,
      Dr. M. Singh, Princeton University
    Parsimony, nice examples
  • Mount, pp 248-254(physical reserve)
  • Sankoff's algorithm for inferring ancestral sequences; in Inferring Phylogenies, J. Felsenstein, Sinauer, pp 13-16.
  • 9.   Sept. 26 Estimating distances btw sequences;   Probabilistic models of evolution (Jukes-Cantor);  Correcting for multiple substitutions. Lecture outline


      PS2 due.
    Markov Chain background
    Ewens and Grant, 4.4-4.8
    Durbin et al., 3.1 (electronic reserves)
    Probabilistic models of evolution
    Durbin, et al: 8.1, 8.2 (electronic reserves)
    Phylogeny notes, pp. 13 - 17,   courtesy Dr. M. Singh, Princeton University
     
    10. Sept. 28 Distance-based phylogeny reconstruction. Lecture outline
    Projects: Statement of interest due today.
    Distance-based methods
  • Durbin, et al: 7.3(electronic reserves)
  • Phylogeny notes, pp.4-13,
      courtesy Dr. M. Singh, Princeton University
  •  
    11. Oct. 3 UPGMA and Neighbor Joining Lecture outline
     
  • Durbin, et al: 7.8, NJ proof. (physical reserves)
  • 12. Oct. 5 Neighbor Joining, Minimum Evolution Lecture outline    
    13. Oct. 10 Class is canceled    
    14. Oct. 12 Maximum likelihood estimation;     Introduction to local MSA
    A PSSM for the WEIRD motif
    A PSSM with pseudocounts

  • PS3 due.
  • One page project statement due.
  • Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood., pp. 1-8
    J. P. Huelsenbeck and K. A. Crandall, Ann. Rev. Ecol. Syst. 1997, 28:437-66
    Complexity results:
  • On the Approximability of Numerical Taxonomy: (Fitting Distances by Tree Metrics), Agarwala et al. , (SODA '96) (electronic reserve)
  • Efficient Algorithms for Inverting Evolution, Farach and Kannan, (STOC '96)
  • 15. Oct. 17 Local MSA continued. Gibbs Sampler.
    Lecture outline
    Gibbs sampler
    Ewens and Grant, pp. 211-215.    Electronic reserve.
    Theoretical framework, convergence proofs
    Ewens and Grant, 10.5.2, Physical reserves.
    Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment, Lawrence et al., Science. 1993 262(5131):208-14.
    Explaining the Gibbs sampler, G. Casella & E. I. George, The American Statistician, 46:167-174, 1992

    Other motif discovery methods
    16. Oct. 19 Midterm Exam
    This exam is closed book. You may bring two pages (or one page, front and back) of your own notes.
       
    17. Oct. 24 Hidden Markov Models I
    Lecture notes
    Viterbi, Forward, Backward algorithms
    Durbin, pp 55 - 61.
    Ewens and Grant, pp. 329-332 Electronic reserves.
    Hidden Markov Models in Computational Biology: Applications to Protein Modeling,
    Krogh et al., JMB 235, pp 1501--1531,(1994).
    Available through electronic reserves.
    18. Oct. 26 Hidden Markov Models II
    Lecture notes
    Viterbi algorithm example
    Forward algorithm example

    Project proposal due.
    Viterbi, Forward, Backward algorithms
    Durbin, pp 55 - 61.
    Ewens and Grant, pp. 329-332 Electronic reserves.
     
    19. Oct. 31 Hidden Markov Models III
       Baum-Welch,
       HMM's for sequence motifs

    Lecture notes
    Profile HMMs
    Durbin, pp 100 - 113.
    Ewens andGrant, pp. 335-337 Electronic reserves.
    HMM topology: Durbin, pp 61-71 Electronic reserves.
    Parameter estimation, Baum-Welsh algorithm
    Durbin, pp 61-71
    Ewens and Grant, pp. 329-332 Electronic reserves.
    Multiple alignment using HMMs
    Ewens and Grant, pp. 337 - 339 Electronic reserves.
     
    20. Nov. 2 Hidden Markov Models IV
       Profile HMM's
       MSA using HMMs
    Lecture notes

       
    21. Nov. 7 Substitution Matrices
      PAM matrices

     Lecture notes
    Substitution matrices:
    Setubal and Meidanis, 80-84; (electronic reserve)
    Mount, pp 76-89; (electronic reserve)
    Durbin et al, pp 14-16 (electronic reserves)
    BLOSUM Matrices:
    Ewens and Grant, 6.5.2.
    Amino acid substitution matrices from protein blocks, Henikoff S, Henikoff JG., PNAS 89(22):10915-9, 1992 (electronic reserve)
     
    22. Nov. 9 Substitution Matrices
      BLOSUM matrices;
    Introduction to database searching

     Lecture notes
       
    23. Nov. 14 Database searching; BLAST Lecture notes

      BLAST home page

      BLAST Tutorial page  Recommended for students unfamiliar with BLAST

    PS4 due.
    Data Base Searching
    Mount, pp. 282-291 (electronic reserve)

    BLAST
    Setubal and Meidanis, 84-87 (electronic reserve)
    Basic local alignment search tool, Altschul et al. , J. Mol. Bio., 1990 (electronic reserve)
     
    24. Nov. 16 BLAST;
      Statistics of local, ungapped alignments.
      Introduction to information theory.
      Information content of alignments.
    Lecture notes
    Blast statistics and data base searching:
    The statistics of sequence similarity scores S. F. Altschul  

    Strategies for searching sequence databases, Nicholas HB Jr, Ropelewski AJ, Deerfield DW 2nd, Biotechniques 2002 Jun;28(6):1174-8 (electronic reserve)
    Two tutorial articles on information theory:
    Information Theory Primer
    T. D. Schneider
    Information Theory in Biology
    C. Adami
    Blast statistics:
    Amino acid substitution matrices from an information theoretic perspective S. F. Altschul, J. Mol. Bio., 219:555-565, 1991 (electronic reserve)
    A protein alignment scoring system sensitive at all evolutionary distances, S. F. Altschul, J. Mol. Evol., 36:290-300 , 1993 (electronic reserve)
    Statistical Methods in Bioinformatics, W. Ewens and G. Grant (Physical reserves)

    Other BLAST references
    25. Nov. 21 Gapped BLAST, Lecture notes


    BLOSUM80,   BLOSUM45
    Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Altschul et al., Nucleic Acids Research, 1997, pp. 3389 - 3394 (electronic reserve)

     
      Nov. 23  No class (Thanksgiving Holiday)    
    26. Nov. 28 Gene Finding I;   Lecture notes
    FCE's
  • Defining genes in the genomics era.
    Snyder and Gerstein, Science (2003) 300(5617):258-60.
  • Gene Discovery in DNA Sequences
    S. Salzberg, IEEE 1999 (electronic reserve)
  • A hidden Markov model that finds genes in E. coli DNA A. Krogh et al., NAR 1994 (electronic reserve)
  • Assessment of protein coding measures
    J.W. Fickett and C.S. Tung, NAR 1992 (electronic reserve)
  • Distinctive sequence features in protein coding genic non-coding, and intergenic human DNA R. Guigo and J.W. Fickett, JMB 1995 (electronic reserve)
  • 27. Nov. 30 Gene Finding II;   Lecture notes

  • Yeast rises again.
    S. Salzberg, Nature ( 2003) 423, 233-234
  • Prediction of Complete Gene Structures in Human Genomic DNA C. Burge and S. Karlin, JMB 1997 (electronic reserve)
  • Ewens and Grant, pp. 340-346.
  • Recent advances in gene structure prediction  M. Brent and R. Guigo, Curr Opin Struct Biol. 2004:264-72.

  • Evaluation of Gene Structure Prediction Programs M. Burset and R. Guigo, Genomics 1996 (electronic reserve)
  • 28. Dec. 5 No class.
          PS5 due in MI646 by 5pm.
       
    29. Dec. 7 Final Project Presentations    
      Dec. 18 Final Exam:   8:30am - 11:30am.
    This exam is closed book. You may bring two pages (or one page, front and back) of your own notes.
      Study questions  
    To view online lectures in Quicktime format, you will need to have within your browser the QuickTime plug-in, and select it as the player for all media files. You can download the QuickTime Movie player for a PC or Mac free of charge at: http://www.apple.com/quicktime/download/index.html.



    Return to course homepage
    Last modified: June 22, 2006.
    Maintained by Dannie Durand (durand@cs.cmu.edu).