Identification of homology: conserved sequences, patterns, motifs

* "gappers" versus "blockers" (Reference: Ref_HigginsTaylor, Chapter 5)

* Methods of identification:

Dynamic programming

FASTA

BLAST

    PSI-BLAST

    PHI-BLAST

    WU-BLAST

Hidden Markov Modeling

    simple Markov chains

    higher order Markov chains

    interpolated Markov chains

    selective Markov chains

    profile hidden Markov models

PROSITE

    Gribskov's method

Support Vector Machines

    vectors, e.g. from Hidden Markov Layers

Neural Networks

    "Motifind": vectors of n-grams

n-grams
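
As a rough illustration of the first and last entries in the list above, here is a minimal sketch of dynamic programming (a Needleman-Wunsch style global alignment score) and of n-gram counting. The function names and the scoring values (match = 1, mismatch = -1, gap = -2) are assumptions chosen for the example. Real protein comparisons would normally use a substitution matrix such as BLOSUM or PAM, and affine gap penalties, rather than these toy values.

```python
from collections import Counter


def ngram_counts(seq, n=2):
    """Count overlapping n-grams (substrings of length n) in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score by dynamic programming (linear gap penalty).

    The scoring values are illustrative assumptions, not recommended settings.
    Only two rows of the DP table are kept, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of `b`.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(ngram_counts("ACACA", 2))
    print(global_alignment_score("ACGT", "AGT"))
```

The n-gram counts can serve as a fixed-length feature vector for a classifier, while the alignment score compares two sequences directly.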

 

* Databases derived by using these methods:

* Curated family databases:

    - Prosite 

    > www.expasy.ch/prosite

    - Prints 

    > www.bioinf.man.ac.uk/fingerPRINTScan/bin/atwood/SearchPrintsForm2.pl

    - Pfam

    > www.pfam.wustl.edu/hmmsearch.shtml

    > www.sanger.ac.uk/Pfam/search.shtml

* Clustering databases: 

    - ProDom

    > www.toulouse.inra.fr/prodom/doc/blast_form.html

    - DOMO

    > 

    - Protomap

    > 

    - Prof_pat

    > www.mgs.bionet.nsc.ru/mgsprojgrams/prof_pat

* Derived family databases:

    - Blocks

    > www.blocks.fhcrc.org/blocks_search.html

    - ProClass

    > www.nbrf.georgetown.edu/gfserver/genefind.html

* Type still to be determined:

    - SBASE