Identification of homology: conserved sequences, patterns, motifs
* "gappers" versus "blockers" (Reference: Ref_HigginsTaylor, Chapter 5)
* Methods of identification:
- PSI-BLAST
- PHI-BLAST
- WU-BLAST
- Hidden Markov Modeling
> simple Markov chains
> higher-order Markov chains
> interpolated Markov chains
> selective Markov chains
> profile hidden Markov models
- PROSITE
- Gribskov's method
- Support Vector Machines
> feature vectors, e.g. derived from hidden Markov models
- Neural Networks
> "Motifind": n-gram feature vectors
* Databases derived by using these methods:
* Curated family databases:
- PROSITE
- PRINTS
> www.bioinf.man.ac.uk/fingerPRINTScan/bin/atwood/SearchPrintsForm2.pl
- Pfam
> www.pfam.wustl.edu/hmmsearch.shtml
> www.sanger.ac.uk/Pfam/search.shtml
* Clustering databases:
- ProDom
> www.toulouse.inra.fr/prodom/doc/blast_form.html
- DOMO
>
- Protomap
>
- Prof_pat
> www.mgs.bionet.nsc.ru/mgsprojgrams/prof_pat
* Derived family databases:
- Blocks
> www.blocks.fhcrc.org/blocks_search.html
- ProClass
> www.nbrf.georgetown.edu/gfserver/genefind.html
* Database type not yet determined:
- SBASE
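
As a small illustration of the pattern-based approach listed under the methods of identification (PROSITE), the sketch below translates a simplified PROSITE-style pattern into a regular expression and scans a sequence for matches. The function names `prosite_to_regex` and `find_motifs` and the example sequence are hypothetical and introduced only for this example; the supported syntax is a subset (single residues, `x`, `[..]`, `{..}`, repeat counts, and `<` / `>` anchors at the ends of the pattern), not the full PROSITE grammar. The pattern `N-{P}-[ST]-{P}` is the well-known N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Assumption: only a subset of the syntax is handled (see the lead-in).
    """
    pattern = pattern.strip().rstrip(".")  # a trailing period ends a pattern
    prefix = suffix = ""
    if pattern.startswith("<"):            # '<' anchors at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):              # '>' anchors at the C-terminus
        suffix, pattern = "$", pattern[:-1]

    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat count such as (2) or (2,4).
        m = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()

        if core == "x":                              # any residue
            regex = "."
        elif re.fullmatch(r"\[[A-Z]+\]", core):      # one of the listed residues
            regex = core
        elif re.fullmatch(r"\{[A-Z]+\}", core):      # any residue except those listed
            regex = "[^" + core[1:-1] + "]"
        elif re.fullmatch(r"[A-Z]", core):           # a single fixed residue
            regex = core
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")

        if low is not None:                          # append the repeat count
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)

    return prefix + "".join(parts) + suffix


def find_motifs(pattern: str, sequence: str) -> list:
    """Return (start, matched_text) for every match, including overlapping ones."""
    # A lookahead lets matches that overlap each other all be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical toy sequence; start positions are 0-based.
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_motifs("N-{P}-[ST]-{P}.", "MKNASAANPTQ"))
```

A pattern of this kind only reports exact matches to a fixed template, which is the "blocker" end of the spectrum; profile and hidden Markov methods in the list above instead score positions probabilistically and can tolerate insertions and deletions.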