SCOP: Structural Classification of Proteins

 

Levels of hierarchy:

1. Domain

2. Family = similarities in structure, sequence, and function imply a common evolutionary origin

3. Superfamily = similar structure and function, but the evidence for an evolutionary relationship is suggestive rather than compelling

4. Fold = superfamilies that share a common folding topology for at least a central portion of the structure

5. Class = a, b, a/b, a+b, multi-domain proteins, membrane and cell surface proteins, and small proteins with little or no a or b structure:

  1. All alpha proteins
  2. All beta proteins 
  3. Alpha and beta proteins (a/b) 
    Mainly parallel beta sheets (beta-alpha-beta units)
  4. Alpha and beta proteins (a+b) 
    Mainly antiparallel beta sheets (segregated alpha and beta regions)
  5. Multi-domain proteins (alpha and beta)
    Folds consisting of two or more domains belonging to different classes
  6. Membrane and cell surface proteins and peptides 
    Does not include proteins in the immune system
  7. Small proteins 
    Usually dominated by metal ligands, heme, and/or disulfide bridges
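
The nesting of the levels listed above can be illustrated with a small Python sketch. It is only an illustration: the `ScopPlacement` record, the `deepest_shared_level` helper, and the placeholder labels are hypothetical and do not correspond to real SCOP entries or to any SCOP file format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Levels from most general to most specific, following the list above.
SCOP_LEVELS = ("class", "fold", "superfamily", "family", "domain")


@dataclass(frozen=True)
class ScopPlacement:
    """Hypothetical record: where one domain sits in the hierarchy."""
    scop_class: str
    fold: str
    superfamily: str
    family: str
    domain: str

    def path(self) -> Tuple[str, ...]:
        """Labels ordered to match SCOP_LEVELS."""
        return (self.scop_class, self.fold, self.superfamily,
                self.family, self.domain)


def deepest_shared_level(a: ScopPlacement, b: ScopPlacement) -> Optional[str]:
    """Most specific level at which a and b still agree.

    Returns None if the two domains already differ at the class level.
    """
    shared = None
    for level, x, y in zip(SCOP_LEVELS, a.path(), b.path()):
        if x != y:
            break
        shared = level
    return shared


# Placeholder labels, not real SCOP entries.
d1 = ScopPlacement("all alpha", "fold-1", "superfamily-1", "family-1", "domain-1")
d2 = ScopPlacement("all alpha", "fold-1", "superfamily-1", "family-2", "domain-2")
print(deepest_shared_level(d1, d2))  # superfamily
```

Two domains that agree down to the superfamily level but not the family level would, under the definitions above, share structure and function without compelling evidence of a common evolutionary origin.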

          Others:

 

 

Definition on the SCOP website (taken from http://scop.mrc-lmb.cam.ac.uk/scop/intro.html on October 13, 2002):

 

What are the numbers of entries at each level of the hierarchy? Where does the figure of 7000 folds come from?

 

Current statistics (as of October 13, 2002):

SCOP: Structural Classification of Proteins. 1.59 release
15979 PDB Entries (1 March 2002). 39893 Domains. 30 Literature References
(excluding nucleic acids and theoretical models)


 

Class                                 Folds   Superfamilies   Families
All alpha proteins                      151             252        393
All beta proteins                       110             205        337
Alpha and beta proteins (a/b)           113             185        438
Alpha and beta proteins (a+b)           208             295        454
Multi-domain proteins                    34              34         46
Membrane and cell surface proteins       12              19         31
Small proteins                           58              83        128
Total                                   686            1073       1827
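
As a quick consistency check, the Total row can be recomputed from the per-class counts. The sketch below copies the numbers from the SCOP 1.59 statistics above; the names `COUNTS`, `REPORTED_TOTAL`, and `column_totals` are chosen here for illustration only.

```python
# Per-class counts copied from the SCOP 1.59 statistics above:
# (folds, superfamilies, families).
COUNTS = {
    "All alpha proteins": (151, 252, 393),
    "All beta proteins": (110, 205, 337),
    "Alpha and beta proteins (a/b)": (113, 185, 438),
    "Alpha and beta proteins (a+b)": (208, 295, 454),
    "Multi-domain proteins": (34, 34, 46),
    "Membrane and cell surface proteins": (12, 19, 31),
    "Small proteins": (58, 83, 128),
}

# Total row as reported in the statistics.
REPORTED_TOTAL = (686, 1073, 1827)


def column_totals(counts):
    """Sum each column (folds, superfamilies, families) over all classes."""
    return tuple(sum(column) for column in zip(*counts.values()))


totals = column_totals(COUNTS)
print("computed:", totals)
print("matches reported total:", totals == REPORTED_TOTAL)
```

The computed sums agree with the reported totals of 686 folds, 1073 superfamilies, and 1827 families.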