SCOP: Structural Classification of Proteins
Levels of hierarchy:
1. Domain
2. Family = similarities in structure, sequence, and function imply a common evolutionary origin
3. Superfamily = similar structure and function, but the evidence for an evolutionary relationship is suggestive rather than compelling
4. Fold = superfamilies with a common folding topology for at least a central portion of the structure
5. Class = a, b, a/b, a+b, multi-domain proteins, membrane and cell surface proteins, small proteins without a or b:
- All alpha proteins
- All beta proteins
- Alpha and beta proteins (a/b)
  Mainly parallel beta sheets (beta-alpha-beta units)
- Alpha and beta proteins (a+b)
  Mainly antiparallel beta sheets (segregated alpha and beta regions)
- Multi-domain proteins (alpha and beta)
  Folds consisting of two or more domains belonging to different classes
- Membrane and cell surface proteins and peptides
  Does not include proteins in the immune system
- Small proteins
  Usually dominated by metal ligand, heme, and/or disulfide bridges
Others:
- Coiled coil proteins
  Not a true class
- Low resolution protein structures
  Not a true class
- Peptides
  Peptides and fragments. Not a true class
- Designed proteins
  Experimental structures of proteins with essentially non-natural sequences. Not a true class
Definitions from the SCOP website (taken from http://scop.mrc-lmb.cam.ac.uk/scop/intro.html on October 13, 2002):
Family: Clear evolutionary relationship
Proteins clustered together into families are clearly evolutionarily
related. Generally, this means that pairwise residue identities between the
proteins are 30% and greater. However, in some cases similar functions and
structures provide definitive evidence of common descent in the absence of
high sequence identity; for example, many globins form a family though some
members have sequence identities of only 15%.
Superfamily: Probable common evolutionary origin
Proteins that have low sequence identities, but whose structural and
functional features suggest that a common evolutionary origin is probable
are placed together in superfamilies. For example, actin, the ATPase domain
of the heat shock protein, and hexokinase together form a superfamily.
Fold: Major structural similarity
Proteins are defined as having a common fold if they have the same major
secondary structures in the same arrangement and with the same topological
connections. Different proteins with the same fold often have peripheral
elements of secondary structure and turn regions that differ in size and
conformation. In some cases, these differing peripheral regions may comprise
half the structure. Proteins placed together in the same fold category may
not have a common evolutionary origin: the structural similarities could
arise just from the physics and chemistry of proteins favoring certain
packing arrangements and chain topologies.
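The Family definition above gives a guideline of 30% pairwise residue identity. As a rough illustration of what that number measures, here is a minimal Python sketch that computes percent identity for two sequences that have already been aligned. Several things here are assumptions and not taken from SCOP: the function names `percent_identity` and `meets_family_guideline` are hypothetical, gaps are written as `-`, and identity is counted only over columns where neither sequence has a gap (other denominators are also in common use and give different values). The example sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over ungapped alignment columns.

    Assumption: both strings come from the same pairwise alignment,
    so they have equal length and use `gap` for insertions/deletions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns where both sequences have a residue
    matches = 0   # compared columns where the residues are identical
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x.upper() == y.upper():
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def meets_family_guideline(identity: float, threshold: float = 30.0) -> bool:
    """True if identity reaches the 30% guideline quoted above.

    This is only the sequence-based rule of thumb; as the globin example
    shows, proteins below the threshold can still belong to one family.
    """
    return identity >= threshold


if __name__ == "__main__":
    # Made-up aligned fragments, for illustration only.
    ident = percent_identity("ACDE", "ACDF")
    print(ident, meets_family_guideline(ident))
```

A check like this can only flag candidate families. The definition itself stresses that structure and function can establish common descent even when identity is as low as 15%.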
What are the numbers of entries at each level of this hierarchy? Where does the number of 7000 folds come from?
Current statistics (as of October 13, 2002):
SCOP: Structural Classification of Proteins. 1.59 release
| Class | Number of folds | Number of superfamilies | Number of families |
|---|---|---|---|
| All alpha proteins | 151 | 252 | 393 |
| All beta proteins | 110 | 205 | 337 |
| Alpha and beta proteins (a/b) | 113 | 185 | 438 |
| Alpha and beta proteins (a+b) | 208 | 295 | 454 |
| Multi-domain proteins | 34 | 34 | 46 |
| Membrane and cell surface proteins | 12 | 19 | 31 |
| Small proteins | 58 | 83 | 128 |
| Total | 686 | 1073 | 1827 |
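The totals row can be cross-checked by summing each column of the table. The short Python sketch below does that, using only the counts listed above. The names `SCOP_1_59` and `column_totals` are illustrative and not part of SCOP or any SCOP tool.

```python
# Counts copied from the SCOP 1.59 table above:
# class name -> (folds, superfamilies, families)
SCOP_1_59 = {
    "All alpha proteins": (151, 252, 393),
    "All beta proteins": (110, 205, 337),
    "Alpha and beta proteins (a/b)": (113, 185, 438),
    "Alpha and beta proteins (a+b)": (208, 295, 454),
    "Multi-domain proteins": (34, 34, 46),
    "Membrane and cell surface proteins": (12, 19, 31),
    "Small proteins": (58, 83, 128),
}


def column_totals(table):
    """Sum folds, superfamilies, and families across all classes."""
    return tuple(sum(column) for column in zip(*table.values()))


if __name__ == "__main__":
    folds, superfamilies, families = column_totals(SCOP_1_59)
    print(f"folds={folds} superfamilies={superfamilies} families={families}")
```

The sums agree with the Total row: 686 folds, 1073 superfamilies, and 1827 families.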