Line Weavings for the Astral 40 Set, Fall 2005
The line weavings are organized as a tree of webpages, indexed in a fairly standard way by the second and third characters of each protein's PDB ID.
At the bottom of the hierarchy, each protein chain is represented
by two files: a crossing file and a backbone overlay
file.
For instance, chain A in protein
5rub is associated with the crossing file
5rub_A.lad and the backbone overlay
file 5rub_A_sse.pdb.
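A minimal Python sketch of the naming scheme, for anyone scripting access to the files. It assumes each chain's two files sit in a subdirectory named by the second and third characters of the PDB ID; the root directory name "astral40" and the function weaving_files are hypothetical, not the site's actual paths.

```python
from pathlib import PurePosixPath


def weaving_files(pdb_id: str, chain: str,
                  root: str = "astral40") -> tuple[PurePosixPath, PurePosixPath]:
    """Return (crossing file, overlay file) paths for one protein chain.

    Assumes a hypothetical layout root/<chars 2-3 of the ID>/<file>.
    """
    pdb_id = pdb_id.lower()
    if len(pdb_id) != 4:
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    subdir = PurePosixPath(root) / pdb_id[1:3]  # second and third characters
    stem = f"{pdb_id}_{chain}"
    return subdir / f"{stem}.lad", subdir / f"{stem}_sse.pdb"
```

For example, weaving_files("5rub", "A") gives the paths astral40/ru/5rub_A.lad and astral40/ru/5rub_A_sse.pdb.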
The crossing file also shows the crossing matrix associated with the lines, the matrix of interline angles, the matrix of centroid separations, and the matrix of closest-approach distances between secondary structures, computed from their constituent alpha and sidechain carbons. The crossing matrix has entries "+", "-", or ".", giving the sign of each interline angle; the entry "." appears for angles that fall below a threshold of 0.5 radians.
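A short sketch of how the "+"/"-"/"." entries follow from a matrix of signed interline angles, together with a closest-approach distance taken as the minimum over all atom pairs. The page does not define the sign convention for the angles, so this takes the signed angles as given and assumes "." means the angle's magnitude is below 0.5 radians. The function names are hypothetical, and closest_approach simply uses whatever atom coordinates it is given.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def crossing_symbol(angle: float, threshold: float = 0.5) -> str:
    """Map a signed interline angle (radians) to '+', '-', or '.'.

    Assumption: '.' marks angles whose magnitude is below the threshold.
    """
    if abs(angle) < threshold:
        return "."
    return "+" if angle > 0 else "-"


def crossing_matrix(angles: Sequence[Sequence[float]],
                    threshold: float = 0.5) -> list[list[str]]:
    """Apply crossing_symbol entrywise to a matrix of signed angles."""
    return [[crossing_symbol(a, threshold) for a in row] for row in angles]


def closest_approach(atoms_a: Sequence[Point], atoms_b: Sequence[Point]) -> float:
    """Minimum distance between any atom of one element and any atom of the other."""
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)
```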
Each secondary structure line appears in the overlay file with an alpha carbon at its centroid, a beryllium atom at its start, and a sulphur atom at its end. A chain of hydrogen atoms connects these three points, giving the appearance of a line. All of these atoms carry a residue number equal to the index listed for that secondary structure in the Type column of the crossing file.
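To recover the lines from an overlay file, one can read the fixed-column PDB records for the line chain and group the marker atoms by residue number. This sketch assumes the marker atoms are named CA, BE, and S in the atom-name field and skips the connecting hydrogens; line_endpoints is a hypothetical helper, not part of RASMOL or any PDB library.

```python
from collections import defaultdict

Point = tuple[float, float, float]


def line_endpoints(pdb_text: str, line_chain: str = "Z") -> dict[int, dict[str, Point]]:
    """Collect start, centroid, and end points of each secondary-structure line.

    Assumes standard fixed-column ATOM/HETATM records and marker atoms
    named BE (start), CA (centroid), and S (end); hydrogens are skipped.
    """
    roles = {"BE": "start", "CA": "centroid", "S": "end"}
    lines: dict[int, dict[str, Point]] = defaultdict(dict)
    for rec in pdb_text.splitlines():
        if not rec.startswith(("ATOM", "HETATM")) or rec[21:22] != line_chain:
            continue  # not an atom record, or not on the line chain
        role = roles.get(rec[12:16].strip())
        if role is None:
            continue  # e.g. the hydrogen atoms that draw the line
        sse_index = int(rec[22:26])  # index from the crossing file's Type column
        xyz = (float(rec[30:38]), float(rec[38:46]), float(rec[46:54]))
        lines[sse_index][role] = xyz
    return dict(lines)
```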
The protein chain appears with its original chain letter (or the letter A if the protein did not originally have a chain letter). The lines appear with chain letter Z (unless that conflicts with the protein chain letter, in which case the lines have chain letter A).
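The chain-letter rule above, written out as a small function (the name chain_letters is hypothetical):

```python
def chain_letters(original_chain: str) -> tuple[str, str]:
    """Return (protein chain letter, line chain letter) for an overlay file."""
    protein = original_chain.strip() or "A"  # a blank chain ID becomes A
    lines = "A" if protein == "Z" else "Z"   # avoid clashing with the protein
    return protein, lines
```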
The overlay files should display fine in either
RASMOL
or
Protein Explorer.
The following two RASMOL scripts will display the file using
structural colors for the protein chain and green for the lines
(assuming the lines have chain letter Z):
Ribbon Script
Backbone Script
For coordinates, we used the protein files appearing on the PDB DVD set entitled "Release #1, 2004 Edition", as received 19-July-2005. For protein chains with multiple models we simply used the first model present in the PDB file. Similarly, for atoms with alternate locations we used the first location.
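A sketch of the model and alternate-location filtering described above, working directly on PDB text. It assumes standard fixed-column ATOM/HETATM records, treats everything after the first ENDMDL as later models, and identifies an atom by chain, residue number, insertion code, and atom name; first_model_first_altloc is a hypothetical name.

```python
def first_model_first_altloc(pdb_text: str) -> list[str]:
    """Keep ATOM/HETATM records from the first model, one location per atom.

    For an atom with alternate locations, the record that appears first
    in the file is kept and later alternates are dropped.
    """
    kept: list[str] = []
    seen: set[tuple[str, str, str, str]] = set()
    for rec in pdb_text.splitlines():
        if rec.startswith("ENDMDL"):
            break  # everything after the first model is ignored
        if not rec.startswith(("ATOM", "HETATM")):
            continue
        # atom identity: chain, residue number, insertion code, atom name
        key = (rec[21:22], rec[22:26], rec[26:27], rec[12:16])
        if key in seen:
            continue  # a later alternate location of an atom already kept
        seen.add(key)
        kept.append(rec)
    return kept
```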
The residue numbers appearing in the crossing and overlay files are not necessarily the original PDB residue numbers. Instead, we ran DSSP on the PDB files and used the internal DSSP residue numbers, shifted to start at 1 for each protein chain. Doing so avoided issues with negative residue numbers and insertion codes.
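The renumbering amounts to subtracting a per-chain offset. A minimal sketch, assuming the DSSP residue numbers have already been read into (chain, number) pairs in file order; shift_to_one is a hypothetical helper. Because only an offset is subtracted, any gaps in the DSSP numbering survive the shift.

```python
def shift_to_one(dssp_numbers: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Shift DSSP sequential residue numbers so each chain starts at 1."""
    first: dict[str, int] = {}
    out: list[tuple[str, int]] = []
    for chain, num in dssp_numbers:
        first.setdefault(chain, num)  # remember the first number seen per chain
        out.append((chain, num - first[chain] + 1))
    return out
```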
We also determined secondary structure using DSSP. We augmented the DSSP information slightly (automatically, not by hand), by adding some turns to helices and breaking some strands at non-bridge locations. We also broke secondary structures that bent severely, in order to reduce the likelihood of poor line approximations. We ignored strands consisting of a single residue and helices consisting of fewer than three residues. Some protein chains have no associated secondary structure and thus no associated line weaving.
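A sketch of the length filter applied to a DSSP per-residue string. The page does not say which DSSP classes count as helix or strand, so this assumes only H (helix) and E (strand); it does not reproduce the automatic augmentation steps (adding turns to helices, breaking strands, splitting bent elements). SSE_CLASS, MIN_LENGTH, and segments are hypothetical names.

```python
from itertools import groupby

# Assumption: only DSSP "H" counts as helix and "E" as strand.
SSE_CLASS = {"H": "helix", "E": "strand"}
# Drop single-residue strands and helices with fewer than three residues.
MIN_LENGTH = {"helix": 3, "strand": 2}


def segments(dssp_codes: str) -> list[tuple[str, int, int]]:
    """Return (type, first residue, last residue) for each kept element, 1-based."""
    result = []
    pos = 1  # residue number of the first residue in the current run
    for cls, run in groupby(dssp_codes, key=SSE_CLASS.get):
        length = sum(1 for _ in run)
        if cls is not None and length >= MIN_LENGTH[cls]:
            result.append((cls, pos, pos + length - 1))
        pos += length
    return result
```

For example, in the string "  HHHHHH  E EEEE HH T" the single-residue strand at position 11 and the two-residue helix at positions 18 to 19 are dropped, leaving a helix at 3 to 8 and a strand at 13 to 16.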
THANKS: We are very grateful to all those individuals and institutions who have created the sources listed above, and thank them for making the sources publicly available.
A word of caution: the line approximations derived for secondary structure elements consisting of only a few residues can be overly sensitive to the coordinates of their constituent alpha carbons. For instance, adding or removing even a single residue in a short helix, say one with fewer than 7 residues, can shift the orientation of the line dramatically. (This is not surprising, since 4 residues are required to define a single "center" of a helical axis, but it is worth remembering.)
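To make the caution concrete: the page does not say how its lines were fitted, but one simple approach (an assumption here, not necessarily the method used) estimates a helix axis point by averaging four consecutive alpha carbons, roughly one helical turn. A helix of n residues then yields only n - 3 axis points, so a 5-residue helix defines its line from just two points, and each residue carries a large share of the data. The idealized helix parameters (2.3 Å radius, 1.5 Å rise, 100° per residue) are textbook approximations used only for illustration; all function names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def ideal_helix(n: int, radius: float = 2.3, rise: float = 1.5,
                twist_deg: float = 100.0) -> list[Vec]:
    """Alpha-carbon positions of an idealized helix along the z axis."""
    t = math.radians(twist_deg)
    return [(radius * math.cos(k * t), radius * math.sin(k * t), rise * k)
            for k in range(n)]


def axis_centers(ca: list[Vec]) -> list[Vec]:
    """Average each window of 4 consecutive alpha carbons (about one turn)."""
    return [
        tuple(sum(p[i] for p in ca[k:k + 4]) / 4 for i in range(3))
        for k in range(len(ca) - 3)
    ]


def axis_direction(ca: list[Vec]) -> Vec:
    """Unit vector from the first to the last axis center."""
    centers = axis_centers(ca)
    if len(centers) < 2:
        raise ValueError("need at least 5 residues for two axis centers")
    a, b = centers[0], centers[-1]
    d = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(x * x for x in d))
    return tuple(x / norm for x in d)
```

For example, axis_centers(ideal_helix(6)) has 3 entries, while axis_direction(ideal_helix(4)) raises ValueError because a 4-residue helix yields a single center and therefore no direction.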
Any opinions, findings, and conclusions or recommendations expressed in this research are those of the author(s) and do not necessarily reflect the views of Carnegie Mellon University, the Pennsylvania Department of Health, or any other private or governmental agency.
Permission is granted to any individual or institution to use, copy, and/or distribute this material, provided that the complete contents of this webpage, including but not limited to the disclaimer, copyright, and permission notice, are maintained, intact, in all copies and supporting documentation.
Modified 19-September-2005 by
Michael Erdmann