Topics in Bioinformatics


Ab initio structure prediction

A protein conformation has to satisfy three conditions:

1. Only stereochemically allowed conformations of all residues are acceptable (i.e., steric clashes must be avoided).

Model system: dialanine peptide

Rotation of the polypeptide chain is possible around the N-Calpha bond (angle phi) and the Calpha-C bond (angle psi); in proline, rotation about the N-Calpha bond is restricted by the side-chain ring. The peptide bond (angle omega) has partial double-bond character and is therefore essentially planar: it is trans in most cases (omega = 180°) and cis (omega = 0°) only rarely, mostly in peptide bonds preceding proline residues. These angles define the backbone conformation, and only certain combinations of phi and psi are sterically allowed, as shown in the Ramachandran plot.
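The phi, psi and omega angles are ordinary torsion (dihedral) angles, each defined by four consecutive backbone atoms. A minimal Python sketch of how such an angle can be computed from Cartesian coordinates, using the common IUPAC sign convention (0° = cis, 180° = trans); the helper names are illustrative and not taken from any particular package:

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees, in [-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / n1 for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# With hypothetical per-residue atom lists C, N, CA:
#   phi_i   = dihedral(C[i-1], N[i], CA[i], C[i])
#   psi_i   = dihedral(N[i], CA[i], C[i], N[i+1])
#   omega_i = dihedral(CA[i], C[i], N[i+1], CA[i+1])
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))   # cis arrangement
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # trans arrangement
```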

2. The folded state must be energetically favorable.

The native state of a globular protein is only 20-60 kJ/mol (5-15 kcal/mol) more stable than the denatured state. This is roughly the equivalent of one or two water-water hydrogen bonds. It is not clear why natural proteins are so marginally stable, since their stability can be increased by adding stabilizing contacts.

The main obstacle to reaching the native state is the loss of conformational freedom (a reduction in entropy) when the chain goes from many unfolded conformations to a single folded one; taken by itself, this process is thermodynamically unfavorable. Why does folding still occur? Because the loss of conformational entropy is compensated by an increase in the entropy of the solvent arising from the hydrophobic effect. Because native structures are more stable than the unfolded state by only the equivalent of 1-2 hydrogen bonds, 1-2 unsatisfied hydrogen bonds in a protein can be enough to make the native state unstable.
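To get a feel for what 20-60 kJ/mol means, the sketch below converts the stability into kcal/mol and into the equilibrium constant K = [N]/[U] = exp(ΔG_unfold/RT). It assumes a simple two-state model (only the native state N and the unfolded state U are populated) at 25 °C; the function name is illustrative:

```python
import math

R = 8.314            # gas constant, J mol^-1 K^-1
KJ_PER_KCAL = 4.184  # 1 kcal = 4.184 kJ

def folding_equilibrium(dg_unfold_kj, temperature=298.15):
    """Two-state model: return (K, fraction unfolded), with K = [N]/[U].

    dg_unfold_kj is G(unfolded) - G(native) in kJ/mol; positive means
    the native state is more stable.
    """
    k = math.exp(dg_unfold_kj * 1000.0 / (R * temperature))
    return k, 1.0 / (1.0 + k)

for dg in (20.0, 60.0):
    k, f_unfolded = folding_equilibrium(dg)
    print(f"{dg:.0f} kJ/mol = {dg / KJ_PER_KCAL:.1f} kcal/mol: "
          f"K = {k:.1e}, fraction unfolded = {f_unfolded:.1e}")
```

Even the lower bound leaves only a small fraction of molecules unfolded at equilibrium, and because K depends exponentially on ΔG, losing the equivalent of a single hydrogen bond shifts the balance by orders of magnitude.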

3. The folded state must be tightly packed.

How tightly packed is the interior of a protein? In principle, relatively loose packing would suffice to exclude water, since a cavity with a radius below 1.4 Å (the radius of a water molecule) cannot accommodate a water molecule. However, attractive van der Waals forces between atoms cause closer packing than the hydrophobic effect alone would require. Thus, a protein interior is like a jigsaw puzzle, except that jigsaw pieces are rigid, while protein side chains are dynamic and can adopt many conformations.

More details on requirements 2 and 3 (the folded state must be energetically favorable and tightly packed):

Terms used in the evaluation of the energy of a conformation (see page 253 in chapter 5, Ref_Lesk for equations):

1. Bond stretching

2. Bond angle bending

3. Deviations from planarity and enforcement of correct chirality

4. Torsion angles

5. van der Waals interactions

6. Hydrogen bonds

7. Electrostatics

8. Solvent

=> These terms, with finely tuned parameter sets, make up a set of conformational energy potentials called "potential functions".
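The functional forms below are a minimal sketch of how several of these terms are commonly written in molecular-mechanics force fields: harmonic bond and angle terms, a periodic torsion term, a Lennard-Jones 12-6 van der Waals term and Coulomb electrostatics. They are not necessarily the exact equations given in Ref_Lesk, the parameter values in the example are made up, and the planarity/chirality, hydrogen-bond and solvent terms are omitted:

```python
import math

COULOMB = 332.06  # kcal*Angstrom/(mol*e^2): energies in kcal/mol, distances in Angstrom

def bond_energy(r, r0, kb):
    """Harmonic bond stretching: kb * (r - r0)^2."""
    return kb * (r - r0) ** 2

def angle_energy(theta_deg, theta0_deg, ktheta):
    """Harmonic bond-angle bending; the deviation is converted to radians."""
    d = math.radians(theta_deg - theta0_deg)
    return ktheta * d ** 2

def torsion_energy(phi_deg, vn, n, gamma_deg):
    """Periodic torsion term: (Vn / 2) * (1 + cos(n*phi - gamma))."""
    return 0.5 * vn * (1.0 + math.cos(math.radians(n * phi_deg - gamma_deg)))

def lj_energy(r, epsilon, sigma):
    """Lennard-Jones 12-6 van der Waals term: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb_energy(r, qi, qj, dielectric=1.0):
    """Coulomb interaction between partial charges qi and qj (units of e)."""
    return COULOMB * qi * qj / (dielectric * r)

# Hypothetical parameters, for illustration only.
e_total = (bond_energy(1.55, 1.53, 310.0)
           + angle_energy(112.0, 109.5, 50.0)
           + torsion_energy(75.0, 1.4, 3, 0.0)
           + lj_energy(3.8, 0.1, 3.4)
           + coulomb_energy(3.0, 0.4, -0.4, dielectric=4.0))
print(f"toy total energy: {e_total:.3f} kcal/mol")
```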

The potential functions satisfy necessary but not sufficient conditions for successful structure prediction: on the basis of calculated conformational energies alone, the correct structure cannot reliably be distinguished from other local minima. (What does this mean in practice? How many possible structures are there compared with the real structures? How different are these structures?)
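A one-dimensional toy illustrates the search side of the local-minimum problem (it says nothing about how accurate the potential itself is). The energy function below is hypothetical, with two minima of different depth standing in for a conformational energy surface; plain gradient descent ends in whichever minimum lies downhill of its starting point, and nothing in the local calculation signals whether that minimum is the global one:

```python
def energy(x):
    """Hypothetical 1D 'energy surface' with a deep and a shallow minimum."""
    return x ** 4 - 4.0 * x ** 2 + x

def gradient(x):
    """Analytic derivative of energy(x)."""
    return 4.0 * x ** 3 - 8.0 * x + 1.0

def minimize(x, step=0.01, n_steps=5000):
    """Plain gradient descent: only moves downhill, so it stops in the nearest minimum."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x

for start in (-2.0, 0.0, 0.5, 2.0):
    x_min = minimize(start)
    print(f"start {start:+.1f} -> x = {x_min:+.3f}, E = {energy(x_min):.3f}")
```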

The importance of hydrophobicity.

More details on requirement 1: only stereochemically allowed conformations of all residues are acceptable (i.e., steric clashes must be avoided).

Conformational search methods

1. systematic search algorithms:

conformational space is restricted (Ramachandran)

grid search: keep bond lengths and angles fixed, and rotate each bond systematically through 360° in fixed increments (how large?)

combinatorial explosion:

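number of conformations $= \prod_{i=1}^{N} 360^\circ/\Delta_i$ over the $N$ rotatable bonds, i.e. $(360^\circ/\Delta)^N$ if every bond uses the same increment $\Delta$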
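The count can be evaluated directly. A small sketch, assuming every rotatable bond is sampled over a full 360° with an integer increment that divides 360; the 100-residue example, with only phi and psi varied and omega fixed at trans, is hypothetical:

```python
import math

def grid_size(increments_deg):
    """Grid-search conformation count: the product over all rotatable bonds
    of 360 / increment."""
    increments = list(increments_deg)
    for inc in increments:
        if 360 % inc != 0:
            raise ValueError(f"increment {inc} does not divide 360")
    return math.prod(360 // inc for inc in increments)

# Dialanine-like model system: two backbone torsions (phi, psi) at 10-degree steps.
print(grid_size([10, 10]))  # 1296
# Hypothetical 100-residue chain: phi and psi per residue (omega fixed at trans), 30-degree steps.
n_conf = grid_size([30] * 200)
print(f"about 10^{math.log10(n_conf):.0f} conformations")
```

Even on a coarse 30° grid, exhaustive enumeration is hopeless for a protein, which is why the search has to be restricted or pruned.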

representation as search tree

backtracking

depth-first search

exclude sterically or energetically disallowed conformations: how much does this reduce the number of possibilities?

Example: Pappu et al., 2000 (perspective: Baldwin and Zimm, 2000)
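The search-tree, backtracking and pruning ideas can be seen on a deliberately simplified stand-in for a real chain (an assumption, not a backbone model): a chain on a 2D square lattice, where each bond after the first offers three discrete choices (turn left, go straight, turn right) in place of torsion increments, and a "steric clash" means two beads on the same lattice site. The depth-first search abandons a branch as soon as a clash appears, so none of its descendants are generated:

```python
def turn(direction, move):
    """Rotate a unit lattice step: 'L' = 90 deg left, 'R' = 90 deg right, 'S' = straight."""
    dx, dy = direction
    if move == "L":
        return (-dy, dx)
    if move == "R":
        return (dy, -dx)
    return (dx, dy)

def count_conformations(n_bonds):
    """Depth-first search with backtracking over the tree of turn choices.

    Toy model: a chain on a 2D square lattice with its first bond fixed along +x.
    Returns (clash-free conformations, size of the unpruned tree).
    """
    if n_bonds < 1:
        raise ValueError("need at least one bond")
    occupied = {(0, 0), (1, 0)}
    found = 0

    def extend(pos, direction, bonds_left):
        nonlocal found
        if bonds_left == 0:
            found += 1  # reached a leaf: a complete, clash-free conformation
            return
        for move in ("L", "S", "R"):
            step = turn(direction, move)
            nxt = (pos[0] + step[0], pos[1] + step[1])
            if nxt in occupied:
                continue  # steric clash: prune this branch and all its descendants
            occupied.add(nxt)
            extend(nxt, step, bonds_left - 1)
            occupied.remove(nxt)  # backtrack before trying the next choice

    extend((1, 0), (1, 0), n_bonds - 1)
    return found, 3 ** (n_bonds - 1)

for n in (4, 8):
    allowed, total = count_conformations(n)
    print(f"{n} bonds: {allowed} of {total} conformations are clash-free")
```

For this toy, pruning removes 2 of 27 conformations at 4 bonds and 708 of 2187 at 8 bonds; the fraction removed grows with chain length because longer chains have more ways to run into themselves.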

2. model-building methods:

to alleviate the combinatorial explosion, use larger fragments ("building blocks"), whose conformations are more restricted

substructure search algorithm

assumption: each fragment is conformationally independent of the others

do the small fragments cover the same conformations as they adopt in the full protein?

can only analyze molecules for which fragments are available

How many different conformations have been found for a given fragment?

How many different fragments have the same conformation?
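A minimal sketch of the model-building idea, under the independence assumption stated above: the sequence is cut into consecutive 3-residue fragments, each fragment is looked up in a library of previously observed conformations, and models are assembled from every combination. The library contents (FRAGMENT_LIBRARY) and the sequence are hypothetical; practical methods differ in how fragments are chosen, overlapped and scored:

```python
import itertools

# Hypothetical fragment library: each 3-residue segment maps to a few backbone
# conformations, given as (phi, psi) pairs in degrees. Values are made up.
FRAGMENT_LIBRARY = {
    "AAG": [((-60, -45), (-60, -45), (-60, -45)),
            ((-120, 130), (-120, 130), (80, 10))],
    "KLV": [((-60, -45), (-65, -40), (-60, -45)),
            ((-120, 130), (-110, 125), (-130, 135))],
    "DPE": [((-90, 0), (-65, 140), (-70, -30))],
}

def assemble(sequence, frag_len=3):
    """Enumerate every combination of library conformations for consecutive
    fragments, assuming the fragments are conformationally independent.
    Yields one list of (phi, psi) pairs per residue for each model."""
    pieces = [sequence[i:i + frag_len] for i in range(0, len(sequence), frag_len)]
    missing = [p for p in pieces if p not in FRAGMENT_LIBRARY]
    if missing:
        # Model building only works for fragments present in the library.
        raise KeyError(f"no library entry for fragment(s): {missing}")
    for combo in itertools.product(*(FRAGMENT_LIBRARY[p] for p in pieces)):
        yield [angles for fragment in combo for angles in fragment]

models = list(assemble("AAGKLVDPE"))
print(len(models), "models, versus", 12 ** 18, "grid points for 18 torsions at 30 degrees")
```

The KeyError mirrors the limitation that only molecules for which fragments are available can be built.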

3. random search methods

varies the atomic Cartesian coordinates or the torsion angles randomly
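A sketch of random search in torsion-angle space (the Cartesian variant would perturb atomic coordinates instead). The energy function is a hypothetical stand-in for a real potential function: a threefold term per bond plus a weak coupling between neighboring torsions. Every sample is drawn independently, and the lowest-energy one is kept:

```python
import math
import random

def toy_energy(torsions_deg):
    """Hypothetical torsion-space energy (>= 0): a threefold term per bond with
    minima at 60, 180 and 300 degrees, plus a weak neighbor coupling."""
    e = 0.0
    for phi in torsions_deg:
        e += 1.0 + math.cos(math.radians(3.0 * phi))
    for a, b in zip(torsions_deg, torsions_deg[1:]):
        e += 0.5 * (1.0 - math.cos(math.radians(a - b)))
    return e

def random_search(n_torsions, n_samples, seed=0):
    """Draw each torsion uniformly from [0, 360) and keep the lowest-energy sample."""
    rng = random.Random(seed)
    best, best_e = None, math.inf
    for _ in range(n_samples):
        sample = [rng.uniform(0.0, 360.0) for _ in range(n_torsions)]
        e = toy_energy(sample)
        if e < best_e:
            best, best_e = sample, e
    return best, best_e

for n_samples in (100, 10000):
    _, e = random_search(6, n_samples)
    print(f"{n_samples} samples: best energy {e:.3f}")
```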

4. distance geometry

describes the conformation not by Cartesian or internal coordinates, but in terms of the distances between all pairs of atoms
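A minimal sketch of the embedding step (metric-matrix embedding, also known as classical multidimensional scaling), assuming a complete matrix of exact distances; real distance-geometry calculations start from lower and upper distance bounds and need bound smoothing and refinement. Power iteration is used only to avoid external dependencies, and the example points are made up. Coordinates are recovered only up to rotation, translation and reflection, so a mirror image is equally consistent with the distances:

```python
import math
import random

def embed_from_distances(dist, dim=3, iters=1000, seed=0):
    """Recover coordinates from a complete matrix of exact pairwise distances."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # Metric (Gram) matrix of centered coordinates: G = -1/2 * J D^2 J (double centering).
    g = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        # Power iteration for the largest remaining eigenvalue/eigenvector of G.
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(g[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflation: remove this component so the next pass finds the next eigenvector.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

points = [(0, 0, 0), (4, 0, 0), (0, 2, 0), (0, 0, 1), (1, 1, 1)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = embed_from_distances(dist)
worst = max(abs(math.dist(coords[i], coords[j]) - dist[i][j])
            for i in range(len(points)) for j in range(len(points)))
print(f"largest distance error: {worst:.1e}")
```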

5. simulation methods

Monte Carlo and molecular dynamics
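A minimal Metropolis Monte Carlo sketch in torsion space, using a hypothetical toy energy in place of a real force field. Unlike the independent samples of a pure random search, each trial is a small perturbation of the current conformation; downhill moves are always accepted and uphill moves with probability exp(-ΔE/kT), so the search can escape shallow local minima. Molecular dynamics instead integrates Newton's equations of motion and is not shown. The temperature kT and the step size are arbitrary choices:

```python
import math
import random

def toy_energy(torsions_deg):
    """Hypothetical torsion-space energy: threefold term per bond plus neighbor coupling."""
    e = sum(1.0 + math.cos(math.radians(3.0 * phi)) for phi in torsions_deg)
    e += sum(0.5 * (1.0 - math.cos(math.radians(a - b)))
             for a, b in zip(torsions_deg, torsions_deg[1:]))
    return e

def metropolis_mc(torsions_deg, n_steps, kT=0.3, max_step=30.0, seed=0):
    """Return (lowest-energy conformation seen, its energy, acceptance rate)."""
    rng = random.Random(seed)
    current = list(torsions_deg)
    e_current = toy_energy(current)
    best, e_best = list(current), e_current
    accepted = 0
    for _ in range(n_steps):
        # Symmetric trial move: perturb one randomly chosen torsion.
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] = (trial[i] + rng.uniform(-max_step, max_step)) % 360.0
        e_trial = toy_energy(trial)
        delta = e_trial - e_current
        # Metropolis criterion: accept downhill always, uphill with prob exp(-dE/kT).
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            accepted += 1
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best, accepted / n_steps

start = [0.0] * 5
best, e_best, rate = metropolis_mc(start, 5000)
print(f"start E = {toy_energy(start):.2f}, best E = {e_best:.3f}, acceptance = {rate:.2f}")
```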

Applications:

IBM Blue Gene project

LINUS