·
Heuristic Local search algorithm for
protein complex
Choices
for how to start: -
Interesting selected nodes; -
Top degree nodes -
All related nodes on the graph
searched on; We order the
seeding node list with respect to the node degree and start the search from
high degree nodes to low degree nodes. |
Expand
current cluster heuristic: -
Generate
a subset V* from all neighbours of current cluster; -
Choose
top rank M nodes as candidate to expand the current cluster. -
The
order of adding is based on the maximum similarity weight to the current
cluster; -
Choose
the node who achieves the best new cluster score among M candidates; |
Search:
-
Accept
the new cluster candidate if with higher score; -
If
lower, accept with probability exp(L' - L)/T; -
Temperature
parameter T decreasing by a scaling factor alpha after each round -
Accepted
cluster must score higher than a threshold |
When
to stop: -
Current
cluster has no neighbor nodes to expand; -
Number
of rounds since the last score improvements is larger than a specified
number; -
The
number of expanding rounds is larger a the specified number; |
·
Complexity of complex identification
Assuming that the graph we searched on have n nodes and e edges. The maximum size of sugraphs we detected is p (chose as 20 in our evaluations). Thus, l For each candidate sub-graph, we need to perform the feature extractions. This would cost O(p3) l When searching from a seeding node, we assume the average degree of nodes in the graph as q (in sparse graph, q<<n) and the cluster expansion in each round is constrained to m (set as 20 in the evaluations) maximum weight nodes. Then finding the related complex for the node would be cost O(q*p*m* p3) l If wanting to find all the complexes in a graph, we start from all the related n nodes, totally the algorithm cost O(n*q*m* p4) l Thus, if (n >> p4), polynomial to n. If not, the complexity controls largely by p |
·
Evaluation setting
There exist another large scale affinity-MS based data set from the following paper [1]:
|
In the evaluation, we either train with MIPS [5] manual complexes as positive and test with TAP06 complexes, Or train with TAP06 complexes as positive and test with MIPS manual complexes
|
Related
parameters used after selecting the best among evaluation
-
CLUSTER_LIMIT 20 -
CLSOVERLAPLIMIT 0.75 -
CHECKNEIULIM 20 -
SIMUTemp0 0.88 -
SIMUALPHA 1.80 -
trainPosPpara 0.0001 -
LikeScoreCutoff 0 |
·
Validation setting
Related parameters used -
trainvaliChoice mipsTrain.outVali -
trainMethodpara 0.0001 -
seedChoice allGraphNode -
CLUSTER_LIMIT 20 -
CLSOVERLAPLIMIT 0.8 -
CHECKNEIULIM 20 -
SIMUTemp 0.88 -
SIMUALPHA 1.80 -
LikeScoreCutoff 0 |
We checked our predicted clusters respect to
five complexes sets: -
MIPS manual (MIPS) [5] -
Gavin 02 [2] -
Gavin 06 (TAP06) [3] -
Ho 02 [4] -
Krogan 06 [1] |
l
References:
[1]
Krogan,N.J.,
et al, Greenblatt,J.F. (2006) Global landscape of protein complexes in yeast
Saccharomyces cerevisiae. Nature , 440, (7084):637-43.
[2]
Gavin,A.C.,
Bosche,M., Krause,R., Grandi,P., Marzioch,M., Bauer,A., Schultz,J., et.al.
(2002) Functional organization of the yeast proteome by systematic analysis of
protein complexes. Nature, 415, (6868):141-7.
[3]
Gavin,A.C.,
Aloy,P., et al. (2006) Proteome survey reveals modularity of the yeast cell
machinery. Nature, 440, (7084):631-6.
[4]
Ho,Y.,
Gruhler,A., Heilbut,A., Bader,GD., Moore,L., Adams,S.L., Millar,A.,
Taylor,P.,Bennett,K., Boutilier,K., et al. (2002) Systematic identification of
protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature,
415, 6868.
[5]
Mewes,H.W.,
Amid,C., Arnold,R., et. al (2004) MIPS: analysis and annotation of proteins
from whole genomes. Nucleic Acids Res. 32, (Database issue):D41-4.