Validate Module
(Note: the results of the validate module reported in our submitted paper are very preliminary. Here we have extended the validation to all possible pairs among the proteins of the pathway. We also try to predict newly found interacting pairs.)
To analyze the utility of the results for designing new experiments, we compared the predictions of our method with the known labels for one specific pathway, the yeast pheromone pathway. The yeast mating factors MATa/α bind to their cognate membrane receptors, Ste2/3, members of the G protein-coupled receptor family. Subsequent binding and activation of the G protein induces a MAP kinase signaling pathway via the G protein βγ subunit. We obtained the interaction network graph of this pathway (Figure 1) from the KEGG PATHWAY Database (within “MAPK signaling pathway - Saccharomyces cerevisiae”). The “Pheromone” line of the pathway involves 25 proteins, which are listed in Table 1.
Table 1. The 25 yeast proteins related to the “Pheromone” pathway (gene names are filled in where Table 3 gives them)

| No. | ORF Name | Gene Name |
|-----|----------|-----------|
| 1 | YPL187W | |
| 2 | YGL089C | |
| 3 | YFL026W | STE2 |
| 4 | YDR461W | MFA1 |
| 5 | YNL145W | MFA2 |
| 6 | YKL178C | STE3 |
| 7 | YHR005C | GPA1 |
| 8 | YOR212W | |
| 9 | YJR086W | STE18 |
| 10 | YLR229C | CDC42 |
| 11 | YAL041W | CDC24 |
| 12 | YBR200W | BEM1 |
| 13 | YHL007C | STE20 |
| 14 | YNL271C | BNI1 |
| 15 | YLR362W | STE11 |
| 16 | YDR103W | STE5 |
| 17 | YDL159W | STE7 |
| 18 | YBL016W | |
| 19 | YNL053W | |
| 20 | YPL049C | DIG1 |
| 21 | YDR480W | DIG2 |
| 22 | YHR084W | |
| 23 | YJL157C | |
| 24 | YMR043W | |
| 25 | YCL027W | FUS1 |
Figure 1. “Pheromone” pathway network graph from KEGG
For these 25 proteins there are 300 (25 × 24 / 2) potential protein pairs whose interaction probability we can test. We used our method to predict whether each of these potential pairs interacts. We also compared these pairs with our small positive label set and with the pairwise interaction relationships in the KEGG pathway graph (Figure 1). Please see the Excel table, which lists for all 300 pairs the ORF names and gene names of the proteins involved, our prediction, the pos_small set label, and the KEGG interaction label. The performance statistics are summarized in Table 2. Of the 300 pairs, 11 are labeled as positive by both our pos_small labeled set and the KEGG pathway, and we correctly predicted 8 of these 11 as positive. Table 2 gives the corresponding numbers for the other cases. For example, of the 19 (11 + 8) pairs labeled as interacting by KEGG, our predictions correctly found 12 (8 + 4), or 63.16%.
Table 2. Some statistics about our prediction

| | Test Set | Our Predicted Positive | Our Predicted Negative |
|---|---|---|---|
| Whole | 300 | 44 / 300 | 256 / 300 |
| Labeled POS by both pos_small & KEGG | 11 / 300 | 8 / 44 | 3 / 256 |
| Labeled only by pos_small | 31 / 300 | 13 / 44 | 18 / 256 |
| Labeled only by KEGG | 8 / 300 | 4 / 44 | 4 / 256 |
| Remaining | 250 / 300 | 19 (need further analysis) / 44 | 231 / 256 |
(Note: in the table, “POS” means positive, that is, interacting, pairs. For protein pairs among the 300 that are also in our current data set, we trained the model on the subset that does not contain these pairs to obtain the final prediction. The threshold was set using a validation set, as described in the experimental section of the paper.)
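The tallies in Table 2 can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic: the dictionary keys and variable names are illustrative, and the counts are transcribed from the two prediction columns of the table. Summing the cells gives 44 predicted positives and 256 predicted negatives, so the “Remaining” row covers 19 + 231 = 250 pairs.

```python
# Counts transcribed from Table 2 as (predicted positive, predicted negative).
# Row names are illustrative labels, not identifiers from our software.
table2 = {
    "both pos_small and KEGG": (8, 3),
    "pos_small only": (13, 18),
    "KEGG only": (4, 4),
    "remaining": (19, 231),
}

# Column totals should reproduce the "Whole" row of the table.
pred_pos = sum(pos for pos, _ in table2.values())
pred_neg = sum(neg for _, neg in table2.values())
total = pred_pos + pred_neg

# Recall on the KEGG-labeled pairs: rows that carry a KEGG label.
kegg_rows = ("both pos_small and KEGG", "KEGG only")
kegg_total = sum(sum(table2[row]) for row in kegg_rows)
kegg_found = sum(table2[row][0] for row in kegg_rows)
kegg_recall = kegg_found / kegg_total

print(pred_pos, pred_neg, total)
print(f"{kegg_found}/{kegg_total} = {kegg_recall:.2%}")
```

The last line reproduces the 12 of 19 (63.16%) figure quoted for the KEGG-labeled pairs.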
From Table 2 we see that 19 of our predicted pairs are found in neither the small positive set nor the KEGG pathway. Table 3 lists these remaining 19 pairs. We examined each of them further (see the “Detailed comment” column).
Table 3. The 19 newly found protein pairs. Only 2 predictions are clearly incorrect, but understandably so (lines 1 and 2). 6 predictions are clearly correct, as confirmed by the scientific literature. The remaining 11 are new predictions, all of which are reasonable and would make functional sense. They form two clusters. The first is a possible interaction of the STE5 anchor protein with the receptors; the receptors would then make additional interactions owing to the anchoring function of STE5. The second is a possible interaction between the most downstream component of the signaling cascade and BNI1/BEM1.

| No | ORF1 | Gene1 | ORF2 | Gene2 | Verdict | Detailed comment |
|----|------|-------|------|-------|---------|------------------|
| 1 | YFL026W | STE2 | YNL145W | MFA2* | Wrong, but understandable | Ste2 is the GPCR binding to MATalpha, while Mfa2 is the ligand for Ste3, the MATa receptor*, so this is wrong, but should be very difficult to distinguish automatically. |
| 2 | YDR461W | MFA1* | YNL145W | MFA2 | Wrong, but understandable | Mfa1 and Mfa2 are two ligands binding to the two GPCRs. They probably do not physically interact with each other, but everything about them should be very similar. Again, all the indirect evidence should point to their interaction. |
| 3 | YNL145W | MFA2 | YHR005C | GPA1 | Correct | GPA1 is the G protein. MFA2 is bound to the G protein-coupled receptor, and it is the complex between the ligand and the receptor that interacts with the G protein. This prediction is correct. |
| 4 | YNL145W | MFA2 | YDR103W | STE5** | New prediction | STE5 is a protein that interacts with many other proteins and appears to play an anchoring role. Any interactions involving STE5 should be difficult to dissect automatically (see discussion in paper). Further experiments are needed to test whether this prediction is correct. It would make sense to anchor members of the signaling pathway via STE5 to the membrane via the receptors (and MFA as a ligand is part of the receptor complex, see line 6 below). |
| 5 | YKL178C | STE3 | YJR086W | STE18 | Correct | STE18 is a subunit of the G protein. This prediction is correct. |
| 6 | YKL178C | STE3 | YDR103W | STE5 | New prediction | See line 4. An interaction between the STE5 anchor and the receptors would make sense. |
| 7 | YHR005C | GPA1 | YDR103W | STE5 | New prediction | See lines 4 and 6. If STE5 is anchored to the receptor, then it would also interact with the G protein. |
| 8 | YLR229C | CDC42 | YDL159W | STE7 | New prediction | CDC42 is a small G protein of the Rho superfamily. STE7 is a kinase downstream of CDC42 and interacts with STE5. It is possible that these interact, maybe as part of the Ste5 complex. |
| 9 | YLR229C | CDC42 | YPL049C | DIG1 | New prediction | See line 8; DIG1 is further downstream. |
| 10 | YAL041W | CDC24 | YHL007C | STE20 | Correct | Cdc24 is the GDP-GTP exchange factor for Cdc42. Cdc42 interacts with Ste20. These proteins can form a ternary complex. |
| 11 | YAL041W | CDC24 | YLR362W | STE11 | New prediction | Ste11 is part of the STE5 complex, and CDC42 is predicted to interact with its direct partners. See line 8 above. |
| 12 | YAL041W | CDC24 | YDR103W | STE5 | New prediction | Directly supports line 8. |
| 13 | YAL041W | CDC24 | YDL159W | STE7 | New prediction | See lines 8-12 above. |
| 14 | YBR200W | BEM1 | YNL271C | BNI1 | Correct | BEM1 is an SH3-domain protein that binds CDC42/CDC24, and BNI1 binds CDC42. |
| 15 | YBR200W | BEM1 | YCL027W | FUS1 | New prediction | FUS1 is a cell surface protein that mediates cell fusion. It is one of the most downstream factors in the pathway. |
| 16 | YHL007C | STE20 | YDL159W | STE7 | New prediction | See lines 8-13. |
| 17 | YNL271C | BNI1 | YCL027W | FUS1 | New prediction | See line 15 above. Consistent with line 14: BEM1 and BNI1 interact, so if line 15 is correct, then line 17 is also correct. |
| 18 | YPL049C | DIG1 | YDR480W | DIG2 | Correct | See e.g. Kusari et al. (2004) J. Cell Biol. 164, 267-277. |
| 19 | YDR461W | MFA1 | YHR005C | GPA1 | Correct | Same argument as line 3. |

* MATa = MFA1 (precursor) and MFA2 (lipopeptide hormone) → Ste3 is the receptor
MATalpha = MF(alpha)1 and MF(alpha)2 → Ste2 is the receptor
** STE5 is the anchor protein, described in the discussion of the paper
From Table 2 and Table 3 we conclude the following:
- Of the 19 newly predicted pairs:
  - 2 were found to be wrongly predicted
  - 6 were further verified as correct predictions by Judith
  - The remaining 11 new predictions are reasonable and would make functional sense. We are currently trying to validate these new predictions experimentally.
- Correctness: among the 44 interacting protein pairs we predicted,
  - 31 out of 44 are correct (70.45%)
  - 2 out of 44 are incorrect (4.55%)
  - 11 out of 44 form new hypotheses that need further experimental verification in the lab (25%)
- Completeness:
  - We do not know the complete set of interacting pairs
  - We missed 25 pairs (3 + 18 + 4)
  - If we assume that 42 + 25 is the total number of true pairs, recall is 62.7%
  - If we assume that 31 + 25 is the total number of true pairs, recall is 55.4%
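The correctness and completeness figures above follow from a handful of counts. The sketch below recomputes them; the variable names are illustrative, and the only assumption is the one already stated in the list, namely that the 11 unverified predictions are counted either as all wrong (lower bound) or as all true (upper bound).

```python
# Counts taken from Table 2 and Table 3.
predicted = 44                   # pairs we predicted as interacting
confirmed = 8 + 13 + 4 + 6       # labeled hits plus 6 literature-confirmed pairs
wrong = 2                        # clearly incorrect predictions
unverified = 11                  # new predictions awaiting experiments
missed = 3 + 18 + 4              # labeled positives we predicted as negative

assert confirmed + wrong + unverified == predicted

# Fractions of the 44 predictions in each category.
frac_correct = confirmed / predicted
frac_wrong = wrong / predicted
frac_new = unverified / predicted

# Recall bounds: unverified pairs counted as all wrong or as all true.
recall_low = confirmed / (confirmed + missed)
recall_high = (confirmed + unverified) / (confirmed + unverified + missed)

print(f"correct {frac_correct:.2%}, wrong {frac_wrong:.2%}, new {frac_new:.2%}")
print(f"recall between {recall_low:.1%} and {recall_high:.1%}")
```

Because the complete set of true interactions is unknown, both recall values are estimates relative to the labeled pairs only.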
We have applied our method to all possible yeast protein pairs to predict whether they interact. For 6270 yeast proteins there are 19,653,315 possible protein pairs overall. Using a default threshold (-0.2), we predict 26,301 of these pairs to interact.
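As a rough illustration of the counting and thresholding step, the sketch below computes the number of unordered pairs and filters a score table by the default threshold. The function names and the toy scores are hypothetical, and treating a pair as interacting when its score is strictly above the threshold is an assumption; the actual decision rule is the one described in the paper.

```python
from math import comb

DEFAULT_THRESHOLD = -0.2  # default threshold quoted in the text


def count_pairs(n_proteins):
    """Number of unordered pairs among n_proteins distinct proteins."""
    return comb(n_proteins, 2)


def predicted_interactions(scores, threshold=DEFAULT_THRESHOLD):
    """Keep pairs whose score exceeds the threshold (assumed rule)."""
    return {pair: s for pair, s in scores.items() if s > threshold}


# Toy scores for illustration only; these are not real model outputs.
toy_scores = {
    ("YFL026W", "YHR005C"): 0.35,
    ("YPL187W", "YMR043W"): -0.90,
    ("YLR229C", "YAL041W"): -0.10,
}

print(count_pairs(6270))   # all yeast protein pairs
print(count_pairs(25))     # pairs within the pheromone pathway
print(sorted(predicted_interactions(toy_scores)))
```

The same counting rule gives the 300 pathway pairs used in the validation and the 19,653,315 genome-wide pairs.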
The web service has just been published online at http://www.medstory.net:8081/ppiws/ . Users can either look up the interaction confidence of a specific yeast protein pair or search for the interacting partners of a protein above a chosen confidence threshold.