Validate Module


 

(Note: the validation results in our submitted paper are very preliminary. Here, we extend the validation to all possible pairs related to the pathway, and we also try to predict new interacting pairs.)

 

Yeast Pheromone Pathway

            To assess whether the results are useful for designing new experiments, we compared our method's predictions with the labels for one specific pathway, the yeast pheromone pathway. The yeast mating factors MATa/α bind to their cognate membrane receptors, Ste2/3, which are members of the G protein-coupled receptor (GPCR) family. Subsequent binding and activation of the G protein induces a MAP kinase signaling pathway via the G protein βγ subunit. We obtained this pathway's interaction network graph (Figure S1) from the KEGG PATHWAY database (MAPK signaling pathway, Saccharomyces cerevisiae). The “Pheromone” branch of this pathway involves 25 proteins, which are listed in Table 1.

 

Table 1. The 25 “Pheromone”-related yeast proteins

No.  ORF name  Gene name
1    YPL187W   MF(ALPHA)1
2    YGL089C   MF(ALPHA)2
3    YFL026W   STE2
4    YDR461W   MFA1
5    YNL145W   MFA2
6    YKL178C   STE3
7    YHR005C   GPA1
8    YOR212W   STE4
9    YJR086W   STE18
10   YLR229C   CDC42
11   YAL041W   CDC24
12   YBR200W   BEM1
13   YHL007C   STE20
14   YNL271C   BNI1
15   YLR362W   STE11
16   YDR103W   STE5
17   YDL159W   STE7
18   YBL016W   FUS3
19   YNL053W   MSG5
20   YPL049C   DIG1
21   YDR480W   DIG2
22   YHR084W   STE12
23   YJL157C   FAR1
24   YMR043W   MCM1
25   YCL027W   FUS1

 

 

 

 

 

Figure S1. “Pheromone” pathway network graph from KEGG

                                      

 

 

Prediction on all 300 potential protein pairs

 

            For these 25 proteins, there are 300 (25 × 24 / 2) potential protein pairs whose interaction probability we can test. We used our method to predict whether each of these potential pairs interacts. We also compared these pairs with our small positive label set and with the interacting pairs in the KEGG graph (Figure S1). Please see the Excel table, which lists, for all 300 pairs, the ORF names and gene names of both proteins, our prediction, the small_POS set label, and the KEGG interaction label. Table 2 summarizes the performance statistics. Of the 300 pairs, 11 are labeled positive by both our small positive set (pos_small) and the KEGG pathway, and we correctly predicted 8 of these 11 as positive. Table 2 gives the corresponding numbers for the other categories. For example, of the 19 (11 + 8) pairs that KEGG labels as interacting, our prediction correctly found 12 (8 + 4), i.e., 63.16%.
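The count of 300 treats a pair as unordered (A-B is the same pair as B-A) and excludes self-pairs. A minimal Python sketch that enumerates the candidate pairs from the ORF names in Table 1 (the variable and function names are illustrative, not taken from our implementation):

```python
from itertools import combinations

# ORF names of the 25 "Pheromone"-related proteins, in the order of Table 1.
PHEROMONE_ORFS = [
    "YPL187W", "YGL089C", "YFL026W", "YDR461W", "YNL145W",
    "YKL178C", "YHR005C", "YOR212W", "YJR086W", "YLR229C",
    "YAL041W", "YBR200W", "YHL007C", "YNL271C", "YLR362W",
    "YDR103W", "YDL159W", "YBL016W", "YNL053W", "YPL049C",
    "YDR480W", "YHR084W", "YJL157C", "YMR043W", "YCL027W",
]


def candidate_pairs(orfs):
    """Return every unordered pair of distinct proteins (no self-pairs)."""
    return list(combinations(orfs, 2))


pairs = candidate_pairs(PHEROMONE_ORFS)
print(len(pairs))  # 25 * 24 / 2 = 300
```

`itertools.combinations` yields each unordered pair exactly once, which is why no de-duplication step is needed.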

 

Table 2. Statistics of our predictions on the 300 pairs

Test set                                Pairs       Our predicted positive            Our predicted negative
Whole                                   300         44 / 300                          256 / 300
Labeled POS by both pos_small & KEGG    11 / 300    8 / 44                            3 / 256
Labeled only by pos_small               31 / 300    13 / 44                           18 / 256
Labeled only by KEGG                    8 / 300     4 / 44                            4 / 256
Remaining                               250 / 300   19 / 44 (need further analysis)   231 / 256

 

 

            (Note: in the table, “POS” means positive (interacting) pairs. For those of the 300 pairs that also appear in our current data set, we trained the model on the subset that excludes them and then made the final prediction. The threshold was set on a validation set, as described in the experimental section of the paper.)
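Table 2 is a cross-tabulation of each pair's label category against its prediction, and the KEGG recall quoted earlier (12 of 19) is read off two of its rows. The Python sketch below shows one way to build such a table. The record type `PairRecord` and its field names are hypothetical stand-ins for the columns of the Excel table, and the records at the end are synthetic: they only reproduce the Table 2 cell counts and are not real pair data.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class PairRecord(NamedTuple):
    # Hypothetical per-pair record; field names are illustrative only.
    in_pos_small: bool  # labeled positive in the small positive set
    in_kegg: bool       # interacting according to the KEGG pathway graph
    predicted: bool     # our prediction at the chosen threshold


def label_category(r: PairRecord) -> str:
    """Map a pair to the row of Table 2 it belongs to."""
    if r.in_pos_small and r.in_kegg:
        return "both"
    if r.in_pos_small:
        return "pos_small_only"
    if r.in_kegg:
        return "kegg_only"
    return "remaining"


def cross_tab(records: Iterable[PairRecord]) -> Counter:
    """Count pairs per (label category, predicted) cell, as in Table 2."""
    return Counter((label_category(r), r.predicted) for r in records)


def kegg_recall(table: Counter) -> float:
    """Fraction of KEGG-labeled pairs (both + KEGG only) predicted positive."""
    hits = table[("both", True)] + table[("kegg_only", True)]
    total = hits + table[("both", False)] + table[("kegg_only", False)]
    return hits / total


# Synthetic records reproducing the Table 2 cell counts (not real pair data).
# Key order: (in_pos_small, in_kegg, predicted).
CELL_COUNTS = {
    (True, True, True): 8,     (True, True, False): 3,      # both
    (True, False, True): 13,   (True, False, False): 18,    # pos_small only
    (False, True, True): 4,    (False, True, False): 4,     # KEGG only
    (False, False, True): 19,  (False, False, False): 231,  # remaining
}
records = [PairRecord(*key) for key, n in CELL_COUNTS.items() for _ in range(n)]

table = cross_tab(records)
print(len(records))                  # 300
print(round(kegg_recall(table), 4))  # 12 / 19 = 0.6316
```

With real data, `records` would instead be built from the per-pair columns of the Excel table; the cross-tabulation and recall functions stay the same.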

 

 

 

Analysis of the 19 newly predicted pairs

            From Table 2, 19 of our predicted pairs can be found in neither the small positive set nor the KEGG pathway. Table 3 lists these 19 pairs; we studied each of them further (see the “Detailed comment” column).

 

Table 3. The 19 newly found protein pairs. Only 2 predictions are clearly incorrect, though understandable (lines 1 and 2). Six predictions are clearly correct, as confirmed by the scientific literature. The remaining 11 are new predictions, all of which are reasonable and make functional sense. They form two clusters. The first is a possible interaction of the STE5 anchor protein with the receptors; the receptors would then make additional interactions through the anchoring function of STE5. The second is a possible interaction between the most downstream component of the signaling cascade and BNI1/BEM1.

 

No.  ORF1     Gene1   ORF2     Gene2    Verdict
     Detailed comment

1    YFL026W  STE2    YNL145W  MFA2*    Wrong, but understandable
     Ste2 is the GPCR that binds the MATalpha pheromone, while Mfa2 is the ligand of Ste3, the MATa receptor*, so this prediction is wrong; however, the two cases should be very difficult to distinguish automatically.

2    YDR461W  MFA1*   YNL145W  MFA2     Wrong, but understandable
     Mfa1 and Mfa2 are two ligands of the same GPCR, Ste3 (see footnote *). They probably do not physically interact with each other, but everything about them should be very similar. Again, all the indirect evidence would point to their interaction.

3    YNL145W  MFA2    YHR005C  GPA1     Correct
     GPA1 is the G protein. MFA2 binds the G protein-coupled receptor, and it is the ligand-receptor complex that interacts with the G protein. This prediction is correct.

4    YNL145W  MFA2    YDR103W  STE5**   New prediction
     STE5 interacts with many other proteins and appears to play an anchoring role. Any interaction involving STE5 should be difficult to dissect automatically (see the discussion in the paper). Further experiments are needed to test whether this prediction is correct. It would make sense for members of the signaling pathway to be anchored via STE5 to the membrane through the receptors (MFA, as a ligand, is part of the receptor complex; see line 6 below).

5    YKL178C  STE3    YJR086W  STE18    Correct
     STE18 is a subunit of the G protein. This prediction is correct.

6    YKL178C  STE3    YDR103W  STE5     New prediction
     See line 4. Anchoring of STE5 on the receptors would make sense.

7    YHR005C  GPA1    YDR103W  STE5     New prediction
     See lines 4 and 6. If STE5 is anchored to the receptor, it would also interact with the G protein.

8    YLR229C  CDC42   YDL159W  STE7     New prediction
     CDC42 is a small G protein of the Rho superfamily. STE7 is a kinase downstream of CDC42 and interacts with STE5. These two proteins may interact, possibly as part of the Ste5 complex.

9    YLR229C  CDC42   YPL049C  DIG1     New prediction
     See line 8; DIG1 is further downstream.

10   YAL041W  CDC24   YHL007C  STE20    Correct
     Cdc24 is the GDP-GTP exchange factor for Cdc42, and Cdc42 interacts with Ste20. These proteins can form a ternary complex.

11   YAL041W  CDC24   YLR362W  STE11    New prediction
     Ste11 is part of the STE5 complex, and CDC42 is predicted to interact with its direct partners. See line 8 above.

12   YAL041W  CDC24   YDR103W  STE5     New prediction
     Directly supports line 8.

13   YAL041W  CDC24   YDL159W  STE7     New prediction
     See lines 8-12 above.

14   YBR200W  BEM1    YNL271C  BNI1     Correct
     BEM1 is an SH3-domain protein that binds CDC42/CDC24, and BNI1 binds CDC42.

15   YBR200W  BEM1    YCL027W  FUS1     New prediction
     FUS1 is a cell surface protein that mediates cell fusion. It is one of the most downstream factors in the pathway.

16   YHL007C  STE20   YDL159W  STE7     New prediction
     See lines 8-13.

17   YNL271C  BNI1    YCL027W  FUS1     New prediction
     See line 15 above. Consistent with line 14: BEM1 and BNI1 interact, so if line 15 is correct, line 17 is correct as well.

18   YPL049C  DIG1    YDR480W  DIG2     Correct
     See, e.g., Kusari et al. (2004) J. Cell Biol. 164, 267-277.

19   YDR461W  MFA1    YHR005C  GPA1     Correct
     Same argument as line 3.

 

*MATa = MFA1 (precursor) and MFA2 (lipopeptide hormone) → Ste3 is the receptor

MATalpha = MF(alpha)1 and MF(alpha)2 → Ste2 is the receptor

**STE5 is the anchor protein, described in the discussion of the paper

 

            From Tables 2 and 3, we conclude the following:

•  Of the 19 newly predicted pairs:
    -  2 were wrongly predicted.
    -  6 were further verified by Judith as correct predictions.
    -  The remaining 11 are new predictions that are reasonable and make functional sense. We are currently trying to validate them experimentally.

•  Correctness, among the 44 pairs we predicted to interact:
    -  31 of 44 are correct (70.45%): the 25 pairs that carry a positive label (8 + 13 + 4) plus the 6 pairs verified in Table 3.
    -  2 of 44 are incorrect (4.55%).
    -  11 of 44 are new hypotheses that need further experimental verification in the lab (25%).

•  Completeness:
    -  The complete set of truly interacting pairs is unknown.
    -  We missed 25 labeled pairs (3 + 18 + 4), i.e., pairs labeled positive that we predicted as negative.
    -  If all 11 new predictions are correct, there are 42 + 25 = 67 true pairs, and recall is 42/67 = 62.7%.
    -  If none of them are correct, there are 31 + 25 = 56 true pairs, and recall is 31/56 = 55.4%.
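The percentages above are simple ratios of the counts in Tables 2 and 3. The Python sketch below recomputes them. As in the text, recall is taken as correct / (correct + missed), where "missed" counts only labeled pairs that were predicted negative; any unlabeled true interactions among the remaining predicted-negative pairs are not included, so both recall figures are estimates. The constant names are illustrative.

```python
# Counts taken from Table 2 (labeled pairs) and Table 3 (verdicts on the 19 new pairs).
PREDICTED_POSITIVE = 44
LABELED_AND_PREDICTED = 8 + 13 + 4   # both + pos_small only + KEGG only
VERIFIED_NEW = 6                     # Table 3 verdict "Correct"
WRONG_NEW = 2                        # Table 3 verdict "Wrong, but understandable"
UNTESTED_NEW = 11                    # Table 3 verdict "New prediction"
MISSED = 3 + 18 + 4                  # labeled positive but predicted negative


def pct(num: int, den: int, digits: int = 2) -> float:
    """Percentage rounded to `digits` decimal places."""
    return round(100 * num / den, digits)


correct = LABELED_AND_PREDICTED + VERIFIED_NEW   # 31
# Every predicted-positive pair falls into exactly one of the three groups.
assert correct + WRONG_NEW + UNTESTED_NEW == PREDICTED_POSITIVE

print(pct(correct, PREDICTED_POSITIVE))        # 70.45
print(pct(WRONG_NEW, PREDICTED_POSITIVE))      # 4.55
print(pct(UNTESTED_NEW, PREDICTED_POSITIVE))   # 25.0

# Recall = correct / (correct + missed), under the two assumptions in the text.
optimistic = correct + UNTESTED_NEW            # all 11 new predictions true: 42
print(pct(optimistic, optimistic + MISSED, 1)) # 42 / 67 -> 62.7
print(pct(correct, correct + MISSED, 1))       # 31 / 56 -> 55.4
```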

 

 

Online service for all possible yeast protein pairs

 

      We have applied our method to all possible yeast protein pairs to predict whether they interact. For 6,270 yeast proteins, there are 19,653,315 possible protein pairs overall (6270 × 6269 / 2). Using the default threshold (-0.2), we predict 26,301 of these pairs to interact.
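The pair count is n(n - 1)/2 for n proteins, and the predicted interactome is the set of pairs whose score clears the threshold. A minimal Python sketch, assuming that scores are stored per unordered pair, that a higher score means a more likely interaction, and that the cut-off is inclusive (>=); none of these storage or comparison details are specified in the text, and the function names and toy scores are hypothetical.

```python
def num_unordered_pairs(n: int) -> int:
    """Distinct unordered pairs among n proteins, excluding self-pairs."""
    return n * (n - 1) // 2


# Default decision threshold quoted in the text.
DEFAULT_THRESHOLD = -0.2


def predicted_interactions(scores, threshold=DEFAULT_THRESHOLD):
    """Return the pairs whose score is at or above `threshold`.

    Assumption: `scores` maps (orf_a, orf_b) -> a score on the same scale as
    the threshold, higher meaning more likely to interact; the inclusive
    comparison (>=) is also an assumption.
    """
    return {pair: s for pair, s in scores.items() if s >= threshold}


print(num_unordered_pairs(6270))  # 19653315

# Toy scores for placeholder proteins P1..P3 (illustrative values only).
toy = {("P1", "P2"): 0.4, ("P1", "P3"): -0.9, ("P2", "P3"): -0.2}
print(sorted(predicted_interactions(toy)))  # [('P1', 'P2'), ('P2', 'P3')]
```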

     

      The web service has just been published online at http://www.medstory.net:8081/ppiws/. Users can either look up the interaction confidence of a given yeast protein pair or search for a protein's interacting partners above a chosen confidence threshold.
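The two query types can be illustrated with an in-memory index. The Python sketch below is hypothetical: it does not use or describe the web service's actual interface, and `build_index`, `pair_confidence`, and `partners` are illustrative names. It stores each scored pair under both proteins, so a pair lookup does not depend on argument order and a partner search only scans one protein's entries.

```python
from collections import defaultdict


def build_index(scores):
    """Index unordered pair scores by protein, storing each pair in both directions."""
    index = defaultdict(dict)
    for (a, b), s in scores.items():
        index[a][b] = s
        index[b][a] = s
    return index


def pair_confidence(index, a, b):
    """Confidence of one pair (order-independent), or None if it was not scored."""
    return index.get(a, {}).get(b)


def partners(index, orf, threshold):
    """Partners of `orf` whose confidence is >= threshold, highest first."""
    hits = [(p, s) for p, s in index.get(orf, {}).items() if s >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy scores for placeholder proteins (illustrative values only).
idx = build_index({("P1", "P2"): 0.8, ("P1", "P3"): -0.5, ("P2", "P3"): 0.1})
print(pair_confidence(idx, "P2", "P1"))  # 0.8
print(partners(idx, "P2", -0.2))         # [('P1', 0.8), ('P3', 0.1)]
```

Storing both directions doubles memory, but it avoids canonicalizing the pair order on every lookup.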