[new] Since the related positive reference sets and feature sources have been updated rapidly over the years, sharing only the extracted feature files or partial prediction scores is no longer good enough. I therefore share here the code and related files used to generate our feature set. Download (both summary and detailed!) The general framework and the code should be quite useful. You could also look for more recent versions of the related evidence sets to improve on them.
Feature Details in the data set
| Group Index | # of features | Dataset | Attribute Property | Data Position in the set |
|---|---|---|---|---|
| 1 | 20 | Gene Expression | Real value: [-1, 1] | 1-20 |
| 2 | 21 | GO Molecular Function | | 21-41 |
| 3 | 33 | GO Biological Process | | 42-74 |
| 4 | 23 | GO Component | | 75-97 |
| 5 | 1 | Protein Expression | Real value, non-negative | 98 |
| 6 | 1 | Essentiality | | 99 |
| 7 | 1 | HMS_PCI Mass * | | 100 |
| 8 | 1 | TAP Mass * | | 101 |
| 9 | 1 | Y2H | | 102 |
| 10 | 1 | Synthetic Lethal | | 103 |
| 11 | 1 | Gene Neighborhood / Gene Fusion / Gene Co-occur | | 104 |
| 12 | 1 | Sequence Similarity | Real value, non-negative | 105 |
| 13 | 4 | Homology based PPI | Discrete: non-negative (mostly 0 or 1) | 106-109 |
| 14 | 1 | Domain-Domain Interaction | Real value in [0, 1] | 110 |
| 15 | 16 | Protein-DNA TF group binding | Non-negative discrete, mostly 0 | 111-126 |
| 16 | 25 | MIPS Protein Class | | 127-151 |
| 17 | 11 | MIPS Mutant Phenotype | | 152-162 |
* Matrix model for co-complex and co-pathway
prediction. Spoke model for direct PPI prediction.
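The position ranges in the table can be used to split a feature vector back into its 17 groups. The following is only a minimal sketch, not the code shared on this page: it assumes each protein pair has already been loaded as a list of 162 numbers (the file format is not described here), and that the last group occupies positions 152-162, so that the group sizes add up to 162. The names `FEATURE_GROUPS` and `split_by_group` are made up for illustration.

```python
# Layout taken from the feature table: (group index, dataset, first position, last position).
# Positions are 1-based and inclusive, as in the table.
# Assumption: group 17 spans 152-162, which is what 11 features after position 151 implies.
FEATURE_GROUPS = [
    (1, "Gene Expression", 1, 20),
    (2, "GO Molecular Function", 21, 41),
    (3, "GO Biological Process", 42, 74),
    (4, "GO Component", 75, 97),
    (5, "Protein Expression", 98, 98),
    (6, "Essentiality", 99, 99),
    (7, "HMS_PCI Mass", 100, 100),
    (8, "TAP Mass", 101, 101),
    (9, "Y2H", 102, 102),
    (10, "Synthetic Lethal", 103, 103),
    (11, "Gene Neighborhood / Gene Fusion / Gene Co-occur", 104, 104),
    (12, "Sequence Similarity", 105, 105),
    (13, "Homology based PPI", 106, 109),
    (14, "Domain-Domain Interaction", 110, 110),
    (15, "Protein-DNA TF group binding", 111, 126),
    (16, "MIPS Protein Class", 127, 151),
    (17, "MIPS Mutant Phenotype", 152, 162),
]

N_FEATURES = 162


def split_by_group(row):
    """Split one 162-value feature vector into a dict keyed by dataset name."""
    if len(row) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(row)}")
    # Convert the 1-based inclusive range [start, end] to a 0-based Python slice.
    return {name: row[start - 1:end] for _, name, start, end in FEATURE_GROUPS}
```

For example, `split_by_group(row)["GO Component"]` would return the 23 values at positions 75-97 of `row`.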
Shared data sets
Note
· “-100” in the feature sets means a missing value in that position!
· For details about the gold standard positive sets shared above, please check the “Gold Standard datasets” section in the paper.
· The negative data set I put here is just a random subset containing ~230,000 yeast protein-protein pairs that are not in the positive PPI set of each specific task.
· In the paper, we assume the size ratio between the positive examples and the negative examples is roughly 1:600 (estimated from experimental data) when building the train-test sets.
· This ratio is still questionable and needs further discussion.
· If you happen to know a better approach than the strategy I used above, I would greatly appreciate it if you could contact me.
· If you notice any mistakes in the data, please contact me as soon as possible. Thanks in advance!
· FAQ page
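Two of the notes above can be illustrated with a short sketch: treating “-100” as a missing value, and drawing negatives at roughly the 1:600 positive-to-negative ratio. This is not the procedure from the paper, only one possible reading of it. The helper names `mask_missing` and `sample_negatives`, the use of NaN for missing entries, and the fixed random seed are all assumptions made for illustration.

```python
import math
import random

MISSING = -100  # sentinel used in the feature sets for a missing value


def mask_missing(row):
    """Replace the -100 sentinel with NaN so it is not mistaken for a real measurement."""
    return [math.nan if value == MISSING else float(value) for value in row]


def sample_negatives(negatives, n_positives, ratio=600, seed=0):
    """Draw about n_positives * ratio negatives without replacement.

    The draw is capped at the number of negatives available, since the shared
    negative set is itself only a random subset of non-interacting pairs.
    """
    k = min(len(negatives), n_positives * ratio)
    return random.Random(seed).sample(negatives, k)
```

With 5 positive pairs and the default ratio, `sample_negatives` would return 3,000 negatives if that many are available. Whether 1:600 is the right ratio remains open, as noted above, so `ratio` is left as a parameter.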