Simulated Data
Two simulated data sets were generated, each containing 5000 genes and
five time points. In the first data set, every expression value was
independently drawn from a Uniform[10, 100] distribution.
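The first data set can be reproduced with a short sketch like the one below. It assumes Python's standard-library `random` module; the function name `make_uniform_dataset` and its `seed` parameter are illustrative choices, not part of the original tool.

```python
import random

NUM_GENES = 5000
NUM_TIME_POINTS = 5

def make_uniform_dataset(num_genes=NUM_GENES, num_points=NUM_TIME_POINTS, seed=None):
    """Hypothetical helper: every value drawn independently from Uniform[10, 100]."""
    rng = random.Random(seed)  # a seeded generator makes the data set reproducible
    return [[rng.uniform(10, 100) for _ in range(num_points)]
            for _ in range(num_genes)]

data1 = make_uniform_dataset(seed=0)
print(len(data1), len(data1[0]))  # 5000 5
```

Because no structure is planted, any profile reported as significant on such data would be a false positive.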
In the second data set, the expression values for 4850 genes were again
drawn from a Uniform[10, 100] distribution. For the remaining 150 genes,
the expression profile $(v_0, v_1, v_2, v_3, v_4)$ of each gene
was drawn from the following distribution:
$$v_0 \sim \mathrm{Uniform}[10, 100]$$
$$v_i \sim v_{i-1} \cdot 2^{m_i - m_{i-1}} + Z, \quad i = 1, \dots, 4$$
where $Z \sim \mathrm{Uniform}[-0.05 \cdot 2^{m_i - m_{i-1}},\ 0.05 \cdot 2^{m_i - m_{i-1}}]$.
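Reading the recurrence as $v_i = v_{i-1} \cdot 2^{m_i - m_{i-1}} + Z$ (our interpretation of the extracted formula), setting the noise to zero shows what the parameters $m_i$ mean: the product telescopes, so $m_i - m_0$ is the log2 ratio of the expression at time point $i$ to the expression at time point 0. The second part bounds the noise relative to the noise-free step, so whenever $v_{i-1} \ge 5$ each step is perturbed by at most 1%.

```latex
% Noise-free part of the recurrence (Z = 0), for i = 1, ..., 4:
v_i = v_{i-1}\, 2^{m_i - m_{i-1}}
    = v_0 \prod_{j=1}^{i} 2^{m_j - m_{j-1}}
    = v_0\, 2^{\sum_{j=1}^{i} (m_j - m_{j-1})}
    = v_0\, 2^{m_i - m_0}
\quad\Longrightarrow\quad
\log_2 \frac{v_i}{v_0} = m_i - m_0 .

% Size of the noise relative to the deterministic term:
\frac{|Z|}{v_{i-1}\, 2^{m_i - m_{i-1}}}
  \le \frac{0.05 \cdot 2^{m_i - m_{i-1}}}{v_{i-1}\, 2^{m_i - m_{i-1}}}
  = \frac{0.05}{v_{i-1}} .
```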
For 50 genes (1% of the total) we set
$(m_0, m_1, m_2, m_3, m_4) = (0, -1, 1, 1, -1)$.
For another 50 genes we set
$(m_0, m_1, m_2, m_3, m_4) = (0, 1, 0, 0, 2)$.
For the final 50 genes we set
$(m_0, m_1, m_2, m_3, m_4) = (0, 2, 1, 0, 0)$.
In effect, we planted 50 genes following each of the three profiles, with
some noise added.
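The second data set can be sketched in Python as follows. This is a minimal illustration, not the original generator: it assumes the standard-library `random` module, reads the recurrence as $v_i = v_{i-1} \cdot 2^{m_i - m_{i-1}} + Z$, places the 4850 background genes before the 150 planted ones, and returns a hypothetical `labels` list (profile index, or `None` for background genes) to make the planted genes easy to find.

```python
import math
import random

NUM_GENES = 5000
NUM_TIME_POINTS = 5
GENES_PER_PROFILE = 50
# The three planted profiles (m0, ..., m4) from the text.
PLANTED_PROFILES = [
    (0, -1, 1, 1, -1),
    (0, 1, 0, 0, 2),
    (0, 2, 1, 0, 0),
]

def sample_planted_gene(m, rng):
    """Draw (v0, ..., v4) for one gene following profile m (hypothetical helper)."""
    v = [rng.uniform(10, 100)]                          # v0 ~ Uniform[10, 100]
    for i in range(1, len(m)):
        factor = 2.0 ** (m[i] - m[i - 1])               # 2^(m_i - m_{i-1})
        z = rng.uniform(-0.05 * factor, 0.05 * factor)  # noise term Z
        # Assumed reading of the recurrence: v_i = v_{i-1} * 2^(m_i - m_{i-1}) + Z
        v.append(v[i - 1] * factor + z)
    return v

def make_planted_dataset(seed=None):
    """Return (rows, labels); background genes come first (an arbitrary choice)."""
    rng = random.Random(seed)
    num_planted = GENES_PER_PROFILE * len(PLANTED_PROFILES)  # 150 genes
    rows, labels = [], []
    for _ in range(NUM_GENES - num_planted):                 # 4850 background genes
        rows.append([rng.uniform(10, 100) for _ in range(NUM_TIME_POINTS)])
        labels.append(None)
    for k, m in enumerate(PLANTED_PROFILES):
        for _ in range(GENES_PER_PROFILE):
            rows.append(sample_planted_gene(m, rng))
            labels.append(k)
    return rows, labels

rows, labels = make_planted_dataset(seed=1)
gene = rows[labels.index(0)]  # first gene planted with profile (0, -1, 1, 1, -1)
print([round(math.log2(x / gene[0]), 2) for x in gene])
```

Converting a planted gene to log2 ratios relative to its first time point should print values close to its profile, here close to (0, -1, 1, 1, -1), because the noise bound scales with the same factor $2^{m_i - m_{i-1}}$ as the deterministic step.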
The algorithm correctly determined that there were no significant
profiles in the first data set.
In the second data set,
the algorithm correctly recovered the three planted expression
profiles with no false positives.
[Figure: Simulated data with no genes planted. The model profiles in the experiment with no genes planted. All profiles are uncolored, meaning there is no significant profile.]
[Figure: Simulated data with genes planted. The model profiles in the experiment with genes planted. The three colored profiles are those for which genes were planted.]