Simulated Data

Two simulated data sets were generated, each containing 5000 genes measured at five time points. In the first data set, every expression value was drawn independently from a Uniform[10,100] distribution. In the second data set, the expression values for 4850 genes were again drawn from a Uniform[10,100] distribution. For the remaining 150 genes, the expression profile (v₀,v₁,v₂,v₃,v₄) was drawn from the following distribution:
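A minimal sketch of the purely uniform part of the simulation. It assumes Python's standard `random` module stands in for whatever generator was actually used. The function name `uniform_dataset` and the seeds are hypothetical. The sketch builds the first data set (5000 × 5 independent Uniform[10,100] values) and the 4850 background genes of the second data set:

```python
import random

N_GENES = 5000
N_TIMEPOINTS = 5
N_PLANTED = 150


def uniform_dataset(n_genes, n_timepoints=N_TIMEPOINTS, low=10.0, high=100.0, seed=None):
    """Return n_genes rows of n_timepoints values, each drawn i.i.d. from Uniform[low, high].

    Hypothetical helper; the seeds used below are arbitrary.
    """
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(n_timepoints)] for _ in range(n_genes)]


# First data set: every expression value independently Uniform[10, 100].
null_data = uniform_dataset(N_GENES, seed=0)

# Second data set: 4850 background genes drawn the same way;
# the 150 planted genes are generated separately.
background = uniform_dataset(N_GENES - N_PLANTED, seed=1)
```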
$v_0 \sim \mathrm{Uniform}[10, 100]$
$v_i = v_{i-1} \cdot 2^{m_i - m_{i-1}} + Z$ for $i = 1, \dots, 4$,
where $Z \sim \mathrm{Uniform}[-0.05 \cdot 2^{m_i - m_{i-1}},\ 0.05 \cdot 2^{m_i - m_{i-1}}]$.
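Setting the noise term $Z$ to zero and unrolling the recursion shows what the planted vector $(m_0, \dots, m_4)$ encodes. Each step multiplies by $2^{m_k - m_{k-1}}$, and the exponents telescope:

$$
v_i = v_{i-1}\,2^{m_i - m_{i-1}} = v_0 \prod_{k=1}^{i} 2^{m_k - m_{k-1}} = v_0\,2^{m_i - m_0},
\qquad\text{so}\qquad
\log_2\frac{v_i}{v_0} = m_i - m_0 .
$$

With $m_0 = 0$, as in each profile listed below, the noiseless log₂ ratio of time point $i$ to time point 0 is simply $m_i$.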
For 50 genes (1% of the total) we set
(m₀,m₁,m₂,m₃,m₄)=(0,-1,1,1,-1)
For another 50 genes we set
(m₀,m₁,m₂,m₃,m₄)=(0,1,0,0,2)
For the final 50 genes we set
(m₀,m₁,m₂,m₃,m₄)=(0,2,1,0,0)
In effect, 50 genes were planted in each of the three profiles, with some added noise.
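A minimal sketch of the planted genes. It follows the formula above as written, with $Z$ added after the multiplication by $2^{m_i - m_{i-1}}$, and assumes Python's standard `random` module. The names `PROFILES`, `planted_gene`, and `log2_ratios` and the seed are hypothetical:

```python
import math
import random

# The three model profiles (m0, m1, m2, m3, m4) from the text.
PROFILES = [
    (0, -1, 1, 1, -1),
    (0, 1, 0, 0, 2),
    (0, 2, 1, 0, 0),
]
GENES_PER_PROFILE = 50


def planted_gene(m, rng):
    """Sample (v0, ..., v4) for model profile m:
    v0 ~ Uniform[10, 100],
    v_i = v_{i-1} * 2**(m_i - m_{i-1}) + Z,
    Z ~ Uniform[-0.05 * 2**(m_i - m_{i-1}), 0.05 * 2**(m_i - m_{i-1})].
    """
    v = [rng.uniform(10.0, 100.0)]
    for i in range(1, len(m)):
        fold = 2.0 ** (m[i] - m[i - 1])            # fold change between consecutive time points
        z = rng.uniform(-0.05 * fold, 0.05 * fold)  # noise bounded by 5% of the fold change
        v.append(v[i - 1] * fold + z)
    return v


def log2_ratios(v):
    """log2 of each value relative to time point 0; approximates m_i - m_0 for a planted gene."""
    return [math.log2(x / v[0]) for x in v]


rng = random.Random(42)
# 150 planted genes: 50 for each of the three profiles, in profile order.
planted = [planted_gene(m, rng) for m in PROFILES for _ in range(GENES_PER_PROFILE)]
```

Under this reading of the formula, $|Z|$ is at most $0.05 \cdot 2^{m_i - m_{i-1}}$, while the deterministic term is $v_{i-1} \cdot 2^{m_i - m_{i-1}}$ with $v_{i-1}$ no smaller than about 5. The noise therefore stays small relative to the signal, and `log2_ratios` of each planted gene lies close to its model profile.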
The algorithm correctly determined that there were no significant profiles in the first data set.
In the second data set, the algorithm correctly recovered the three planted expression profiles with no false positives.

Figure: Simulated data with no genes planted. The model profiles in the experiment with no genes planted. All profiles are uncolored, meaning that no profile is significant.

Figure: Simulated data with genes planted. The model profiles in the experiment with genes planted. The three colored profiles are those for which genes were planted.