- Ding J, Lin C, Bar-Joseph Z. Cell lineage inference from SNP and scRNA-Seq data. Accepted, Nucleic Acids Research, 2019
Abstract:
Several recent studies focus on the inference of developmental and response trajectories from single cell RNA-Seq (scRNA-Seq) data.
A number of computational methods, often referred to as pseudo-time ordering, have been developed for this task.
Recently, CRISPR has also been used to reconstruct lineage trees by inserting random mutations.
However, both approaches suffer from drawbacks that limit their use. Here we develop a method to detect significant, cell type specific, sequence mutations from scRNA-Seq data. We show that only a few mutations are enough for reconstructing good branching models. Integrating these mutations with expression data further improves the accuracy of the reconstructed models. As we show, the majority of mutations we identify are likely RNA editing events indicating that such information can be used to distinguish cell types.
- Friedman C, Nguyen Q, Lukowski S, ..., Ding J, Wang Y, Hudson J, Ruohola-Baker H, Bar-Joseph Z, Tam P, Powell J, Palpant N.
Single-Cell Transcriptomic Analysis of Cardiac Differentiation from Human PSCs Reveals HOPX-Dependent Cardiomyocyte Maturation. Cell Stem Cell, 23(4):586-598, 2018
Abstract:
Cardiac differentiation of human pluripotent stem cells (hPSCs) requires orchestration of dynamic gene regulatory
networks during stepwise fate transitions but often generates immature cell types that do not fully recapitulate
properties of their adult counterparts, suggesting incomplete activation of key transcriptional networks.
We performed extensive single-cell transcriptomic analyses to map fate choices and gene expression programs during
cardiac differentiation of hPSCs and identified strategies to improve in vitro cardiomyocyte differentiation.
Utilizing genetic gain- and loss-of-function approaches, we found that hypertrophic signaling is not effectively
activated during monolayer-based cardiac differentiation, thereby preventing expression of HOPX and its activation of
downstream genes that govern late stages of cardiomyocyte maturation. This study therefore provides a key
transcriptional roadmap of in vitro cardiac differentiation at single-cell resolution, revealing fundamental mechanisms underlying
heart development and differentiation of hPSC-derived cardiomyocytes.
- Nguyen QH, Lukowski SW, Chiu HS, Friedman CE, Senabouth A, Crowhurst L, Bruxner TJ, Christ AN, Hudson J, Ding J, Bar-Joseph Z. Genetic networks modulating cell fate specification and contributing to cardiac disease risk in hiPSC-derived cardiomyocytes at single cell resolution. Human Genomics. 2018 Mar 9;12.
Abstract: With the huge amount of genome-wide mutational data generated by cancer genomic sequencing studies, distinguishing cancer drivers from the vast majority of passengers is important.
Existing cancer driver prediction methods capture specific mutational aspects in discriminating potential cancer drivers. We explore the possibility of an alternative way of doing the task.
We noted that mutational parameters (functional mutation ratio, mutation frequency and sample mutation recurrence) vary differently among mutant genes of different sizes.
This led us to develop our novel algorithm (Mutant Gene Ranker - MuGeR), which compares multiple mutational parameters of a target gene against the corresponding background derived from a specific subset of genes using a sliding window approach, to estimate the likelihood that the target gene is a potential cancer driver.
We applied our MuGeR algorithm to The Cancer Genome Atlas (TCGA) datasets.
Empirical data on the TCGA datasets and comparison with the prioritization results of 4 other existing tools (MuSiC, MuSig, TUSON explorer and DOTS-Finder) suggested satisfactory performance of our MuGeR algorithm.
More importantly, we demonstrated the existence of specific patterns of mutational parameters across cancers.
Empirical data verified the usefulness of our MuGeR algorithm in identifying potential cancer drivers. Moreover, our in-depth appraisal of TCGA liver hepatocellular carcinoma datasets further highlighted the frequent mutational dysregulation of ubiquitin-related proteasomal degradation in driving hepatocarcinogenesis.
- Ding J, Aronow B, Kaminski N, Kitzmiller J, Whitsett J, Bar-Joseph Z. Reconstructing differentiation networks and their regulation from time series single cell expression data. Genome research. 2018 Jan 9:gr-225979.
Abstract: Generating detailed and accurate organogenesis models using single cell RNA-seq data remains a major challenge.
Current methods have relied primarily on the assumption that descendant cells are similar to their parents in terms of gene expression levels.
This assumption does not always hold for in-vivo studies, which often include infrequently sampled, un-synchronized and diverse cell populations.
Thus, additional information may be needed to determine the correct ordering and branching of progenitor cells and the set of transcription factors (TFs)
that are active during advancing stages of organogenesis.
To enable such modeling we have developed a method that learns a probabilistic model which integrates expression similarity with
regulatory information to reconstruct the dynamic developmental cell trajectories. When applied to mouse lung developmental data the method
accurately distinguished different cell types and lineages. Existing and new experimental data validated the ability of the method to identify key regulators of cell fate.
- Ding J, Hagood JS, Ambalavanan N, Kaminski N, Bar-Joseph Z. iDREM: Interactive visualization of dynamic regulatory networks. PLoS computational biology. 2018 Mar 14;14(3):e1006019.
Abstract: The Dynamic Regulatory Events Miner (DREM) software reconstructs dynamic regulatory networks
by integrating static protein-DNA interaction data with time series gene expression data.
In recent years, several additional types of high-throughput time series data have been profiled when studying biological processes
including time series miRNA expression, proteomics, epigenomics and single cell RNA-Seq.
Combining all available time series and static datasets in a unified model remains an important challenge and goal.
To address this challenge we have developed a new version of DREM termed interactive DREM (iDREM).
iDREM provides support for all data types mentioned above and combines them with existing interaction data to
reconstruct networks that can lead to novel hypotheses on the function and timing of regulators.
Users can interactively visualize and query the resulting model. We showcase the functionality of the new tool by applying it to microglia developmental data from multiple labs.
- Ding J, Bar-Joseph Z. MethRaFo: MeDIP-seq methylation estimate using a Random Forest Regressor. Bioinformatics. 2017 Jul 13;33(21):3477-9.
Abstract:
Profiling of genome wide DNA methylation is now routinely performed when studying development, cancer and several other biological processes.
Although Whole genome Bisulfite Sequencing provides high-quality methylation measurements at the resolution of nucleotides,
it is relatively costly and so several studies have used alternative methods for such profiling. One of the most widely used low cost alternatives is MeDIP-Seq.
However, MeDIP-Seq is biased for CpG enriched regions and thus its results need to be corrected in order to determine accurate methylation levels.
Here we present a method for correcting MeDIP-Seq results based on Random Forest regression.
Applying the method to real data from several different tissues (brain, cortex, penis), we show that it achieves an almost 4-fold decrease in run time while
increasing accuracy by as much as 20% over prior methods developed for this task.
- Ding J, Li X, Hu H. CCmiR: a computational approach for competitive and cooperative microRNA binding prediction. Bioinformatics. 2017 Sep 25;34(2):198-206.
Abstract:
The identification of microRNA (miRNA) target sites is important. In the past decade, dozens of computational methods have been developed to predict miRNA target sites.
Despite their existence, rarely does a method consider the well-known competition and cooperation among miRNAs when attempting to discover target sites.
To fill this gap, we developed a new approach called CCmiR, which takes the cooperation and competition of multiple miRNAs into account in a statistical model to predict their target sites.
Tested on four different datasets, CCmiR predicted miRNA target sites with a high recall and a reasonable precision, and identified known and new cooperative and competitive miRNAs supported by literature.
Compared with three state-of-the-art computational methods, CCmiR had a higher recall and a higher precision.
- Roqueta-Rivera M, Esquejo RM, Phelan PE, Sandor K, Daniel B, Foufelle F, Ding J, Li X, Khorasanizadeh S, Osborne TF. SETDB2 links glucocorticoid to lipid metabolism through Insig2a regulation. Cell metabolism. 2016 Sep 13;24(3):474-84.
Abstract:
Transcriptional and chromatin regulations mediate the liver response to nutrient availability.
The role of chromatin factors involved in hormonal regulation in response to fasting is not fully understood.
We have identified SETDB2, a glucocorticoid-induced putative epigenetic modifier, as a positive regulator of GR-mediated gene activation in liver.
Insig2a increases during fasting to limit lipid synthesis, but the mechanism of induction is unknown. We show Insig2a induction is GR-SETDB2 dependent.
SETDB2 facilitates GR chromatin enrichment and is key to glucocorticoid-dependent enhancer-promoter interactions.
INSIG2 is a negative regulator of SREBP, and acute glucocorticoid treatment decreased active SREBP during refeeding or in livers of Ob/Ob mice,
both systems of elevated SREBP-1c-driven lipogenesis. Knockdown of SETDB2 or INSIG2 reversed the inhibition of SREBP processing.
Overall, these studies identify a GR-SETDB2 regulatory axis of hepatic transcriptional reprogramming and identify SETDB2 as a potential target for metabolic disorders with aberrant glucocorticoid actions.
- Ding J, Li X, Hu H. TarPmiR: a new approach for microRNA target site prediction. Bioinformatics. 2016 May 20;32(18):2768-75.
Abstract:
The identification of microRNA (miRNA) target sites is fundamentally important for studying gene regulation.
There are dozens of computational methods available for miRNA target site prediction.
Despite their existence, we still cannot reliably identify miRNA target sites, partially due to our limited understanding of the characteristics of miRNA target sites.
The recently published CLASH (crosslinking ligation and sequencing of hybrids) data provide an unprecedented opportunity to study the characteristics of miRNA target sites and improve miRNA target site prediction methods.
Applying four different machine learning approaches to the CLASH data, we identified seven new features of miRNA target sites.
Combining these new features with those commonly used by existing miRNA target prediction algorithms,
we developed an approach called TarPmiR for miRNA target site prediction. Testing on two human and one mouse non-CLASH datasets,
we showed that TarPmiR predicted more than 74.2% of true miRNA target sites in each dataset. Compared with three existing approaches,
we demonstrated that TarPmiR is superior to these existing approaches in terms of better recall and better precision.
- Ding J, Li X, Hu H. MicroRNA modules prefer to bind weak and unconventional target sites. Bioinformatics. 2014; doi: 10.1093/bioinformatics/btu833.
Abstract:
MicroRNAs (miRNAs) play critical roles in gene regulation. Although it is well known that multiple miRNAs may work as miRNA modules to synergistically regulate common target mRNAs,
the understanding of miRNA modules is still in its infancy. We employed the recently generated high throughput experimental data to study miRNA modules.
We predicted 181 miRNA modules and 306 potential miRNA modules. We observed that the target sites of these predicted modules were in general weaker compared with those not bound by miRNA modules.
We also discovered that miRNAs in predicted modules preferred to bind unconventional target sites rather than canonical sites.
Surprisingly, contrary to a previous study, we found that most adjacent miRNA target sites from the same miRNA modules were not within the range of 10-130 nucleotides.
Interestingly, the distance of target sites bound by miRNAs in the same modules was shorter when miRNA modules bound unconventional instead of canonical sites.
Our study shed new light on miRNA binding and miRNA target sites, which will likely advance our understanding of miRNA regulation.
- Ding J, Dhillon V, Li X, Hu H. Systematic discovery of cofactor motifs from ChIP-seq data by SIOMICS. Methods. 2014; doi: 10.1016/j.ymeth.2014.08.006.
Abstract:
Understanding transcriptional regulatory elements and particularly the transcription factor binding sites represents a significant challenge in computational biology.
The chromatin immunoprecipitation followed by massive parallel sequencing (ChIP-seq) experiments provide an unprecedented opportunity to study transcription factor binding sites on the genome-wide scale.
Here we describe a recently developed tool, SIOMICS, to systematically discover motifs and binding sites of transcription factors and their cofactors from ChIP-seq data. Unlike other tools,
SIOMICS explores the co-binding properties of multiple transcription factors in short regions to predict motifs and binding sites.
We have previously shown that the original SIOMICS method predicts motifs and binding sites of more cofactors in more accurate and time-effective ways than two popular methods.
In this paper, we present the extended SIOMICS method, SIOMICS_Extension, and demonstrate its usage for systematic discovery of cofactor motifs and binding sites.
The SIOMICS tool, including SIOMICS and SIOMICS_Extension, is available at http://hulab.ucf.edu/research/projects/SIOMICS/SIOMICS.html.
- Ding J, Hu H, Li X. SIOMICS: a Novel Approach for Systematic Identification of Motifs in ChIP-seq Data. Nucleic Acids Research. 2014; 42(5): e35.
Abstract:
The identification of transcription factor binding motifs is important for the study of gene transcriptional regulation.
The chromatin immunoprecipitation (ChIP), followed by massive parallel sequencing (ChIP-seq) experiments, provides an unprecedented opportunity to discover binding motifs.
Computational methods have been developed to identify motifs from ChIP-seq data, while at the same time encountering several problems.
For example, existing methods are often not scalable to the large number of sequences obtained from ChIP-seq peak regions.
Some methods heavily rely on well-annotated motifs even though the number of known motifs is limited.
To simplify the problem, de novo motif discovery methods often neglect underrepresented motifs in ChIP-seq peak regions.
To address these issues, we developed a novel approach called SIOMICS to de novo discover motifs from ChIP-seq data.
Tested on 13 ChIP-seq data sets, SIOMICS identified motifs of many known and new cofactors. Tested on 13 simulated random data sets,
SIOMICS discovered no motif in any data set. Compared with two recently developed methods for motif discovery, SIOMICS shows advantages in terms of speed,
the number of known cofactor motifs predicted in experimental data sets and the number of false motifs predicted in random data sets.
The SIOMICS software is freely available at http://eecs.ucf.edu/~xiaoman/SIOMICS/SIOMICS.html.
- Ding J, Hu H, Li X. NIM, A novel computational method for predicting nuclear-encoded chloroplast proteins. Journal of Medical and Bioengineering. 2013; 2(2): 115-119.
Abstract:
The identification of nuclear-encoded chloroplast proteins is important for the understanding of their functions and their interaction in chloroplasts.
Despite various endeavors in predicting these proteins, there is still room for developing novel computational methods for further improving the prediction accuracy.
Here we developed a novel computational method called NIM based on interpolated Markov chains to predict nuclear-encoded chloroplast proteins.
By testing the method on real data, we show NIM has an average sensitivity larger than 92% and an average specificity larger than 97%.
Compared with the state-of-the-art methods, we demonstrate that NIM performs better or is at least comparable with them.
Our study thus provides a novel and useful tool for the prediction of nuclear-encoded chloroplast proteins.
- Ding J, Cai X, Wang Y, Hu H, Li X. ChIPModule: Systematic discovery of transcription factors and their cofactors from ChIP-seq data. Pac Symp Biocomput. 2013.
Abstract:
We have developed a novel approach called ChIPModule to systematically discover transcription factors and their cofactors from ChIP-seq data.
Given a ChIP-seq dataset and the binding patterns of a large number of transcription factors,
ChIPModule can efficiently identify groups of transcription factors,
whose binding sites significantly co-occur in the ChIP-seq peak regions.
By testing ChIPModule on simulated data and experimental data, we have shown that ChIPModule identifies known cofactors of transcription factors,
and predicts new cofactors that are supported by literature. ChIPModule provides a useful tool for studying gene transcriptional regulation.
- Ding J, Li X, Hu H. Systematic discovery of cis-regulatory elements in Chlamydomonas reinhardtii genome using comparative genomics. Plant Physiology. 2012;160(2):613-23.
Abstract:
Chlamydomonas reinhardtii (C. reinhardtii) is one of the most important microalgae model organisms and has been widely
studied towards the understanding of chloroplast functions and various cellular processes.
Further exploitation of C. reinhardtii as a model system to elucidate various molecular mechanisms and pathways requires systematic study of gene regulation.
However, there is a general lack of genome-scale gene regulation study such as global cis-regulatory element (CRE) identification in C. reinhardtii.
Recently, large-scale genomic data in microalgae species have become available, which enable the development of efficient computational methods to systematically identify CREs and characterize
their roles in microalgae gene regulation. Here we performed in-silico CRE identification at the whole genome level in C. reinhardtii using a comparative-genomics-based method.
We identified a large number of CREs in C. reinhardtii that are consistent with experimentally verified CREs.
We also discovered that a large percentage of these CREs form combinations and have the potential to work together for coordinated gene regulation in C. reinhardtii.
Multiple lines of evidence from the literature, gene transcriptional profiles and gene annotation resources support our discovery.
The discovered CREs will serve as the first large-scale collection of CREs in C. reinhardtii to facilitate further experimental study of microalgae gene regulation.
The accompanying software tool and the predictions in C. reinhardtii are also made available through a web-accessible database (http://hulab.ucf.edu/research/projects/Microalgae/sdcre/motifcomb.html).
- Wang Y, Ding J, Daniell H, Hu H, Li X. Motif analysis unveils the possible co-regulation of chloroplast genes and nuclear genes encoding chloroplast proteins. Plant Molecular Biology. 2012;80(2):177-87.
Abstract:
Chloroplasts play critical roles in land plant cells.
Despite their importance and the availability of at least 200 sequenced chloroplast genomes,
the number of known DNA regulatory sequences in chloroplast genomes is limited. In this paper,
we designed computational methods to systematically study putative DNA regulatory sequences in intergenic regions near chloroplast genes in
seven plant species and in promoter sequences of nuclear genes in Arabidopsis and rice. We found that -35/-10 elements alone cannot explain the transcriptional
regulation of chloroplast genes. We also concluded that motifs shared by the intergenic sequences of most chloroplast genes are unlikely to exist,
indicating that these genes are regulated differently. Finally and surprisingly, we found that five conserved motifs, each of which occurs in no more than six chloroplast intergenic
sequences, are significantly shared by promoters of nuclear genes encoding chloroplast proteins. By integrating information from gene function annotation, protein subcellular localization analyses,
protein-protein interaction data, and gene expression data, we further showed support of the functionality of these conserved motifs.
Our study implies the existence of unknown nuclear-encoded transcription factors that regulate both chloroplast genes and nuclear genes encoding chloroplast proteins,
which sheds light on the understanding of the transcriptional regulation of chloroplast genes.
- Ding J, Hu H, Li X. Thousands of cis-regulatory sequences are shared by Arabidopsis and Populus. Plant Physiology. 2012;158(1):145-55. Epub 2011 Nov 4.
Abstract:
The identification of cis-regulatory modules can greatly advance our understanding of gene regulatory mechanisms.
Despite the existence of binding sites of more than three transcription factors in a cis-regulatory module,
studies in plants often consider only the co-occurrence of binding sites of one or two transcription factors.
In addition, cis-regulatory module studies in plants are limited to combinations of only a few families of transcription factors.
It is thus not clear how widespread plant transcription factors work together, which transcription factors work together to regulate plant genes,
and how the combinations of these transcription factors are shared by different plants. To fill these gaps,
we applied a frequent pattern mining based approach to identify frequently used cis-regulatory sequence combinations in the promoter sequences of two plant species,
Arabidopsis thaliana and Populus trichocarpa. A cis-regulatory sequence here corresponds to a DNA motif bound by a transcription factor.
We identified 18638 combinations composed of 2 to 6 cis-regulatory sequences that are shared by the two plant species.
In addition, with known cis-regulatory sequence combinations, gene function annotation, gene expression data, and known functional gene sets,
we showed that the functionality of at least 96.8% and 65.2% of these shared combinations in Arabidopsis is partially supported, under a false discovery rate of 0.1 and 0.05, respectively.
Finally, we discovered that 796 of the 18638 combinations might relate to functions that are important in bioenergy research. Our work will facilitate the study of gene transcriptional regulation in plants.
- Ding J, Liu F. Novel Tag Anti-Collision Algorithm with Adaptive Grouping. Wireless Sensor Network. 2009;1:475-481.
Abstract:
For RFID tags, a Novel Tag Anti-collision Algorithm with Grouping (TAAG) is proposed.
It divides tags into groups and adopts a deterministic method to identify the tags within each group.
TAAG estimates the total number of tags in the system from the group identification results and then adjusts the grouping method accordingly.
The performance of the proposed TAAG algorithm is compared with the conventional tag anti-collision algorithms by simulation experiments.
According to both the analysis and the simulation results, the proposed algorithm shows better performance in terms of throughput, total slots used for identification and total cycles.