Carnegie Mellon-led Research Team Transforms DNA Microarray Analysis With Ideas from a Standard Internet Communications Protocol

Byron Spice
Wednesday, December 7, 2005

PITTSBURGH - A standard Internet protocol that checks errors made during email transmissions has now inspired a revolutionary method to transform DNA microarray analysis, a common technology used to understand gene activation. The new method, which blends experiment and computation, strengthens DNA microarray analysis, according to its Carnegie Mellon University inventor, who is publishing his findings in the December issue of Nature Biotechnology with collaborators at the Hebrew University in Israel.

The innovative method combines a new experimental procedure and a new algorithm to identify gene activation captured by DNA microarray analysis with greater sensitivity and specificity. The work holds great promise for vastly improving research on health and disease, according to Ziv Bar-Joseph, assistant professor of computer science and biological sciences at Carnegie Mellon.

"We are very excited about introducing this versatile, powerful method to the research community because it can be used to study a wide range of complex, dynamic systems more comprehensively," said Bar-Joseph, who also is a member of the Computer Science Department and Center for Automated Learning and Discovery at the School of Computer Science. "Such systems under study include stress and drug response, cancer and embryo development."

DNA microarray analysis - a multimillion-dollar-a-year industry - identifies gene activation in living, complex biological systems. DNA microarrays monitor the behavior of thousands of genes over time by detecting changes in the expression of as many as 30,000 different genes on one small chip. The technique has been used to study some of the most important biological systems, including how cells normally divide (the cell cycle) and immune responses to disease and infection.

"Ultimately, we think that the addition of this method to standard DNA microarray analysis will make it more accurate and cost-effective," Bar-Joseph added.

"While DNA microarrays are very powerful, they present a sampling problem," Bar-Joseph said. "DNA microarrays only take static snapshots of gene activity over time. In between these snapshots, genes could be activated and we just don't see them turning on. Our protocol will offer greater overall sensitivity in detecting the expression of any gene, even if a gene turns on when no microarray sampling takes place."

Bar-Joseph's procedure is based on a "check-sum" protocol initially developed to ensure that email messages sent via the Internet don't become garbled in transmission. In the standard Internet check-sum protocol, bits of information that begin as one value (0 or 1) may inadvertently flip to the opposite value as they move from one computer to the next in the form of an email. Such corruption, ascribed to noise in the communication channel, is detected by counting the number of 1's in the message. If this number is odd, then the last bit is set to 1; otherwise it is set to 0. By comparing the number of 1's it receives with the value of the last bit, the recipient's computer can determine whether the message was accurately received. If not, the recipient's computer asks the sender's computer to forward the message again. Bar-Joseph's method carries out a similar analysis of the microarray snapshots by "checking" the sum of a set of DNA microarray data points over time (a time series experiment) against the "summary" of the temporal response. If the two sets of results are equal, then what is captured by the DNA microarray time series is real. If the time series results produce a lower value than the microarray summary, the protocol indicates that the researchers have missed a gene's activation somewhere in their time series.
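
The parity scheme described above can be illustrated with a short Python sketch. It is a minimal illustration of a single parity bit as the paragraph describes it, not an implementation of any particular email or network standard; the function names and the sample message are hypothetical.

```python
def add_parity_bit(bits):
    """Append a final bit: 1 if the number of 1's is odd, otherwise 0."""
    parity = sum(bits) % 2
    return bits + [parity]


def passes_parity_check(received):
    """Recount the 1's in the payload and compare with the final bit."""
    *payload, parity = received
    return sum(payload) % 2 == parity


# Hypothetical 7-bit message containing four 1's, so the parity bit is 0.
message = [1, 0, 1, 1, 0, 0, 1]
sent = add_parity_bit(message)

# Simulate channel noise by flipping one bit in transit.
corrupted = sent.copy()
corrupted[2] ^= 1

print(passes_parity_check(sent))       # intact message passes
print(passes_parity_check(corrupted))  # single flipped bit is detected
```

A single parity bit of this kind detects any odd number of flipped bits but cannot detect an even number of flips, which is why a failed check leads to a resend request rather than a repair.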

Just as important, according to Bar-Joseph, is whether a DNA microarray summary value exceeds its time sequence value. If that's the case, then researchers have likely identified gene activity that should be attributed to changes taking place during an experiment - adding a chemical or changing the temperature, for instance. This aspect of the method provides scientists with the specificity they need to weed out such introduced gene activation from fundamental gene activation pathways that form the hallmark of processes like cancer or immunity.
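
The comparison between a time series and its summary can be sketched as a three-way check in Python. This is only an illustration under stated assumptions: the function name, the 5 percent tolerance, and the expression values are hypothetical, and the statistical test used in the published method is not described here. The sketch reports only the direction of a mismatch and leaves its biological interpretation to the researcher.

```python
import math


def compare_series_to_summary(series, summary, rel_tol=0.05):
    """Compare the summed time-series values for one gene with its summary value.

    rel_tol is an assumed tolerance for treating the two values as equal.
    """
    series_total = sum(series)
    if math.isclose(series_total, summary, rel_tol=rel_tol):
        return "consistent"
    if series_total < summary:
        return "summary_exceeds_series"
    return "series_exceeds_summary"


# Hypothetical expression values for three genes, each with a summary of 4.0.
print(compare_series_to_summary([1.0, 2.0, 1.0], 4.0))
print(compare_series_to_summary([0.5, 0.5, 0.5], 4.0))
print(compare_series_to_summary([2.0, 3.0, 2.0], 4.0))
```

Real microarray measurements are noisy, so exact equality is unlikely in practice; the tolerance parameter stands in for whatever statistical criterion an actual analysis would use.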

To prove the effectiveness of this new method, Bar-Joseph studied the human cell division cycle. Considered one of the most important biological systems, the cell cycle plays a major role in cancer. Using their new method, Bar-Joseph and his colleagues identified many new human genes that were not previously found to be participants in this system.

"This new set of gene discoveries opens the way to new and more accurate models of the cell cycle system, which in turn can lead to new targets for cancer drugs," said Bar-Joseph.

The new method also overcomes synchronization loss, a vexing problem for scientists who study hundreds or thousands of cells over time, according to Bar-Joseph. Large groups of living cells that start out together at the same biological point in time eventually become desynchronized in their activities, he noted. "You can compare a group of cells starting out in an experiment like a group of marathoners at the starting line. Over time, some marathoners will be far ahead on the track, while others will fall back." After the race begins, finding one marathoner among the thousands is difficult. Similarly, with asynchronous cells, trying to sort out a single cell response is virtually impossible. But Bar-Joseph has incorporated mathematical tools in his method that can detect genes affected by such asynchrony in a population of cells.

The work is supported in part by a National Science Foundation CAREER award, the Pittsburgh Life Sciences Greenhouse and the Pennsylvania Department of Health through the Tobacco Settlement Fund.

For More Information

Byron Spice | 412-268-9068 | bspice@cs.cmu.edu