During the ENCODE study, researchers found that more than 80 percent of the human genome sequence is linked to biological function. They also mapped more than 4 million regulatory regions where proteins interact with the DNA with exquisite specificity. These findings are a significant advance in understanding the precise and complex controls over the expression of genetic information within a cell.
"Penn State's contribution to the ENCODE project involves using the new ENCODE data to help explain how genetic variants that do not affect the structure of encoded proteins could affect a person's susceptibility to disease," said Ross Hardison, the T. Ming Chu Professor of Biochemistry and Molecular Biology at Penn State and a member of the ENCODE research team. The research led by Hardison is highlighted in the main integrative ENCODE paper to be published in the journal Nature.
"Genome-wide association studies can map with high resolution the places on our genomes where variation in the DNA sequence among individual persons affects their likelihood of having diabetes, cardiac disease, any of a large number of autoimmune diseases such as Crohn's disease, and other common diseases," Hardison said. Because most of these genetic variations are not in regions of the DNA that contain the codes for producing proteins, scientists suspected that some of these non-coding regions might have an important role in controlling the expression of genes.
Hardison's team at Penn State worked with others in the ENCODE Consortium to show, on a genome-wide scale, that many of the DNA regions that do not hold codes for proteins do, indeed, have an important role in controlling which genes are turned on and which are turned off. "Moreover, our research has made it possible to generate specific molecular hypotheses for how genetic variants in these DNA regions that control gene expression could affect the susceptibility to disease," Hardison said. "We demonstrate this process using, as an example, a locus associated with Crohn's and a few other autoimmune diseases. It is exciting to see our basic research revealing insights that help the progress of medical science, potentially facilitating a more personalized approach to medical practice."
In addition to Hardison, other Penn State scientists whose work on the ENCODE project is featured among the papers to be published on 6 September include Programmer/Analyst Belinda Giardine, Postdoctoral Scholars Robert S. Harris and Weisheng Wu, and Professor of Biology and of Computer Science and Engineering Webb Miller.
The overall ENCODE findings bring into much sharper focus a continually active genome. In it, proteins routinely turn genes on and off using sites that are sometimes far from the genes they regulate, and sites on a chromosome interact with one another, also sometimes over great distances. Chemical modifications of DNA influence gene expression, and various functional forms of RNA, a nucleic acid related to DNA, help regulate the whole system. "The ENCODE catalog is like Google Maps for the human genome," said Elise Feingold, a program director at the National Institutes of Health National Human Genome Research Institute (NHGRI), who helped to start the ENCODE Project. "The ENCODE maps allow researchers to inspect the chromosomes, genes, functional elements and individual nucleotides in the human genome in much the same way."
"During the early debates about the Human Genome Project, researchers had calculated that only a few percent of the sequence encoded proteins, the workhorses of the cell," said Eric D. Green, director of NHGRI. "Early on, some scientists even argued that most of the genome was 'junk.' ENCODE now gives us much more appreciation of the complex molecular ballet that converts genetic information into living cells and organisms, and we can now say that there is very little, if any, junk DNA."
Hundreds of researchers in the United States, United Kingdom, Spain, Singapore, and Japan performed more than 1,600 sets of experiments on 147 types of tissue with technologies standardized across the consortium. The experiments relied on innovative uses of next-generation sequencing technologies, enabled in part by NHGRI's technology initiative for DNA sequencing. In total, ENCODE generated more than 15 trillion bytes of raw data, and its analysis consumed the equivalent of more than 300 years of computer time.
Provided by Pennsylvania State University