New annotated database sifts through mountains of sequencing data to find gene promoters

Dec 22, 2010

Researchers at The Wistar Institute announce the release of an online tool that will help scientists find "gene promoters"—regions along a DNA strand that tell a cell's transcription machinery where to start reading in order to create a particular protein. The Mammalian Promoter Database (MPromDb) integrates the genome sequencing data generated at Wistar with publicly available data on human and mouse genomics. MPromDb pinpoints known promoters and predicts where new ones are likely to be found, the researchers say.

"Several complete genome sequences are available, including highly accurate assembled sequences from more than 1,000 individuals from the '1000 Genomes Project,' with the goal of providing a comprehensive resource on human genetic variation and guiding us into the personal genomics era," said Ramana V. Davuluri, Ph.D., associate professor in Wistar's Molecular and Cellular Oncogenesis Program and associate director of The Wistar Institute Center for Systems and Computational Biology. "With this information, researchers can design personalized diagnostics and therapeutics or delve deeper into the study of gene regulation than previously thought possible."

Davuluri and his colleagues published details of how they built MPromDb in the journal Nucleic Acids Research, available online now.

Contrary to what was once the textbook view of genetics, one gene does not necessarily encode just one protein. Scientists now know that a single gene may encode multiple versions of a protein, called isoforms. This allows cells to make almost 100,000 distinct proteins even though our DNA encodes only about 20,000 protein-coding genes. As the body grows in the womb, cells may use different isoforms at different stages of development. Likewise, different adult cells may use different isoforms of a protein depending on cell type; a neuron, for example, may use a different isoform than a skin cell.

"We have evolved this beautiful system where our DNA creates tremendous diversity from a limited set of genetic instructions," Davuluri said. "Recent evidence shows that at least half of all of our genes have alternative promoters that allow cells to make transcript variants and protein isoforms."

Earlier reports from the Davuluri laboratory showed that nearly 40 percent of genes use alternative promoters to create protein isoforms. According to Davuluri, integrating this information with data from other studies should reveal significantly more of these alternative promoters.

"Much of the genetic variation occurs outside protein-coding regions, such as gene regulatory regions," Davuluri said. "MPromDb provides context for data in the form of gene promoter annotations that can tell you where and when our bodies make a particular protein variant."

MPromDb mines its information from huge databases maintained by national and international research consortia, such as the Gene Expression Omnibus (GEO), maintained by the National Center for Biotechnology Information, and the Encyclopedia of DNA Elements (ENCODE), run by the National Human Genome Research Institute. Essentially, MPromDb looks for DNA sequences that are potential binding sites for RNA polymerase II, the enzyme that creates the RNA transcript that the cell later translates into protein. The current database contains information on more than 42,000 human gene promoters found in six different cell types and more than 48,000 mouse gene promoters found in 10 different cell types.
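The basic idea of locating active promoters from polymerase binding data can be illustrated with a simplified sketch. The Python below is not MPromDb's actual pipeline. It assumes a hypothetical list of annotated transcripts with their transcription start sites (TSSs) and a hypothetical list of RNA polymerase II binding regions of the kind reported by sequencing experiments, and it flags a transcript's promoter as a candidate when a binding region overlaps a window around its start site. The window sizes (1,000 bases upstream, 500 downstream), the gene names, and all coordinates are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class BindingRegion:
    """Hypothetical RNA polymerase II binding region, half-open interval [start, end)."""
    chrom: str
    start: int
    end: int


@dataclass
class Transcript:
    """Hypothetical transcript annotated with its transcription start site (TSS)."""
    name: str
    chrom: str
    tss: int
    strand: str  # "+" or "-"


def candidate_promoters(transcripts, regions, upstream=1000, downstream=500):
    """Return names of transcripts whose promoter window overlaps a binding region.

    The window covers `upstream` bases 5' of the TSS and `downstream` bases 3'
    of it, taking strand into account. Window sizes are illustrative assumptions.
    """
    active = []
    for t in transcripts:
        if t.strand == "+":
            win_start, win_end = t.tss - upstream, t.tss + downstream
        else:
            # On the minus strand, "upstream" lies at higher coordinates.
            win_start, win_end = t.tss - downstream, t.tss + upstream
        for r in regions:
            # Half-open intervals overlap when each starts before the other ends.
            if r.chrom == t.chrom and r.start < win_end and r.end > win_start:
                active.append(t.name)
                break
    return active


# Hypothetical data: two isoforms of GENE1 that start at different positions.
transcripts = [
    Transcript("GENE1-isoformA", "chr1", 10_000, "+"),
    Transcript("GENE1-isoformB", "chr1", 25_000, "+"),
    Transcript("GENE2", "chr2", 5_000, "-"),
]
regions = [
    BindingRegion("chr1", 9_800, 10_100),
    BindingRegion("chr2", 5_200, 5_400),
]

print(candidate_promoters(transcripts, regions))  # ['GENE1-isoformA', 'GENE2']
```

In this toy example, the two isoforms of the hypothetical GENE1 begin at different start sites, mirroring the alternative promoters described above; only the first has a nearby polymerase binding region, so only it is flagged as a candidate promoter.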

"In fact, scientists are so good at generating this sort of information using next-generation sequencing methods that they collect information far in excess of what they might need for a given experiment or project," Davuluri said. "This information all ends up in places like GEO, waiting to be discovered by groups like ours."

According to Davuluri, the Wistar Center for Systems and Computational Biology plans to expand MPromDb to include epigenetic data (information on modifications to DNA that affect gene regulation), protein-DNA interaction data, and genetic variation data for both humans and mice.

Provided by The Wistar Institute
