Genomic data are growing, but what do we really know?

Mar 20, 2013

"We live in the post-genomic era, when DNA sequence data is growing exponentially", says Miami University (Ohio) computational biologist Iddo Friedberg. "But for most of the genes that we identify, we have no idea of their biological functions. They are like words in a foreign language, waiting to be deciphered." Understanding the function of genes is a problem that has emerged at the forefront of molecular biology. Many groups develop and employ sophisticated algorithms to decipher these "words". However, until now there was no comprehensive picture of how well these methods perform. "To use the information in our genes to our advantage, we first need to take stock of how well we are doing in interpreting these data", Friedberg says.

To do so, Friedberg and his colleagues Predrag Radivojac of Indiana University, Bloomington, IN, and Sean Mooney of the Buck Institute for Research on Aging, Novato, CA, organized the Critical Assessment of Function Annotation, or CAFA. CAFA is a community-wide experiment to assess the performance of the many methods used today to predict the functions of proteins, the workhorses of the cell coded by our genes.

Thirty research groups comprising 102 scientists and students participated in CAFA, presenting a total of 54 methods. The participating groups came from leading universities in North America, Europe, Asia and Australia. The groups participated in blind-test experiments in which they predicted the functions of proteins whose functions are already known but haven't yet been made publicly available. Independent assessors then judged their performance.

The results are published in this month's issue of Nature Methods in a paper co-authored by members of all the participating groups, with Friedberg and Radivojac as lead authors. Fifteen companion papers detailing the methods have been published in a special issue of BMC Bioinformatics.

"We have discovered a great enthusiasm and community spirit", said Friedberg, who since 2005 has been organizing Automated Function Prediction (AFP) meetings internationally. This was despite the competitive environment, in which research groups want their methods to perform better than their peers' methods. Overall, throughout CAFA there was a highly collegial spirit and a willingness to share information and science. "Everyone recognized that this is an important endeavor, and that only by a community effort can we move the field forward and learn to harness the deluge of genomic data, turning it into useful information."

"For the first time we have broad insight into what works, where improvement is needed, and how we should move the field forward. We will continue running CAFA in the future, as we are confident it will only help generate better methods to understand the information locked in our genomes, and those of other organisms," Friedberg said.

The initial analysis suggests that algorithms combining disparate prediction clues taken from different knowledge bases provide more accurate predictions. The leading methods combined phylogenetic, gene-expression and protein-protein interaction data to make their predictions.
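
To give a rough sense of what "combining disparate prediction clues" can mean in practice, here is a minimal Python sketch. It is not any CAFA participant's method: the function name, the function labels, the scores and the equal weights are all made up for illustration, and real methods are considerably more elaborate. It simply averages the confidence that three hypothetical evidence sources assign to each candidate function of a single protein.

```python
from typing import Dict, List

# Hypothetical per-source confidence scores (0..1) for candidate function
# labels of a single protein. All names and numbers are illustrative only.
Scores = Dict[str, float]


def combine_scores(sources: List[Scores], weights: List[float]) -> Scores:
    """Weighted average of scores across sources; a missing label counts as 0."""
    if len(sources) != len(weights):
        raise ValueError("need one weight per source")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Collect every label mentioned by at least one source.
    labels = set()
    for source in sources:
        labels.update(source)
    return {
        label: sum(w * s.get(label, 0.0) for s, w in zip(sources, weights)) / total
        for label in labels
    }


# Made-up evidence from three kinds of data named in the article.
phylogeny = {"kinase activity": 0.8, "DNA binding": 0.25}
expression = {"kinase activity": 0.5}
interaction = {"kinase activity": 0.5, "DNA binding": 0.5}

combined = combine_scores([phylogeny, expression, interaction], [1.0, 1.0, 1.0])
best = max(combined, key=combined.get)  # label with the highest combined score
```

In this toy example a label supported by all three sources ends up ranked above one supported by only two, which is the intuition behind pooling independent lines of evidence.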

More information: Radivojac et al, Nature Methods: www.nature.com/nmeth/journal/v10/n3/full/nmeth.2340.html
BMC Bioinformatics companion papers: www.biomedcentral.com/bmcbioinformatics/supplements/14/S3
The Automated Function Prediction Special Interest Group web site: biofunctionprediciton.org
