New gene prediction method capitalizes on multiple genomes

Dec 20, 2007

In the online open access journal Genome Biology, researchers at Stanford University report a new approach to computationally predicting the locations and structures of protein-coding genes in a genome. Gene finding remains an important problem in biology, as scientists are still far from fully mapping the set of human genes.

Furthermore, gene maps for other vertebrates, including important model organisms such as mouse, are far less complete than the human annotation. The new technique, known as CONTRAST (CONditionally TRAined Search for Transcripts), works by comparing a genome of interest to the genomes of several related species.

CONTRAST exploits the fact that protein-coding genes play a specific functional role within a cell and are therefore subject to characteristic evolutionary pressures. For example, mutations that alter an important part of a protein's structure are likely to be deleterious and thus selected against. On the other hand, mutations that preserve a protein's amino acid sequence are normally well tolerated. Thus, protein-coding genes can be identified by searching a genome for regions that show evidence of such patterns of selection. However, learning to recognize these patterns when more than two species are compared has proved difficult.
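The selection signal described above can be made concrete by comparing aligned codons between a reference genome and one informant and asking whether each difference changes the encoded amino acid. The sketch below is an illustration of that idea only, not CONTRAST's scoring method; the function name `count_codon_changes` is hypothetical, and it assumes the two sequences are already aligned, in the same reading frame, and of equal length.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def count_codon_changes(reference: str, informant: str) -> tuple[int, int]:
    """Count (synonymous, amino-acid-changing) codon differences between two
    in-frame, gap-aligned coding sequences of equal length (hypothetical helper)."""
    if len(reference) != len(informant) or len(reference) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, and a multiple of 3")
    synonymous = nonsynonymous = 0
    for pos in range(0, len(reference), 3):
        ref_codon = reference[pos:pos + 3].upper()
        inf_codon = informant[pos:pos + 3].upper()
        # Skip identical codons and codons containing gaps or ambiguous bases.
        if ref_codon == inf_codon or ref_codon not in CODON_TABLE or inf_codon not in CODON_TABLE:
            continue
        if CODON_TABLE[ref_codon] == CODON_TABLE[inf_codon]:
            synonymous += 1  # protein sequence preserved: usually tolerated
        else:
            nonsynonymous += 1  # amino acid changed: often selected against
    return synonymous, nonsynonymous


# CTG->CTC keeps leucine; AAA->AGA turns lysine into arginine.
print(count_codon_changes("ATGCTGAAAGGT", "ATGCTCAGAGGT"))  # (1, 1)
```

In a region under selection to preserve a protein, one would expect differences to skew toward the synonymous count; with more than one informant, the open question the article describes is how to weigh such evidence across species.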

Previous gene-prediction systems were able to make effective use of one additional 'informant' genome. For example, when searching for human genes, taking information from the mouse genome into account led to a substantial increase in accuracy. However, no system was able to use additional informant genomes to improve on the state-of-the-art performance obtained with mouse alone, even though adding informants was expected to make patterns of selection clearer.

CONTRAST solves this problem by learning to recognize the signature of protein-coding gene selection in a fundamentally different way from previous approaches. Instead of constructing a model of sequence evolution, CONTRAST directly 'learns' which features of a genomic alignment are most useful for recognizing genes. This approach yields higher overall accuracy and is able to extract useful information from several informant sequences.
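To illustrate the general idea of learning feature weights directly from labeled examples, rather than specifying an evolutionary model, here is a minimal logistic-regression sketch. It is a stand-in for discriminative training in general, not CONTRAST's actual model. The three features (the fraction of informant genomes matching the reference at codon positions 1, 2 and 3) and all training values are invented for illustration, as are the helper names `train_logistic` and `predict`.

```python
import math

# Toy training data, invented for illustration. Each row is a hypothetical
# feature vector: fraction of informant genomes whose base matches the
# reference at codon positions 1, 2 and 3 within a candidate window.
X = [
    [0.90, 0.95, 0.50], [0.95, 0.90, 0.40], [0.85, 0.90, 0.55], [0.90, 0.85, 0.45],  # coding
    [0.50, 0.50, 0.50], [0.60, 0.55, 0.60], [0.40, 0.45, 0.50], [0.55, 0.60, 0.55],  # non-coding
]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> float:
    """Probability that window x is protein-coding under the fitted toy model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


w, b = train_logistic(X, y)
print("weights per codon position:", [round(v, 2) for v in w])
print("P(coding) for a new window:", round(predict(w, b, [0.92, 0.93, 0.5]), 3))
```

The point of the sketch is that the learned weights, not a hand-built substitution model, decide how much each feature matters; adding features from further informant genomes would simply lengthen each feature vector.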

In a test on the human genome, CONTRAST exactly predicted the full structure of 59% of the genes in the test set, compared with the previous best result of 36%. Its exact exon sensitivity of 93%, compared with a previous best of 84%, translates into many thousands of exons correctly predicted by CONTRAST but missed by previous methods. Importantly, CONTRAST's accuracy using a combination of eleven informant genomes was significantly higher than its accuracy using any single informant. The substantial advance in predictive accuracy represented by CONTRAST will further efforts to complete protein-coding gene maps for human and other organisms.
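The accuracy figures above use strict matching. In gene-prediction benchmarks, exact exon sensitivity is conventionally the fraction of annotated exons whose predicted start and end coordinates both match exactly, and a gene counts as exactly predicted only when its whole exon set matches. The sketch below assumes exons are represented as (start, end) tuples; the function names and coordinates are hypothetical.

```python
def exact_exon_sensitivity(true_exons, predicted_exons):
    """Fraction of annotated exons predicted with both boundaries exactly right."""
    true_set = set(true_exons)
    if not true_set:
        return 0.0
    return len(true_set & set(predicted_exons)) / len(true_set)


def gene_exactly_predicted(true_exons, predicted_exons):
    """True only if every exon matches and no extra exons are predicted."""
    return set(true_exons) == set(predicted_exons)


# Hypothetical (start, end) coordinates for one annotated gene and one prediction:
# one exon has a wrong end coordinate and one spurious exon is predicted.
annotated = [(100, 200), (300, 450), (600, 720)]
predicted = [(100, 200), (300, 440), (600, 720), (900, 950)]

print(exact_exon_sensitivity(annotated, predicted))  # 0.6666666666666666
print(gene_exactly_predicted(annotated, predicted))  # False
```

This shows why gene-level exactness is a much harder target than exon-level sensitivity: a single misplaced boundary or extra exon makes the whole gene structure count as wrong.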

Further information about existing gene-prediction methods and the advance CONTRAST brings to the field can be found in a minireview by Paul Flicek, which accompanies the article by Batzoglou and colleagues.

Source: BioMed Central