Protein misprediction uncovered by new technique

Aug 27, 2008

A new bioinformatics tool is capable of identifying and correcting abnormal, incomplete and mispredicted protein annotations in public databases. The MisPred tool, described today in the open access journal BMC Bioinformatics, currently uses five principles to identify suspect proteins that are likely to be abnormal or mispredicted.

László Patthy led the team from the Institute of Enzymology of the Hungarian Academy of Sciences, Budapest, that developed this new approach. He explained why it is needed: "Recent studies have shown that a significant proportion of eukaryotic genes are mispredicted at the transcript level. As the MisPred routines are able to detect many of these errors, and may aid in their correction, we suggest that it may significantly improve the quality of protein sequence data based on gene predictions." The MisPred approach promises to save much of the time and effort that would otherwise be spent investigating erroneously identified genes.

The MisPred approach checks annotations against five dogmas:

-- Extracellular or transmembrane proteins must have appropriate secretory signals.
-- A protein with intra- and extra-cellular parts must have a transmembrane segment.
-- Extracellular and nuclear domains must not occur in a single protein.
-- The number of amino acid residues in closely related members of a globular domain family must fall into a relatively narrow range.
-- A protein must be encoded by exons located on a single chromosome.
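
As a rough illustration, the five rules listed above can be written as simple checks over a summary of each annotation. This is only a sketch: the `ProteinAnnotation` record, its field names and the `violated_rules` function are hypothetical and are not taken from MisPred itself, which has to derive such features from sequence analysis. The sketch assumes those features are already known and follows the wording of the rules as given in this article.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinAnnotation:
    """Hypothetical summary of one predicted protein (not MisPred's own data model)."""
    has_secretory_signal: bool = False
    has_transmembrane_segment: bool = False
    has_extracellular_domain: bool = False
    has_cytoplasmic_domain: bool = False
    has_nuclear_domain: bool = False
    # One entry per globular domain: (observed length, shortest expected, longest expected)
    domain_lengths: list = field(default_factory=list)
    # Chromosomes on which the exons of the gene model are located
    exon_chromosomes: set = field(default_factory=set)


def violated_rules(p: ProteinAnnotation) -> list:
    """Return the numbers (1 to 5) of the rules that the annotation breaks."""
    violations = []
    # Rule 1: extracellular or transmembrane proteins need a secretory signal.
    if (p.has_extracellular_domain or p.has_transmembrane_segment) and not p.has_secretory_signal:
        violations.append(1)
    # Rule 2: intra- and extracellular parts require a transmembrane segment.
    if p.has_extracellular_domain and p.has_cytoplasmic_domain and not p.has_transmembrane_segment:
        violations.append(2)
    # Rule 3: extracellular and nuclear domains must not occur together.
    if p.has_extracellular_domain and p.has_nuclear_domain:
        violations.append(3)
    # Rule 4: each domain length must fall within the range seen in its family.
    if any(not (low <= length <= high) for length, low, high in p.domain_lengths):
        violations.append(4)
    # Rule 5: all exons must be located on a single chromosome.
    if len(p.exon_chromosomes) > 1:
        violations.append(5)
    return violations


# Made-up example: an extracellular domain without a secretory signal,
# with exons spread over two chromosomes.
suspect = ProteinAnnotation(has_extracellular_domain=True,
                            exon_chromosomes={"chr1", "chr7"})
print(violated_rules(suspect))  # [1, 5]
```

An annotation that breaks any rule would be flagged as suspect rather than rejected outright, since genuine exceptions to the rules exist.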

There are some exceptions to these rules, as Patthy pointed out: "Some secreted proteins may truly lack secretory signal peptides since they are subject to leaderless protein secretion. Similarly, it cannot be excluded at present that transchromosomal chimeras can be formed and may have normal physiological functions. Nevertheless, the fact that MisPred analyses of protein sequences of the Swiss-Prot database identified very few such exceptions indicates that the rules of MisPred are generally valid."

The authors found that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. They note, "Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON predicted entries."

Source: BioMed Central
