August 4, 2011

Clustering is key to lighting up the dark proteome

by Pacific Northwest National Laboratory

(PhysOrg.com) -- A new approach that organizes previously unused mass spectra from proteomics studies lets scientists extract more information from these spectra about proteins in a wide range of organisms. Scientists from the University of California, San Diego and Pacific Northwest National Laboratory have created a vast spectral archive from more than a billion mass spectra acquired at PNNL between 2001 and 2009. They describe their approach in the July issue of Nature Methods.

In recent years, the volume of tandem mass spectrometry data generated by proteomics experiments has increased dramatically, and various laboratories routinely measure multiple, nearly identical mass spectra of the same peptides. To interpret a spectrum, scientists compare it with peptides derived from a database of known protein sequences, then evaluate the resulting matches with various scoring methods to assign a peptide identity to the spectrum. Large sets of identified spectra can be organized into spectral libraries against which new spectra can be compared, making the peptide assignments used for protein identification increasingly effective.
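The library-matching idea can be illustrated with a minimal sketch. It assumes each spectrum is a list of (m/z, intensity) peaks, binned into 1-unit m/z bins and scored with a cosine (normalized dot product) similarity, a common measure in spectral library search. The bin width, the 0.7 acceptance threshold, the function names, and the toy library are all assumptions for illustration, not details from the study.

```python
import math
from collections import defaultdict

BIN_WIDTH = 1.0  # assumed m/z bin width, chosen for illustration only


def binned_vector(peaks, bin_width=BIN_WIDTH):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins (sparse vector)."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz // bin_width)] += intensity
    return dict(vec)


def cosine(a, b):
    """Normalized dot product of two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def library_search(query_peaks, library, min_score=0.7):
    """Return (peptide, score) of the best library match, or None if the
    best score falls below min_score (an assumed acceptance threshold)."""
    query = binned_vector(query_peaks)
    best = None
    for peptide, ref_peaks in library.items():
        score = cosine(query, binned_vector(ref_peaks))
        if best is None or score > best[1]:
            best = (peptide, score)
    if best is not None and best[1] >= min_score:
        return best
    return None


# Toy library of "identified" spectra: made-up peptides and peak lists.
library = {
    "PEPTIDEK": [(147.1, 10.0), (244.2, 30.0), (345.2, 50.0)],
    "SAMPLER": [(175.1, 40.0), (288.2, 20.0)],
}
query = [(147.3, 9.0), (244.0, 33.0), (345.6, 48.0)]
print(library_search(query, library))
```

A query whose best score stays below the threshold returns None; in the terms of the article, it becomes an unidentified spectrum.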

But what about the spectra that are not identified, that is, those not associated with a known peptide? Typically, unidentified spectra are ignored or discarded: without a peptide assignment, they say little about which protein produced them. As a result, a significant fraction of proteins remains unidentified, constituting an effective "dark proteome" of unknown content.

Shedding light on the dark proteome is where the UCSD/PNNL team comes in. While spectral libraries discard unidentified spectra, spectral archives use all mass spectra, identified or unidentified, organizing them into clusters (see "Spectral Archives Complement Spectral Libraries"). The scientists not only showed that large archives are feasible to construct and useful for routine peptide identification, but also developed new applications that become possible when a diverse collection of datasets is analyzed as a whole.

"We believe that spectral archives could change the nature of proteomics by motivating researchers who are analyzing seemingly unrelated data to share this data," said senior author Dr. Pavel Pevzner, UCSD. "Doing so improves the quality of the interpretations of both of their spectral datasets."

With archives, a researcher can find clusters that contain spectra from different organisms. Such clusters are interesting in their own right, because they likely correspond to proteins shared across multiple species, and they can also be used to reduce the effective size of the protein database, leading to new, confident peptide and protein identifications. The team also showed that short peptides (shorter than 7 amino acids) could be confidently identified, which is much more difficult with commonly used approaches.
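The article does not spell out how cross-species clusters shrink the database. One plausible reading, sketched here purely as an assumption, is that a peptide producing spectra in several organisms must be encoded in every one of their proteomes, so the candidates for that cluster can be restricted to the intersection of the organisms' peptide sets. The function name and the toy peptide sets are hypothetical.

```python
def candidate_peptides(cluster_species, proteome_peptides):
    """Peptides encoded by every species that contributed spectra to a cluster.

    cluster_species: names of the organisms whose spectra share one cluster.
    proteome_peptides: hypothetical mapping of organism -> set of peptide
    sequences (for example, from an in-silico digest of its proteome).
    """
    peptide_sets = [proteome_peptides[s] for s in cluster_species]
    if not peptide_sets:
        return set()
    # Only peptides present in all contributing proteomes remain candidates.
    return set.intersection(*peptide_sets)


# Toy peptide sets: made-up sequences, not real proteome contents.
proteomes = {
    "human": {"AAGLLK", "VVSEFR", "TTDNPK"},
    "mouse": {"AAGLLK", "VVSEFR", "LLGQER"},
    "S. oneidensis": {"AAGLLK", "MMTSAR"},
}

print(sorted(candidate_peptides({"human", "mouse"}, proteomes)))
```

A smaller candidate list means fewer chance matches, which is one reason a search restricted this way could support more confident identifications.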

The PNNL mass spectra data used by the team included samples taken from a diverse set of more than 100 organisms, including humans, the common house mouse, and the metal-reducing bacterium Shewanella oneidensis. The research team developed a clustering tool, MS-Cluster, that generated a spectral archive from the ~1.18 billion spectra from PNNL. This archive greatly exceeds the size of existing spectral repositories.

To evaluate whether spectral archives can increase peptide identifications, the researchers selected a subset of 14.5 million spectra from the microorganism S. oneidensis and constructed an archive from them. They split the dataset into five sets of ~2.9 million spectra each and added the sets to the archive one at a time. At each stage, they compared the numbers of protein and unique peptide identifications obtained by searching the archive's clusters with those obtained by a conventional database search.
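Building an archive batch by batch can be sketched as greedy, first-fit clustering: each incoming spectrum joins the first existing cluster with a similar precursor m/z and a high enough cosine similarity, and otherwise starts a new cluster. This is a simplified stand-in, not MS-Cluster's actual algorithm; here each cluster is represented by its first spectrum, and the thresholds, bin width, class name, and toy spectra are assumptions for illustration.

```python
import math
from collections import defaultdict

BIN_WIDTH = 1.0        # assumed m/z bin width
MIN_SIMILARITY = 0.8   # assumed cosine threshold for joining a cluster
PRECURSOR_TOL = 0.05   # assumed precursor m/z tolerance


def binned(peaks):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz // BIN_WIDTH)] += intensity
    return dict(vec)


def cosine(a, b):
    """Normalized dot product of two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SpectralArchive:
    """Toy archive; each cluster keeps its first spectrum as representative."""

    def __init__(self):
        self.clusters = []  # dicts with "precursor", "rep", "members"

    def add_batch(self, spectra):
        """Add (spectrum_id, precursor_mz, peaks) tuples.

        Returns the number of new clusters created by this batch.
        """
        created = 0
        for spec_id, precursor, peaks in spectra:
            vec = binned(peaks)
            for cluster in self.clusters:
                if (abs(cluster["precursor"] - precursor) <= PRECURSOR_TOL
                        and cosine(vec, cluster["rep"]) >= MIN_SIMILARITY):
                    cluster["members"].append(spec_id)
                    break
            else:  # no compatible cluster found: start a new one
                self.clusters.append(
                    {"precursor": precursor, "rep": vec, "members": [spec_id]})
                created += 1
        return created


archive = SpectralArchive()
batch1 = [("s1", 500.25, [(147.1, 10.0), (244.2, 30.0)]),
          ("s2", 500.26, [(147.2, 11.0), (244.1, 29.0)])]
batch2 = [("s3", 500.24, [(147.0, 9.0), (244.3, 31.0)]),
          ("s4", 712.40, [(175.1, 5.0), (288.2, 20.0)])]
for batch in (batch1, batch2):
    archive.add_batch(batch)
print([c["members"] for c in archive.clusters])
```

Because unidentified spectra are clustered just like identified ones, each added batch can enlarge existing clusters as well as create new ones.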

The archive consistently yielded more unique peptide and protein identifications. With the archive, the scientists were also able to identify many more spectra through their cluster membership: depending on the stage, they identified 50 to 75% more spectra this way than through a regular database search.
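Identification through cluster membership can be pictured with a small sketch under stated assumptions: an unidentified spectrum inherits the peptide of its cluster when all of the cluster's database-identified members agree on a single peptide. The agreement rule, the input format, the function name, and the toy data are hypothetical, not the study's published procedure.

```python
def propagate_identifications(clusters, direct_ids):
    """Label unidentified cluster members with their cluster's peptide.

    clusters: list of lists of spectrum IDs (hypothetical input format).
    direct_ids: dict of spectrum ID -> peptide from a database search.
    A cluster is used only if its identified members agree on one peptide,
    a conservative rule assumed here for illustration.
    Returns a dict of spectrum ID -> peptide for newly labeled spectra.
    """
    propagated = {}
    for members in clusters:
        peptides = {direct_ids[m] for m in members if m in direct_ids}
        if len(peptides) != 1:
            continue  # no identified members, or conflicting identifications
        (peptide,) = peptides
        for m in members:
            if m not in direct_ids:
                propagated[m] = peptide
    return propagated


# Toy data: four clusters, four spectra identified by direct database search.
clusters = [["s1", "s2", "s3", "s4"], ["s5", "s6"], ["s7", "s8"], ["s9"]]
direct_ids = {"s1": "AAGLLK", "s5": "VVSEFR", "s7": "TTDNPK", "s8": "LLGQER"}

extra = propagate_identifications(clusters, direct_ids)
print(len(direct_ids), "direct;", len(extra), "via cluster membership")
```

Skipping clusters with conflicting identifications trades some extra identifications for fewer chances of spreading an incorrect assignment.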

This study also highlights the large number of spectra for which no peptide or protein identification is achieved, opening the door to experimental and computational approaches for identifying the many peptides that proteomics studies have effectively ignored to date.

More information: Frank AM, et al. 2011. "Spectral archives: extending spectral libraries to analyze both identified and unidentified spectra." Nature Methods 8(7):587-591. DOI:10.1038/nmeth.1609

Provided by Pacific Northwest National Laboratory

Citation: Clustering is key to lighting up the dark proteome (2011, August 4) retrieved 11 July 2024 from https://phys.org/news/2011-08-clustering-key-dark-proteome.html

