The human genome has been mapped. Now, it's on to proteins, a much more daunting task. There are 20,300 genes, but there are millions of distinct protein molecules in our bodies. Many of these hold keys to understanding disease and targeting treatment.
A team led by Northwestern University chemical biologist Neil Kelleher has developed a new "top-down" method that can separate and identify thousands of protein molecules quickly. Many have been skeptical that such an approach, where each protein is analyzed intact instead of in smaller parts, could be done on such a large scale.
The promise of a top-down strategy is that the molecular data that scientists collect will be more closely linked to disease.
"Accurate identification of proteins could lead to the identification of biomarkers and early detection of disease as well as the ability to track the outcome of treatment," Kelleher said. "We are dramatically changing the strategy for understanding protein molecules at the most basic level. This is necessary for the Human Proteome Project -- the mapping of all healthy human proteins in tissues and organs -- to really take off."
Kelleher is the Walter and Mary E. Glass Professor of Molecular Biosciences and professor of chemistry in the Weinberg College of Arts and Sciences. He also is director of the Proteomics Center of Excellence and a member of the Robert H. Lurie Comprehensive Cancer Center of Northwestern University.
Kelleher says his approach is conceptually simple. "We take proteins -- those swimming around in cells -- and we measure them," he said. "We weigh proteins precisely and identify them directly. The way everyone else is doing it is by digesting the proteins, cutting them up into smaller bits called peptides, and putting them back together again. I call it the Humpty Dumpty problem."
The new strategy, Kelleher says, solves the "protein isoform problem" of the "bottom-up" approach where the smaller peptides often do not map cleanly to single human genes. The study will be published Oct. 30 by the journal Nature.
The top-down method can accurately identify which gene produced which protein. The bottom-up method is only 60 to 90 percent accurate in identifying proteins precisely.
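The mapping problem with peptides can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the three toy sequences, the gene and form labels, and the simplified cleavage rule (cut after every K or R) are invented here and are not data or software from the study.

```python
from collections import defaultdict

# Hypothetical toy sequences, invented for illustration; not real proteins.
# Keys are (gene, protein form) pairs.
PROTEINS = {
    ("gene_a", "form_1"): "MKTAYIAKQRLEDGSK",
    ("gene_a", "form_2"): "MKTAYIAKQRLEDGTK",
    ("gene_b", "form_1"): "MSSVLKQRLEDGSK",
}


def digest(sequence):
    """Cut a sequence after every K or R (a simplified, assumed rule)."""
    peptides, current = [], ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def peptide_to_sources(proteins):
    """Map each peptide to the set of (gene, form) pairs that contain it."""
    sources = defaultdict(set)
    for key, sequence in proteins.items():
        for peptide in digest(sequence):
            sources[peptide].add(key)
    return sources


def forms_without_unique_peptide(proteins):
    """Protein forms that no single peptide can identify on its own."""
    sources = peptide_to_sources(proteins)
    identifiable = {next(iter(s)) for s in sources.values() if len(s) == 1}
    return sorted(set(proteins) - identifiable)


def peptides_spanning_genes(proteins):
    """Peptides that occur in the products of more than one gene."""
    sources = peptide_to_sources(proteins)
    return sorted(
        peptide
        for peptide, keys in sources.items()
        if len({gene for gene, _ in keys}) > 1
    )


if __name__ == "__main__":
    print(forms_without_unique_peptide(PROTEINS))
    print(peptides_spanning_genes(PROTEINS))
```

In this toy set, every peptide from the first form of `gene_a` also turns up in another protein, and two peptides occur in the products of both genes, so the pieces alone cannot settle which form or gene they came from. The three intact sequences, by contrast, are all distinct.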
"We need to define all the protein molecules in the human body," Kelleher said. "First, we need a map of healthy protein forms, which will become a highly valuable reference list for understanding damaged and diseased forms of proteins. Our technology should allow us to get farther down this road faster."
In the first large-scale demonstration of the top-down method, the researchers were able to identify more than 3,000 protein forms created from 1,043 genes from human HeLa cells.
Their goal was to identify which gene each protein comes from -- to provide a one-to-one picture. They were able to produce this accurate map of thousands of proteins in just a few months.
The researchers also can produce the complete atomic composition for each protein. "If a proton is missing, we know about it," Kelleher said.
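The idea of weighing an intact protein precisely can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's software: the two toy sequences, the function names and the 10 parts-per-million tolerance are invented here, and real top-down identification involves much more than a single mass comparison. The residue masses are standard monoisotopic values.

```python
# Monoisotopic amino acid residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water is added for the free chain ends
PROTON = 1.00728   # mass of a single proton

# Hypothetical toy sequences, invented for illustration; not real proteins.
CANDIDATES = {
    "form_1": "MKTAYIAKQRLEDGSK",
    "form_2": "MKTAYIAKQRLEDGTK",  # differs from form_1 by one residue
}


def intact_mass(sequence):
    """Theoretical monoisotopic mass of an unmodified intact chain."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


def match_by_mass(observed, candidates, tolerance_ppm=10.0):
    """Names of candidates whose mass is within tolerance of `observed`."""
    hits = []
    for name, sequence in candidates.items():
        theoretical = intact_mass(sequence)
        error_ppm = abs(observed - theoretical) / theoretical * 1e6
        if error_ppm <= tolerance_ppm:
            hits.append(name)
    return hits


if __name__ == "__main__":
    observed = intact_mass(CANDIDATES["form_2"])
    print(match_by_mass(observed, CANDIDATES))
    print(match_by_mass(observed - PROTON, CANDIDATES))
```

With these assumed values, a measurement equal to the mass of `form_2` matches that form alone, because the single-residue difference from `form_1` shifts the mass by about 14 daltons. Taking one proton's mass away from the measurement leaves no match at all, which is the kind of small discrepancy a precise intact measurement can expose.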
One gene they studied, the HMGA1 gene associated with premature aging of cells, produces about 20 different protein forms.
Kelleher's team developed a four-dimensional system that combines separation techniques with mass spectrometry to measure the charge, mass and weight of each protein as well as how "greasy" a protein is. The data-analysis software, which the researchers developed during years of work prior to the study, proved critical to the success of the top-down method.
"If you want to know how the proteins in cancer really work and change, top-down mass spectrometry is getting to the point where it can be part of the discussion," Kelleher said.
"Analyzing the entire set of proteins expressed in a cell presents a continuing and significant technical challenge to the field of proteomics," said Charles Edmonds, who oversees proteomics grants at the National Institute of General Medical Sciences of the National Institutes of Health. "By combining multiple fractionation technologies with mass spectrometry, Dr. Kelleher and colleagues have demonstrated more than an order of magnitude improvement in proteome coverage. This is a great start."
More information: The title of the paper is "Mapping Intact Protein Isoforms in Discovery Mode Using Top-Down Proteomics."