Darwin's famous finches and Venter's marine microbes

Mar 13, 2007

Although the Galápagos finches were to play a pivotal role in the inception of Darwin’s theory of evolution through natural selection, he had no inkling of their significance when he collected them during his voyage on the HMS Beagle.

Similarly, it is hard to predict the impact that the vast amount of marine microbial DNA collected during the Sorcerer II Global Ocean Sampling (GOS) Expedition by J. Craig Venter, Ph.D., and his team will have on our understanding of the natural world.

"If anything, this is just the beginning," says Gerard Manning, Ph.D., director of the Razavi Newman Center for Bioinformatics at the Salk Institute for Biological Studies. "We’re starting to explore this trove of sequences now, but it may be decades before we fully understand it all."

Just as the famous ornithologist John Gould had to classify the Galápagos finches before they could put Darwin on the right track, Manning and many others have spent the last couple of months wading through roughly 7.7 million sequenced snippets of sea-borne genomic DNA to impose order on the flood of data and to classify the identified proteins.

Their findings are detailed in a series of papers published in this week’s online edition of the journal Public Library of Science Biology.

The authors are plying the rapidly emerging trade of metagenomics (also known as environmental genomics), which seeks to examine genomic snapshots taken directly from the environment.

"Metagenomics allows us to sample the 99 percent of all bacteria that won’t grow in the lab," explains Manning. "GOS opens a huge window into biological and genomic diversity and, within this diversity, to better understand many of the fundamentals of biology," he adds.

Expanding the universe of protein families

But instead of whole genomes, metagenomics produces a grab bag of bits and pieces, and scientists have to develop new methods to extract meaning from them. In one of the papers, a large team of scientists, led by first author Shibu Yooseph, Ph.D., of the J. Craig Venter Institute, compared every DNA fragment with every other available DNA fragment to produce clusters of related sequences. This exhaustive analysis predicted more than 6 million proteins in the GOS data, nearly twice the number of all proteins described before, and laid the groundwork for further studies.
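The idea of comparing every fragment with every other fragment and grouping the related ones can be illustrated with a toy sketch. This is not the method used in the paper: the similarity measure (shared three-letter words), the threshold, and the function names `kmers`, `similarity`, and `cluster` are all assumptions chosen for illustration.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-letter words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets; a stand-in for a real alignment score."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def cluster(seqs, threshold=0.5):
    """All-against-all comparison followed by single-linkage grouping."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pair is compared once, so the work grows with the square
    # of the number of sequences.
    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, seq in enumerate(seqs):
        groups.setdefault(find(i), []).append(seq)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    # Hypothetical sequences, invented for this example.
    toy = ["MKTAYIAKQR", "MKTAYIAKQK", "GGSPLVWWHE"]
    for group in cluster(toy):
        print(group)
```

Because the number of pairs grows quadratically, an exhaustive comparison of millions of sequences needs far more than a simple loop like this one, which is part of why the analysis was such a large computational effort.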

Manning, a co-author on Yooseph’s paper, looked at the other side of the coin. He ran all the public sequences and GOS data against Pfam, a collection of signature profiles for all known protein families. Each of these profiles is an average of all known members of a certain protein family.

"Instead of starting with a human kinase to find a bacterial kinase, for example, you start with all of them together, which makes the search much more sensitive, but also very computationally expensive," Manning says. "We did almost 350 million comparisons, which is probably an order of magnitude or two more than anybody has ever done before."

Manning and co-author Yufeng Zhai, Ph.D., a bioinformatics programmer in the Razavi Newman Center for Bioinformatics at the Salk, could accomplish this gargantuan task only with the help of Time Logic, a company in Carlsbad, California, that specializes in hardware for accelerating genomic searches. "We only have one of their accelerators, but Time Logic stepped up and lent us eight more," says Manning. The final computation took two weeks, but would have taken well over a century on a traditional computer.

The Salk scientists could assign over half of all GOS sequences to known protein families, and discovered that certain protein profiles are more common in the ocean than on land, and others less so. For example, Gram-positive bacteria are best known for their hardy spores, but this ability has been entirely lost in their marine relatives. Flagella, whip-like extensions that propel bacteria forward, and pili, short extensions used to exchange genetic material between bacteria (also known as microbial sex), are also less frequent in marine environments.

"By comparing our findings with the Yooseph clusters, we also discovered hundreds of new gene families that hadn’t even been seen before," says Zhai, adding that by incorporating the diverse GOS data into known profiles, "we were able to make them more sensitive and diverse, and so increase their power to categorize novel sequences."

Diversity of microbial kinases

In a separate study, Manning, Zhai, and first author Natarajan Kannan, Ph.D., a postdoctoral researcher in the lab of HHMI investigator and UCSD professor Susan S. Taylor, Ph.D., traded the breadth of the ocean survey for the depth of a single protein domain. They zoomed in on kinases, extremely well-studied enzymes that control every aspect of eukaryotic cell biology and are important cancer drug targets. Kinases control the activity of proteins and small molecules by attaching tiny phosphate groups to them. By contrast, much less had been known about their bacterial counterparts.

Again and again, the researchers combed the GOS data for bacterial kinases, each time rebuilding their domain profiles by including the new members found in the previous round. All in all, they dug up 45,000 protein kinase sequences that fell into 20 distinct families, of which the eukaryotic protein kinases are just one. The additional 19 families spanned a huge range and included several that had never been described before.
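The round-by-round search described here can be sketched in a few lines. This is a simplified illustration, not the researchers' pipeline: it assumes equal-length, pre-aligned sequences, a simple per-position frequency profile with a uniform background, and a made-up score threshold. The names `build_profile`, `score`, and `iterative_search` are hypothetical. Production tools build profile hidden Markov models and handle alignment, gaps, and statistical significance.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(members, pseudocount=1.0):
    """Per-position amino acid frequencies across equal-length members."""
    total = len(members) + pseudocount * len(ALPHABET)
    profile = []
    for column in zip(*members):
        counts = Counter(column)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in ALPHABET})
    return profile


def score(profile, seq):
    """Log-odds of the sequence under the profile versus a uniform background."""
    background = 1.0 / len(ALPHABET)
    return sum(math.log(col[aa] / background) for col, aa in zip(profile, seq))


def iterative_search(seeds, database, threshold, max_rounds=10):
    """Search, add the new hits to the family, rebuild the profile, repeat."""
    members = list(seeds)
    for _ in range(max_rounds):
        profile = build_profile(members)
        hits = [s for s in database
                if s not in members and score(profile, s) >= threshold]
        if not hits:
            break  # converged: the rebuilt profile finds nothing new
        members.extend(hits)
    return members


if __name__ == "__main__":
    # Hypothetical five-residue motifs, invented for this example.
    database = ["GKGAF", "GRGAF", "WWWWW"]
    print(iterative_search(["GKGSF"], database, threshold=2.0))
```

In this toy run the seed alone is too narrow to pick up `GRGAF`, but once `GKGAF` joins the family in the first round, the rebuilt profile is broad enough to catch it in the second. The same broadening is what lets an iterative search reach distant relatives, and it is also why thresholds have to be chosen carefully so that unrelated sequences do not creep in.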

"Prokaryotic protein-like kinases were considered to be some sort of niche players, but actually they are more prevalent and widespread than histidine kinases," explains Manning. Bacteria were thought to rely mostly on histidine kinases, which are structurally different from protein kinases, for all their signaling needs.

Even though the different kinase families had very little similarity in their sequence, it emerged that 10 key residues were conserved in almost all kinase families, marking them as the core of what it means to be a kinase. Seven of those had previously been known to be important in human kinases, but the other three were unexpected finds.

The other surprising finding was just how innovative and plastic the different families were, even at these core residues: for all but one of the 10 key residues, at least one family had found a way to do without it. Using structural modeling and patterns of sequence conservation, Kannan was able to show that the loss of one key residue could be compensated by changes around other conserved regions of the protein, and that some of these changes in bacterial kinases are also seen in specific human kinases.

Says Manning, "By looking at all these very distant microbial relatives we can understand more even about human kinases and their relationship to cancer and other diseases. We go out into the ocean, we find all this diversity and analyzing what’s new and what’s not new reflects back on the things we thought we knew well."

Research done at the Salk Institute was supported by the Razavi-Newman Foundation.

Source: Salk Institute
