Human beings are ecosystems on two legs, each of us carrying enough microbes to outnumber our human cells by 10 to 1 and our genes by even more. Identifying the dizzying numbers of bacteria and other microbes that live in and on our bodies is like exploring a new planet. Mapping the unknown territory called our microbiome takes much more than telescopes and charts, and it takes explorers to conduct a census of the inhabitants.
The Human Microbiome Project (HMP) Consortium, a five-year collaboration of large sequencing centers including the Broad Institute and dozens of other multidisciplinary research institutions across the nation, set out to build a foundation for understanding how these microorganisms coexist with their human hosts. To do so, scientists at the Broad played a leadership role in creating molecular tools and applying standardized protocols, generating vast amounts of data, developing new analytical methods to understand these data, and identifying the microbiome's most elusive organisms for whole genome sequencing.
Now a series of 14 scientific publications from the consortium, two appearing June 14 in Nature and 12 in Public Library of Science journals, answers for the first time two fundamental questions about the microbiota that healthy humans carry: Who's there and what are they doing?
"Just as the Human Genome Project was 10 years ago, the Human Microbiome Project is intended to be a baseline for future studies of human health and disease," said Dirk Gevers, a Genome Sequencing and Analysis Program group leader at the Broad, co-first author of one Nature paper, and co-author of the other. "This is a tremendous resource that is now publicly available to the scientific community that allows us to ask how and why microbial communities vary."
For the effort, samples were obtained from up to 18 different body sites in five main body areas: the airways, skin, oral cavity, digestive tract, and vagina. These samples were donated by 242 healthy people ranging from 18 to 40 years old and living around Houston or St. Louis.
The composition of these microbial communities, or microbiota, was surprisingly diverse and abundant. In addition, microbes can vary widely not just from site to site on a single person but also from person to person, with certain body sites being more predictable than others. Although many different bacteria populate the saliva in one person's mouth, people living in the same community had similar kinds of microbes in their saliva. In contrast, the bacteria found on skin showed greater differences between people but only moderate variety within a single individual. Ethnic/racial background provided the strongest association for diversity in this snapshot of the Western human microbiome. Interestingly, despite differences in the microbes present on the same body site among different people, the overall collection of organisms performed similar metabolic tasks, such as breaking down energy sources.
Traditionally, scientists have studied bacteria by culturing them in laboratory dishes, which is known to miss less abundant and hard-to-grow bacteria. Instead the HMP scientists isolated DNA directly from the more than 5,000 samples collected from the healthy subjects. Their first goal was to determine what microbes were present and how many of them there were. They did this by sequencing the 16S ribosomal RNA gene, a specific gene shared by all bacteria but not by humans, providing the equivalent of a barcode that can be used to identify and count the microbes present.
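To illustrate the barcode idea, here is a minimal Python sketch. It is only a toy: the sequences, taxon names, and the `census` function are invented for illustration, and exact string matching stands in for the similarity-based classification that real 16S analyses rely on.

```python
from collections import Counter

# Hypothetical reference "barcodes": short stand-ins for 16S gene sequences.
# Real 16S genes are roughly 1,500 bases long; these are invented examples.
REFERENCE_16S = {
    "ACGTACGTAA": "Taxon A",
    "TTGACCGTAG": "Taxon B",
    "GGCATCGATC": "Taxon C",
}

def census(reads):
    """Count how many reads match each known barcode exactly.

    Reads that match no reference are tallied as "unclassified".
    """
    counts = Counter()
    for read in reads:
        counts[REFERENCE_16S.get(read, "unclassified")] += 1
    return counts

# Four made-up reads from one sample.
reads = ["ACGTACGTAA", "TTGACCGTAG", "ACGTACGTAA", "CCCCCCCCCC"]
result = census(reads)
```

The tally answers both questions at once: which barcodes were seen, and how often. Reads that match nothing known are kept rather than discarded, since those are the ones that may point to unfamiliar organisms.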
Once that daunting challenge was met, they also sequenced all of the DNA from all of the bacteria in a sample to create what is known as a metagenome sequence, based on data from millions of small sequences. By sequencing all the genes in a microbial community in this way, the scientists could decipher their metabolic capabilities.
But before setting any of these steps in motion, the collaborators had to test their methods and agree on protocols that would ensure their results were free of center-specific differences and therefore usable by the wider scientific community.
Broad scientist Georgia Giannoukos and her team developed protocols that would allow the project to efficiently and accurately sequence the thousands of samples being collected. These processes were adopted by the other three sequencing centers participating in the project, Baylor College of Medicine, J. Craig Venter Institute, and Washington University School of Medicine in St. Louis. Broad scientists were also instrumental in establishing protocols and working with the other centers to gain a consensus on how to sequence sometimes vanishingly small amounts of DNA.
Once these milestones were reached, new analytical methods were needed for deciphering these large data sets. When sequencing the unknown, it's sometimes difficult to tell the good from the bad, the real from the artifact. For example, Broad scientist Brian Haas created a tool to recognize a type of sequence error called a chimera. Chimeras are created when sequences from different organisms get artificially joined together, and if not identified they can be mistaken for signatures of novel microbes.
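The intuition behind chimera detection can be shown with a small Python sketch. This is not the Broad tool itself. It is a simplified illustration that assumes reads and references are already aligned and of equal length, and every name and sequence in it is hypothetical. A read is flagged when its two halves are best explained by different reference "parents" and the two-parent explanation beats any single parent by a margin.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def best_parent(segment, references, start):
    """Return (name, identity) for the reference whose slice at `start` matches best."""
    end = start + len(segment)
    return max(
        ((name, identity(segment, seq[start:end])) for name, seq in references.items()),
        key=lambda pair: pair[1],
    )

def looks_chimeric(read, references, min_gain=0.1):
    """Flag a read whose halves come from different parents (toy heuristic)."""
    mid = len(read) // 2
    left_name, left_id = best_parent(read[:mid], references, 0)
    right_name, right_id = best_parent(read[mid:], references, mid)
    _, whole_id = best_parent(read, references, 0)
    # Length-weighted identity when each half is assigned to its own parent.
    two_parent_id = (left_id * mid + right_id * (len(read) - mid)) / len(read)
    return left_name != right_name and two_parent_id - whole_id >= min_gain

# Invented reference sequences and an artificial chimera built from both.
REFS = {
    "parent_A": "ACGTACGTACGTACGTACGT",
    "parent_B": "TTGGCCAATTGGCCAATTGG",
}
chimera = REFS["parent_A"][:10] + REFS["parent_B"][10:]
chimera_flagged = looks_chimeric(chimera, REFS)
genuine_flagged = looks_chimeric(REFS["parent_A"], REFS)
```

Here the artificial read is flagged while the genuine sequence is not. The `min_gain` margin is an arbitrary choice for this example. A production tool also has to handle unaligned reads, breakpoints that fall anywhere along the read, and sequencing noise.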
In addition, the microbiome metagenome is thought to contain 8 million different protein-coding genes, or about 360 times as many as the human genome, which contains a relatively slight 22,000 protein-coding genes. New methods were needed to assign identity and function to these metagenome data.
"Big science is hard. Metagenomic science is hard. And metagenomic, big science is the square of hard," said Doyle Ward, a scientist at the Broad Institute and co-author of both Nature papers. "We were sequencing large numbers of highly variable, highly complex samples at different institutions and there's no roadmap for doing that."
Associate member and co-first author Curtis Huttenhower of the Harvard School of Public Health made significant contributions in deciphering complex metagenomes by building analytical tools, many of which relied upon information gathered from whole-genome sequences or 'reference genomes' from representative members of the microbiome. Such a collection of reference genomes from human-associated microbes has been actively assembled by the HMP, including the Broad, since project inception in 2008.
However, metagenomic approaches revealed that the reference collection was incomplete and allowed scientists to identify and pursue microbes that were so distinct from already sequenced organisms that no one had previously identified or characterized even their genetic relatives. Ashlee Earl, a scientist at the Broad and a co-author of both Nature papers, led the search for these "most wanted" members of the microbiome. She and her colleagues used two criteria: The bacteria had to be found in 20 percent of the people in the study and they had to be truly novel, based on their 16S barcode, which revealed how different their sequence might be from known microbes.
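The two criteria amount to a simple filter, sketched below in Python under stated assumptions. The 20 percent prevalence threshold and the 242 subjects come from the study as described here; reading "found in 20 percent" as "at least 20 percent" is an interpretation. The novelty cutoff, the candidate records, and all names are hypothetical, since the exact measure of 16S distance is not given here.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    subjects_with_taxon: int         # people in whom the 16S barcode was seen
    best_reference_identity: float   # similarity (0-1) to the closest sequenced organism

TOTAL_SUBJECTS = 242   # number of healthy volunteers reported in the article
MIN_PREVALENCE = 0.20  # "found in 20 percent of the people", read as at least 20 percent
MAX_IDENTITY = 0.90    # assumed novelty cutoff; not stated in the article

def most_wanted(candidates):
    """Keep candidates that are both widespread and unlike known organisms."""
    return [
        c.name
        for c in candidates
        if c.subjects_with_taxon / TOTAL_SUBJECTS >= MIN_PREVALENCE
        and c.best_reference_identity < MAX_IDENTITY
    ]

# Invented candidates for illustration only.
candidates = [
    Candidate("OTU_1", 120, 0.85),  # common and novel
    Candidate("OTU_2", 10, 0.80),   # novel but rare
    Candidate("OTU_3", 200, 0.99),  # common but close to a known organism
]
wanted = most_wanted(candidates)
```

Only the first candidate passes both tests. Requiring both conditions keeps the list short: rare oddities and common but familiar organisms are both excluded.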
Now that the scientists have narrowed the priority list to a mere 119 microbes, they can begin using novel culture-based and single-cell-based methods to capture these most wanted organisms for whole-genome sequencing.
"Our analysis suggested that the diversity of bacterial life living in and on us might be approachable," Earl said, referring to the relatively small number of organisms needing sequencing to round out the picture of healthy human-associated bacterial life. "Having genomes from these previously unknown, but frequent members of the microbiome will help us to understand what these organisms might be doing. Are they friend? Are they foe? Can they resist antibiotics? Do they produce toxins or molecules that could be beneficial to us?"
Once that foundation was built, exploration accelerated.
"Although everybody understands that our microbes make important contributions to our own biology, this is the first time we've attempted a comprehensive survey of what's there," said Bruce Birren, director of the Broad's Genomic Sequencing Center for Infectious Diseases, co-director of the Broad's Genome Sequencing and Analysis Program and co-author of both Nature papers. "We had no idea which microbes are present, how many they were, and how people differ from each other in their microbiomes. These studies not only answered those questions but allowed us to go further to ask what these microbes do, and how many of them represent new organisms and brand new biology. These are the things we need to know to understand the interaction of the microbes and humans that is so important."
While the study subjects were healthy, their microbiomes were inhabited by some bacteria that can cause disease. As expected, Staphylococcus aureus, strains of which are linked to the drug-resistant infection called MRSA, was found in the noses of about 30 percent of the people. Most of the time, these opportunistic pathogens appear to coexist peacefully with humans, but scientists hope to understand what factors can tip them toward disease. A healthy microbiome may prevent infection.
People have co-evolved with microorganisms that help them survive and function. Now the HMP has produced a catalog of those organisms.
"The next frontier is to harness this information by embarking on studies that explore how the microbiome affects illnesses such as inflammatory bowel disease or type 1 diabetes, exploring options for early diagnostics and even therapeutic manipulation of the microbiota," said Gevers.
More information: Huttenhower C, Gevers D et al., Structure, function and diversity of the healthy human microbiome. Nature doi:10.1038/nature11234
Methe BA et al., A framework for human microbiome research. Nature doi:10.1038/nature11209
Ward D, Gevers D et al., Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS ONE doi:10.1371/journal.pone.0039315
Abubucker S et al., Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Computational Biology doi:10.1371/journal.pcbi.1002358
Haas BJ, Gevers D, Earl A et al., Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research doi:10.1101/gr.112730.110