MaxBin: Automated sorting through metagenomes
Microbes – the single-celled organisms that dominate every ecosystem on Earth – have an amazing ability to feed on plant biomass and convert it into other chemical products. Tapping into this talent has the potential to revolutionize energy, medicine, environmental remediation and many other fields. The success of this effort hinges in part on metagenomics, the emerging technology that enables researchers to read all the individual genomes of a sample microbial community at once. However, given that even a teaspoon of soil can contain billions of microbes, there is a great need to be able to cull the genomes of individual microbial species from a metagenomic sequence.
Enter MaxBin, an automated software program for binning (sorting) the genomes of individual microbial species from metagenomic sequences. MaxBin was developed at the U.S. Department of Energy's (DOE) Joint BioEnergy Institute (JBEI) under the leadership of Steve Singer, who directs JBEI's Microbial Communities Group. It facilitates the genomic analysis of uncultivated microbial populations that can hold the key to the production of new chemical materials, such as advanced biofuels or pharmaceutical drugs.
"MaxBin automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads," says Singer, a chemist who also holds an appointment with Berkeley Lab's Earth Sciences Division. "Previous binning methods either required a significant amount of work by the user, or required a large number of samples for comparison. MaxBin requires only a single sample and is a push-button operation for users."
The key to the success of MaxBin is its expectation-maximization algorithm, which was developed by Yu-Wei Wu, a post-doctoral researcher in Singer's group. This algorithm enables the classification of metagenomic sequences into discrete bins that represent the genomes of individual microbial populations within a sample community.
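To give a feel for how an expectation-maximization loop can sort scaffolds into bins, here is a minimal Python sketch. It is not MaxBin's actual model: it uses scaffold coverage only, assumes each bin's log-coverage follows a Gaussian with a fixed spread, and takes hand-picked starting means. The function name `em_bin_by_coverage` and the parameters `init_means` and `sigma` are hypothetical and chosen purely for illustration.

```python
import math


def em_bin_by_coverage(coverages, init_means, n_iter=50, sigma=0.5):
    """Toy EM: soft-assign scaffolds to bins using log-coverage only.

    Assumption (not from MaxBin): each bin is a 1-D Gaussian over
    log-coverage with a fixed standard deviation `sigma`.
    """
    xs = [math.log(c) for c in coverages]
    means = [math.log(m) for m in init_means]
    k = len(means)
    weights = [1.0 / k] * k
    resp = []

    for _ in range(n_iter):
        # E-step: probability that each scaffold belongs to each bin.
        resp = []
        for x in xs:
            likes = [
                w * math.exp(-((x - m) ** 2) / (2 * sigma ** 2))
                for w, m in zip(weights, means)
            ]
            total = sum(likes)
            if total == 0.0:
                # Scaffold is far from every bin; fall back to uniform.
                resp.append([1.0 / k] * k)
            else:
                resp.append([like / total for like in likes])

        # M-step: re-estimate each bin's mean and mixing weight.
        for j in range(k):
            rj = sum(r[j] for r in resp)
            if rj > 0:
                means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / rj
            weights[j] = rj / len(xs)

    # Hard labels: the bin with the highest responsibility.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, [math.exp(m) for m in means]


# Six scaffolds drawn from two populations with ~10x and ~100x coverage.
labels, bin_means = em_bin_by_coverage(
    [9, 10, 11, 95, 100, 105], init_means=[5, 200]
)
```

In this toy run, the low-coverage scaffolds end up in one bin and the high-coverage scaffolds in the other. A realistic binner would use more than one signal per scaffold and would not rely on hand-picked starting values.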
"Using our expectation-maximization algorithm, MaxBin combines information from tetranucleotide frequencies and scaffold coverage levels to organize metagenomic sequences into the individual bins, which are predicted from an initial identification of marker genes in assembled sequences," Wu says.
MaxBin was successfully tested on samples from the Human Microbiome Project and from green waste compost. In these tests, which were carried out by Yung-Tsu Tang, a student intern from the City College of San Francisco, MaxBin proved highly accurate at recovering individual genomes from metagenomic datasets with variable sequencing coverage.
"Applying MaxBin to an enriched cellulolytic consortia enabled us to identify a number of uncultivated cellulolytic bacteria, including a myxobacterium that possesses a remarkably reduced genome and expanded set of genes for biomass deconstruction compared to its closest sequenced relatives," Singer says. "This demonstrates that the processes required for recovering genomes from metagenomic datasets can be applied to understanding biomass breakdown in the environment".
MaxBin is now being used at JBEI in its efforts to use microbes for the production of advanced biofuels – gasoline, diesel and jet fuel – from plant biomass. MaxBin is also available for downloading at http://downloads.jbei.org/data/MaxBin.html. To date, more than 150 researchers have accessed it.
A paper describing MaxBin in detail has been published in the journal Microbiome. The paper is titled "MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm." In addition to Singer, Wu and Tang, the paper's co-authors were Susannah Tringe of the Joint Genome Institute and Blake Simmons of JBEI.