A first draft of the cacao genome is complete, a consortium of academic, governmental, and industry scientists announced today. Indiana University Bloomington scientists performed much of the sequencing work, which is described in detail at http://www.cacaogenomedb.org/, the official website of the Cacao Genome Database project.
Although the project is led and funded by a private company, Mars Inc., Cacao Genome Database scientists say one of their chief concerns has been making sure the Theobroma cacao genome data is published for all to see -- especially for cacao farmers and breeders in West Africa, Asia and South America, who can use genetic information to improve their planting stocks and protect their often-fragile incomes.
"When you have to wait three or more years for a tree you plant to bear the beans you sell, you want as much information as possible about the seedlings you're planting," said Keithanne Mockaitis, IU Center for Genomics and Bioinformatics (CGB) sequencing director and IU project leader. "We expect this information will positively impact some of the poorest regions in the world, where tropical tree crops are grown. Making the genome data public further enables breeders, farmers and researchers around the world to use a common set of tools, and to share information that will help them fight the spread of disease in their crops."
Mockaitis, a biochemist-turned-genomicist, joined the project in early 2009, and quickly set to work with her collaborators to tackle the challenge of sequencing and accurately pasting together the approximately 400 million base pairs of the tree's genome. Mockaitis' Cacao Genome Group partners at the U.S. Department of Agriculture's Subtropical Horticulture Research Station in Miami sent samples to Bloomington, and these were prepared and sequenced in a redundant manner by her sequencing team in the CGB genomics laboratory. Sequence of some of the same material was generated using additional methods in laboratories of the USDA Agricultural Research Service (USDA-ARS) and at the National Center for Genome Resources in Santa Fe, N.M.
Raw data were then sent to HudsonAlpha Biotechnology Institute, a partner of the U.S. Department of Energy-funded Joint Genome Institute, for assembly. Other important datasets generated by Mockaitis' group were not the sequences of the DNA itself, but of the RNA, or transcripts produced in different tissues of the tree. Transcript sequences reveal which genes are expressed (turned on).
Finally, IU Bloomington Department of Biology scientist Don Gilbert analyzed both the genome and transcriptome sequences and generated the annotations that point to the locations in which each active gene and its components (exons and introns) reside.
"The final number of genes is still being counted and validated, but we currently estimate the cacao plant has about 35,000 genes," Mockaitis said.
That's a typical gene number for flowering plants whose genomes have thus far been sequenced. Humans have approximately 30,000 genes. Rice has about 40,000.
Since its inception about 11 years ago, the CGB has been involved in dozens of different projects that address the workings of different species' genomes with the use of high-throughput technologies.
"Cacao is something of a first for us," Mockaitis said. "This is the largest genome the CGB has sequenced to date. As a group we now have more experience and more resources to take on a wider variety of projects."
Mockaitis says the relative efficiency of the project so far has been due to Mars' support of the academic and non-profit contributing laboratories.
"We've benefited from having a collegial group of researchers, from the USDA-ARS and a variety of genomics-focused laboratories, that each bring different scientific expertise to the table to complete this genome. It's also been particularly inspiring to see West African cacao researchers come to some of our meetings -- they listen to us talk about the esoteric technologies we're using, and we know that they'll soon go to work and start benefitting from the data. That's a rare treat for an academic researcher."
Mockaitis was first introduced to this project through Roche Diagnostics, based in Indianapolis, which owns the 454 Sequencing technology. Her group had developed improved methods for the sequencing of transcripts (the RNA products of active genes, described above), and was asked to contribute some data to the project. Since then the IU CGB has been able to contribute to the sequence of the genome itself as well.
Unlike some other food crops, such as corn or wheat, which are often grown on large, industrial farms, cacao is almost exclusively grown on small farms. There are about 6.5 million cacao farmers around the world, primarily in West Africa, northern South America, and Southeast Asia. The United States produces virtually no cacao on its own, instead opting to engage cacao-growing countries with economic policies that support the production and trade of what may be the world's most popular food.
"Genome sequencing helps eliminate much of the guess-work of traditional crop cultivation," said Howard-Yana Shapiro, global staff officer of plant science and research at Mars Inc. "Cocoa is what some researchers describe as an 'orphan crop' because it has been the subject of little agricultural research compared to corn, wheat and rice. This effort, which will allow fast and accurate traditional breeding, is about applying the best of what science has to offer in taking an under-served crop and under-served population and giving them both the chance to flourish."
Mockaitis says she hopes the project will have a positive impact on the farmers' lives and livelihoods.
"It is an export crop that can reduce poverty," she said. "I believe the work our groups have done will eventually help small farmers stay in business over time, because improved breeding programs based on reliable genome data will give them plants naturally equipped to fight off disease and to thrive in their specific location. This will lead to more sustainable crops and of course a more stable chocolate supply for all of us -- pretty important!"