Machine learning unlocks plants' secrets
Plants are master chemists, and Michigan State University researchers have unlocked their secret of producing specialized metabolites.
The research, published in the latest issue of Proceedings of the National Academy of Sciences, combined plant biology and machine learning to sort through tens of thousands of genes to determine which genes make specialized metabolites.
Some metabolites attract pollinators while others repel pests. Ever wonder why deer eat tulips and not daffodils? It's because daffodils have metabolites to fend off the critters who'd dine on them.
The results could lead not only to improved plants but also to the development of plant-based pharmaceuticals and environmentally safe pesticides, said Shin-Han Shiu, an MSU plant computational biologist.
"Plants are amazing – they are their own mini factories, and we wanted to recreate what they do in a lab to produce synthetic chemicals to make drugs, disease-resistant crops and even artificial flavors," Shiu said. "Our research found that it is possible to pick out the right gene by automating the process since machines are more capable of picking out minute differences among thousands of genes."
Taking a machine-learning approach, an interdisciplinary team of biochemists and computational biologists created a model that looked at more than 30,000 genes in Arabidopsis thaliana, a small flowering plant that is called the "lab rat of plant science."
The model is based on technology used by e-commerce to forecast consumer behavior and create targeted advertising, such as ads seen on a person's Facebook page. Basically, the technology sorts through thousands of ads based on your previous online behavior to send you select ads geared toward your interests and activities.
In the plant study, scientists wrote a program that sorted through more than 30,000 genes to home in on the ones related to making specialized metabolites.
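For readers curious what "sorting" genes with machine learning can look like, here is a minimal sketch. It is not the MSU team's model. It assumes a simple nearest-centroid classifier, two made-up numeric features per gene, and invented gene names and values, all chosen only to show the general idea: learn from genes whose role is already known, then label genes whose role is not.

```python
from statistics import mean

# Hypothetical toy data, not from the study: each gene has two made-up
# numeric features. Label 1 = specialized metabolism, 0 = other.
TRAINING = [
    ((0.9, 0.8), 1),
    ((0.8, 0.7), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
]


def centroid(rows):
    """Average each feature across a list of feature tuples."""
    return tuple(mean(column) for column in zip(*rows))


def train(examples):
    """Group the labeled genes by label and return one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: distance(model[label]))


model = train(TRAINING)

# Hypothetical genes with unknown function.
unlabeled = {"gene_A": (0.85, 0.75), "gene_B": (0.15, 0.2)}
predictions = {name: predict(model, feats) for name, feats in unlabeled.items()}
print(predictions)
```

In this toy case, gene_A sits near the known specialized-metabolism genes and gene_B sits near the others, so the sketch labels them accordingly. A real model would draw on far more genes and features than this.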
"Machine learning was a novel approach for us in plant biology, a new application of tools widely used in other fields," Shiu said. "The model we created with machine learning can now be applied to other plant species that produce medicinally or industrially useful compounds to speed up the process of discovering the genes responsible for their production."
"We've known for a long time that plants make a wealth of useful, valuable compounds, but this work really throws open that treasure chest in important new ways," said Clifford Weil, a program director in the National Science Foundation's Plant Genome Research Program, which funded the research. "It's a great advance in how, and how well, we can explore nature's most-creative biofactories."
This project also highlights the benefit of interdisciplinary research.
"Our team of biologists and computational scientists worked together to answer questions that cannot be solved by each discipline alone," Shiu said. "Different knowledge, ideas and cultures clash, cross-fertilize and lead to exciting new discoveries."