'The Alexa of chemistry': Researchers on fast track to build open network
D. Tyler McQuade, Ph.D., a professor in the Virginia Commonwealth University College of Engineering, is principal investigator of a multi-university project seeking to use artificial intelligence to help scientists come up with the perfect molecule for everything from a better shampoo to coatings on advanced microchips.
The project is among the first in the U.S. to be selected for funding, receiving $994,433, under a new National Science Foundation pilot called the Convergence Accelerator (C-Accel). McQuade and his collaborators will pitch their prototype in March 2020 in a bid for additional funding of up to $5 million over five years.
Adam Luxon, a Ph.D. student in the Department of Chemical and Life Science Engineering who has been involved from the beginning, explained it this way: "We want to essentially make the Alexa of chemistry."
Just as Amazon, Google and Netflix use algorithms to make customized suggestions, the team plans to build an open network that can combine molecular sciences data pulled from a range of sources, including academia, industry and government, and help users make sense of it.
The idea is in line with the goal of the NSF's Big Ideas project, "Harnessing the Data Revolution," to engage the research community in developing an advanced cyberinfrastructure to accelerate data-intensive research.
The team reflects expertise across several specialties. Working with McQuade are James K. Ferri, Ph.D., professor in the VCU Department of Chemical and Life Science Engineering; Carol A. Parish, Ph.D., professor of chemistry and the Floyd D. and Elisabeth S. Gottwald Chair in the Department of Chemistry at the University of Richmond; and Adrian E. Roitberg, Ph.D., professor in the Department of Chemistry at the University of Florida. Two companies are also involved: Two Six Labs, based in Arlington, Virginia, and Fathom Information Design, based in Boston.
Currently, there is no shared network or central portal where molecular scientists and engineers can harness artificial intelligence and data science tools to build models to support their needs. And while scientists have been able to depict what elements make up a molecule, how the atoms are arranged and the molecule's properties (such as its melting point), there is no standard way to represent—or predict—molecular performance.
The team aims to fill these gaps by advancing the concept of a "molecular imprint." The collaborators will create a new system that represents molecules by combining line drawing, geometry and quantum chemical calculations into a single, machine-learnable format.
They will build a central platform for collecting data, creating these molecular imprints and developing algorithms for mining the data, and will create machine learning tools that produce performance prediction models.
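As a rough illustration of what a "single, machine-learnable format" could look like, the Python sketch below flattens the three ingredients the team describes (a line drawing, geometry and quantum chemical calculations) into one numeric list. The article does not describe the project's actual format, so everything here is an assumption made for illustration: the class name MolecularImprint, the use of a SMILES string to stand in for the line drawing, the descriptor names in QUANTUM_KEYS, and the choice of features.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Tuple

# Hypothetical descriptor names; the article does not list the real ones.
QUANTUM_KEYS = ("dipole_debye", "homo_lumo_gap_ev")


@dataclass
class MolecularImprint:
    """Illustrative record only; not the project's actual data format."""
    smiles: str                                # line notation standing in for the line drawing
    coords: List[Tuple[float, float, float]]   # 3D atom positions
    quantum: Dict[str, float]                  # computed quantum chemical descriptors

    def to_vector(self) -> List[float]:
        """Flatten the three views into one fixed-length list of numbers."""
        # Line-drawing features: crude character counts in the SMILES string
        # (a real encoder would parse atoms and bonds properly).
        line_feats = [float(self.smiles.count(ch)) for ch in "CNO="]
        # Geometry features: shortest and longest distance between any two atoms.
        pair_d = [dist(a, b)
                  for i, a in enumerate(self.coords)
                  for b in self.coords[i + 1:]]
        geom_feats = [min(pair_d), max(pair_d)] if pair_d else [0.0, 0.0]
        # Quantum features: always read in the same key order.
        q_feats = [self.quantum[k] for k in QUANTUM_KEYS]
        return line_feats + geom_feats + q_feats


# Toy input: coordinates and descriptor values are placeholders, not real data.
example = MolecularImprint(
    smiles="CCO",
    coords=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    quantum={"dipole_debye": 1.5, "homo_lumo_gap_ev": 7.0},
)
print(example.to_vector())
```

Because every record yields a list of the same length, such vectors could in principle be fed to a standard machine learning model to predict performance. Which features actually carry predictive power is the open research question the project sets out to address.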
"The ability to compute molecular properties using computational techniques, and to dovetail that data with experimental measurements, will generate databases that will produce the most comprehensive results in the molecular sciences," Parish said.
"There are many laboratories around the world working in this space; however, there are few organizational structures available that encourage open sharing of these data for the benefit of the community and the common good," Parish added. "We seek to collaborate with others to provide this structure; an open knowledge network or repository where scientists can deposit their molecular-level experimental and computational data in exchange for user-friendly tools to help manage and query the data."
The initial response from potential partners has been strong. Ferri and the others have already collected more than a dozen letters from major corporations such as Dow and Merck expressing interest in participating.
McQuade said chemical engineers in major industries, including consumer products and oil and gas, expend a lot of effort running experiments to determine which molecule to use, such as finding the best shampoo additive that doesn't make babies cry.
"The ability to design the properties you want is still more art than science," he said.
The team also plans to develop a toolkit for processing and visualizing the data.
Roitberg, whose research focuses include advanced visualization, said this could take the form of a virtual reality realm in which a user could find materials that are soluble in water but not oil, for instance, and then browse for similar materials nearby.
"We envision a very interactive platform where the user can explore relations between data and desired material properties," he said.