Artificial intelligence will map chemical space to help navigate the wide diversity of chemical compounds
Scientists from the Skoltech Center for Computational and Data-Intensive Science and Engineering (CDISE) and the Helmholtz Munich Center for Environmental Health (HMGU, Germany) have created a neural network that visualizes the chemical space of compounds with potential value for the pharmaceutical industry. The new method should help chemists design new compounds and navigate the space of existing ones. The results of the study were published in RSC Advances.
Chemists often have to toil through huge databases containing tens or even hundreds of thousands of chemical structures to select the best candidates. To do so, they need to know what classes of compounds a database contains. Going through thousands of molecules one by one is laborious; the task would be much easier if each molecule were shown as a dot on a plane or in 3-D space, with similar molecules huddled together. Chemists could then explore chemical space with a simple tool, much as a geographer uses digital maps of different scales to see the bigger picture or zoom in on a particular area. But here's the rub: how would the algorithm know where to place the molecules if it has no knowledge of chemistry?
A joint group of researchers from CDISE (Dmitry Karlov, Sergey Sosnin and Maxim Fedorov) and HMGU (Igor Tetko) tackled this problem with AI methods that extract information directly from data. They coupled a deep neural network with t-SNE (t-distributed stochastic neighbor embedding), a popular dimensionality reduction method, to create a network that takes a compound's multidimensional representation as input and outputs its position on a 2-D map. The method places molecules with similar properties close to one another, so that compounds can be grouped into classes according to their properties. The authors trained their neural network on millions of compounds with known biological activity.
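The idea of coupling a network with t-SNE can be illustrated with a small sketch of the general technique, usually called parametric t-SNE: a function maps each compound's descriptor vector to a 2-D point, and its parameters are chosen to minimize the Kullback-Leibler divergence between neighbor probabilities in descriptor space and on the map. The authors' actual architecture, descriptors and training procedure are not described here, so everything below is an assumption for illustration: the toy data, the fixed kernel width `sigma` (standard t-SNE tunes a per-point width to a target perplexity instead), and `linear_map` with its weights `W_keep` and `W_collapse`, which are hypothetical stand-ins for a trained deep network. The sketch only evaluates the objective for two candidate maps; a real implementation would learn the network's weights by gradient descent.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def high_dim_affinities(points, sigma=1.0):
    """Symmetric t-SNE input affinities p_ij from Gaussian kernels.

    Simplification (assumption): one fixed bandwidth `sigma` for every point,
    instead of the per-point bandwidth that standard t-SNE fits to a perplexity.
    """
    n = len(points)
    cond = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        cond.append([x / total for x in w])  # p_{j|i}; each row sums to 1
    # Symmetrize: p_ij = (p_{j|i} + p_{i|j}) / (2n), so all entries sum to 1.
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

def low_dim_affinities(coords):
    """Map affinities q_ij from a Student-t kernel with one degree of freedom."""
    n = len(coords)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(coords[i], coords[j])) for j in range(n)]
         for i in range(n)]
    total = sum(map(sum, w))
    return [[x / total for x in row] for row in w]

def kl_loss(P, Q):
    """KL(P || Q), the quantity t-SNE minimizes; zero-probability terms are skipped."""
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0)

def linear_map(W, x):
    """Hypothetical stand-in for the trained network: a 2 x d linear projection."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Toy 4-D "descriptors" forming two clearly separated groups (made-up numbers).
X = [[0, 0, 0, 0], [0.1, 0, 0, 0], [0, 0.1, 0, 0],
     [5, 5, 5, 5], [5.1, 5, 5, 5], [5, 5.1, 5, 5]]
P = high_dim_affinities(X)

W_keep = [[1, 0, 0, 0], [0, 1, 0, 0]]      # keeps the two groups apart on the map
W_collapse = [[0, 0, 0, 0], [0, 0, 0, 0]]  # squashes every compound onto one dot
loss_good = kl_loss(P, low_dim_affinities([linear_map(W_keep, x) for x in X]))
loss_collapsed = kl_loss(P, low_dim_affinities([linear_map(W_collapse, x) for x in X]))
print(f"{loss_good:.3f} {loss_collapsed:.3f}")  # the first value is much smaller
```

Because the map is an explicit function of the descriptors, a new compound can be placed with a single forward pass, whereas ordinary t-SNE would have to recompute the whole embedding.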
"We adapted the t-SNE method to visualize the chemical space of compounds with pharmaceutical potential by training a deep neural network and selecting simple descriptors and a metric for calculating distances in a multidimensional space. We also showed that this approach preserves more information than other dimensionality reduction methods, while being on a par with principal component analysis (PCA) in terms of speed," says Dmitry Karlov, a Skoltech researcher and the first author of the study.
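The article does not say which descriptors or distance metric the authors settled on. As one widely used option in cheminformatics, the sketch below assumes binary structural fingerprints stored as sets of "on" bit positions and compares them with the Tanimoto (Jaccard) distance. The function name `tanimoto_distance` and the three fingerprints are hypothetical examples, not data from the study; a distance of this kind could take the place of the Euclidean distance inside a t-SNE-style affinity.

```python
def tanimoto_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B| for fingerprints given as sets of set-bit indices."""
    if not a and not b:
        return 0.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return 1.0 - shared / (len(a) + len(b) - shared)

# Hypothetical fingerprints (indices of the bits that are set).
mol_a = {1, 2, 3, 4}
mol_b = {2, 3, 4, 5}
mol_c = {10, 11}

print(round(tanimoto_distance(mol_a, mol_b), 2))  # 3 shared of 5 distinct bits -> 0.4
print(tanimoto_distance(mol_a, mol_c))            # no shared bits -> 1.0
```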
In the future, the scientists plan to develop a series of tools for chemists and pharmacists to view the arrangement of new, unexplored compounds in relation to those already studied and described in the literature. This will expedite the R&D phase in the search for new drugs.