Team develops new generation of artificial neural networks able to predict properties of organic compounds
Scientists from Russia, Estonia and the United Kingdom have created a new method for predicting the bioconcentration factor (BCF) of organic molecules. Combining classical models of physicochemical interactions between solvent and solute with advanced machine learning methods, the new approach makes it possible to predict complex properties of a substance from a minimal set of input data. The results of the study were published in the Journal of Physics: Condensed Matter.
One of the most important characteristics of organic substances, BCF describes how strongly a substance accumulates in tissue relative to its concentration in the surrounding environment under equilibrium conditions. BCF is widely used in assessing the safety of chemical compounds and can be measured in practice. For example, you can place a test chemical in a fish tank, wait until equilibrium is reached, and then measure its concentration both in the fish and in the water. But what if you want to estimate the BCF based on calculations alone?
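As a rough illustration of the fish tank measurement, the ratio can be written as a few lines of Python. This is only a sketch: the function names and the concentration readings are hypothetical, chosen for the example, and the logarithmic form is included because BCF values are commonly reported as log BCF.

```python
import math


def bioconcentration_factor(conc_in_organism, conc_in_water):
    """Return BCF: equilibrium concentration in tissue divided by that in water.

    With tissue concentration in mg/kg and water concentration in mg/L,
    the result is in L/kg.
    """
    if conc_in_organism <= 0 or conc_in_water <= 0:
        raise ValueError("both concentrations must be positive")
    return conc_in_organism / conc_in_water


def log_bcf(conc_in_organism, conc_in_water):
    """Return the base-10 logarithm of the BCF."""
    return math.log10(bioconcentration_factor(conc_in_organism, conc_in_water))


# Hypothetical equilibrium readings, not measured data:
# 50 mg/kg in the fish and 0.5 mg/L in the water.
print(bioconcentration_factor(50.0, 0.5))  # 100.0
print(log_bcf(50.0, 0.5))                  # 2.0
```

A compound with a BCF of 100 is, at equilibrium, a hundred times more concentrated in the fish than in the surrounding water.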
One way to do this is to generate a set of molecular parameters (descriptors) and build a mathematical model based on these inputs. Such a model can turn out to be quite accurate, but the large number of parameters may make it difficult to interpret. Worse still, it may not work properly for compounds that differ strongly from those in the training set.
The second method is based on the molecular theory of liquids that describes the behavior of substances in solutions. However, bioconcentration is a complex parameter that depends on a variety of factors, so it can hardly be predicted by directly applying physicochemical theory.
Scientists from Skoltech, the University of Tartu (Estonia) and the University of Strathclyde (UK), led by Skoltech Professor Maxim Fedorov, developed a hybrid BCF prediction method that consists of two steps. First, the researchers perform physicochemical calculations to obtain 3-D densities of hydrogen and oxygen around the molecule under study. Then they feed these densities to 3-D convolutional neural networks, a technology successfully used in image recognition. The approach demonstrates that complex properties of organic substances can be described even with a small amount of input data.
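To give a feel for what a 3-D convolution does with such input, here is a minimal, dependency-free Python sketch. It is not the researchers' network: it applies a single hand-written filter to two tiny cubic grids standing in for the hydrogen and oxygen density channels, and every grid size, value and function name is an assumption made for illustration. A real model would use a deep learning library, many learned filters, nonlinearities and further layers that map the result to a BCF estimate.

```python
def constant_grid(size, value):
    """Build a size x size x size grid filled with one value (placeholder data)."""
    return [[[value] * size for _ in range(size)] for _ in range(size)]


def conv3d_valid(channels, kernels):
    """Slide one multi-channel cubic filter over multi-channel cubic grids.

    No padding, stride 1. Like most neural network libraries, this computes
    a cross-correlation and sums the contributions of all channels.
    """
    n = len(channels[0])   # edge length of each input grid
    k = len(kernels[0])    # edge length of each filter
    m = n - k + 1          # edge length of the output grid
    out = constant_grid(m, 0.0)
    for x in range(m):
        for y in range(m):
            for z in range(m):
                total = 0.0
                for grid, kernel in zip(channels, kernels):
                    for i in range(k):
                        for j in range(k):
                            for l in range(k):
                                total += grid[x + i][y + j][z + l] * kernel[i][j][l]
                out[x][y][z] = total
    return out


# Placeholder "densities" on a 4x4x4 grid: channel 0 for hydrogen, channel 1 for oxygen.
densities = [constant_grid(4, 1.0), constant_grid(4, 2.0)]
# One 2x2x2 filter with a separate weight cube per channel (hand-picked, not learned).
weights = [constant_grid(2, 0.5), constant_grid(2, 0.25)]

feature_map = conv3d_valid(densities, weights)
print(len(feature_map), feature_map[0][0][0])  # 3 8.0
```

Each output value summarizes a small cubic neighborhood of both density channels at once, which is how a convolutional network picks up local spatial patterns around the molecule in the same way it picks up local patterns in an image.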
"Our method will make it much easier to predict the environmental impact of a given substance. But the most important thing is that we have developed a universal method of describing a molecule in such a way that its 3-D image can be transferred to a 3-D convolutional neural network. In the long term, our method will help to predict the properties of various 'exotic' molecules and novel compounds where the existing structure-property relationship methods do not work," said first author and Skoltech Ph.D. student Sergey Sosnin.