Improved prediction of organic-molecule thermodynamic parameters with surrogate modeling

Accurate calculation of the solvation parameters of organic molecules, which is important in many areas of research in the pharmaceutical and agrochemical industries, is a longstanding challenge in computational chemistry. For example, many of the pharmacokinetic properties of potential drug molecules are defined by their solvation and acid-base behavior, which can be estimated from their hydration free energies. Solvation, sometimes also called dissolution, is the process of attraction and association of molecules of a solvent with molecules or ions of a solute; as ions dissolve in a solvent, they spread out and become surrounded by solvent molecules.

Challenge: Increase the prediction quality of two thermodynamic parameters, hydration free energy and the logarithm of the octanol-water partition coefficient. Existing computational methods for predicting molecular solvation parameters fall into one of two general classes: bottom-up or top-down. Bottom-up methods use a molecular-scale physical-chemical model to describe the process of molecular solvation at some level of approximation. Top-down methods, by contrast, are based on statistical analysis of quantitative structure-property relationships (QSPR) and make no a priori assumptions about physical-chemical phenomena beyond those already established.

Both strategies have advantages and disadvantages. Molecular modeling methods offer useful insights into the mechanisms of molecular solvation and, on average, are more accurate than QSPR methods. However, they are also much more computationally expensive. Because of the high cost of modeling large, complex molecular systems, their use is justified in only a limited number of applications in large-scale computational screening of molecular databases.

QSPR methods, on the other hand, offer computationally inexpensive ways to predict molecular solvation parameters. With modern methods of statistical analysis, they could in principle be used to establish complex nonlinear relationships in large, complex molecular (bio)systems. However, these methods have no underlying physical-chemical solvation model, so their results are often difficult to interpret. QSPR-derived models are also frequently sensitive to the composition of the training and test sets: parameters of a QSPR model that describe molecules from one chemical class well are often not transferable to other chemical classes.
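The transferability problem can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two descriptors, the `toy_property` function and the two "chemical classes" (disjoint descriptor ranges) are synthetic, not real solvation data, and ordinary least squares stands in for a generic linear QSPR model.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the fitted linear model to new descriptor rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def toy_property(X):
    """Hypothetical nonlinear descriptor-property relationship (not real chemistry)."""
    return X[:, 0] ** 2 - 0.5 * X[:, 1]

rng = np.random.default_rng(0)
# Two synthetic "chemical classes" occupying different descriptor ranges.
X_class_a = rng.uniform(0.0, 1.0, size=(50, 2))
X_class_b = rng.uniform(2.0, 3.0, size=(50, 2))
y_class_a = toy_property(X_class_a)
y_class_b = toy_property(X_class_b)

# Fit on class A only, then evaluate on both classes.
coef = fit_linear(X_class_a, y_class_a)
rmse_a = rmse(y_class_a, predict_linear(coef, X_class_a))
rmse_b = rmse(y_class_b, predict_linear(coef, X_class_b))

print(f"RMSE on the training class: {rmse_a:.3f}")
print(f"RMSE on the unseen class:   {rmse_b:.3f}")
```

In this toy setting the model fitted on one class describes that class reasonably well but extrapolates poorly to the other, which is the kind of sensitivity to training-set composition described above.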

Approach: Use existing MACROS techniques to build a surrogate model accurate enough for a wide range of different molecules. For the University of Strathclyde, Glasgow, the goals of this study were to generalize a previously proposed approach based on linear regression by using MACROS surrogate modeling techniques, to develop its statistical analysis techniques, and to expand its area of application. To achieve these goals, the research team investigated how different methods of statistical analysis affect the quality of predictions.
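As a rough illustration of moving from linear regression to a nonlinear surrogate, the sketch below compares an ordinary least-squares fit with kernel ridge regression on synthetic data. It does not use or reproduce the MACROS API or its algorithms: kernel ridge regression with an RBF kernel is swapped in as a generic stand-in, and the target function and the `length_scale` and `ridge` values are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * length_scale ** 2))

def fit_kernel_ridge(X, y, length_scale=0.5, ridge=1e-6):
    """Solve (K + ridge * I) w = y for the kernel weights w."""
    K = rbf_kernel(X, X, length_scale)
    return np.linalg.solve(K + ridge * np.eye(len(X)), y)

def predict_kernel_ridge(weights, X_train, X_new, length_scale=0.5):
    """Predict at X_new as a weighted sum of kernels centred on X_train."""
    return rbf_kernel(X_new, X_train, length_scale) @ weights

def fit_predict_linear(X_train, y_train, X_new):
    """Baseline: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def toy_target(X):
    """Hypothetical smooth nonlinear response used only for this demo."""
    return np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

rng = np.random.default_rng(1)
X_train = rng.uniform(-1.0, 1.0, size=(80, 2))
X_test = rng.uniform(-1.0, 1.0, size=(200, 2))
y_train, y_test = toy_target(X_train), toy_target(X_test)

weights = fit_kernel_ridge(X_train, y_train)
rmse_surrogate = rmse(y_test, predict_kernel_ridge(weights, X_train, X_test))
rmse_linear = rmse(y_test, fit_predict_linear(X_train, y_train, X_test))

print(f"Linear regression RMSE:     {rmse_linear:.3f}")
print(f"Kernel ridge surrogate RMSE: {rmse_surrogate:.3f}")
```

Both models are scored on the same held-out points, so the comparison reflects predictive quality rather than goodness of fit on the training data.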

The team focused on predicting two key thermodynamic parameters: hydration free energy and the logarithm of the octanol-water partition coefficient (for normal pH). These parameters are of fundamental interest in several areas of solution chemistry, pharmacology and environmental sciences.
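For reference, the two quantities can be related by a standard thermodynamic identity. The notation is an assumption, not taken from the study: $[X]$ is the equilibrium concentration of the neutral solute in each phase, $\Delta G_{\mathrm{hyd}}$ and $\Delta G_{\mathrm{oct}}$ are the solvation free energies in water and octanol under the same standard-state convention, $R$ is the gas constant and $T$ the temperature. The relation is idealized and ignores the mutual saturation of the two solvents.

```latex
% Partition coefficient: concentration ratio of the neutral solute X
\[
  \log P = \log_{10} \frac{[X]_{\mathrm{octanol}}}{[X]_{\mathrm{water}}}
\]

% Link to the solvation free energies in the two solvents
\[
  \log P = \frac{\Delta G_{\mathrm{hyd}} - \Delta G_{\mathrm{oct}}}{RT \ln 10}
\]
```

At 298 K, $RT \ln 10 \approx 5.7$ kJ/mol, so an error of that size in the free-energy difference shifts the predicted $\log P$ by about one unit.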

Benefit: Significant increase in surrogate model quality, enabling a reduced number of lengthy tests. Using DATADVANCE’s MACROS software, the team obtained a surrogate model that provided better prediction quality than current industry-standard approaches. The model preserves its predictive power across a wide range of molecules. Furthermore, the time needed to obtain one accurate prediction of a thermodynamic parameter was reduced from one month (the time to carry out an experiment) to several hours.