Research at St Andrews

Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules

Research output: Contribution to journal › Article › peer-review

DOI: https://doi.org/10.1021/ci4005805

Open Access permissions: Open

Standard

Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules. / McDonagh, James; Nath, Neetika; De Ferrari, Luna; van Mourik, Tanja; Mitchell, John B. O.

In: Journal of Chemical Information and Modeling, Vol. 54, No. 3, 24.02.2014, p. 844-856.

Harvard

McDonagh, J, Nath, N, De Ferrari, L, van Mourik, T & Mitchell, JBO 2014, 'Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules', Journal of Chemical Information and Modeling, vol. 54, no. 3, pp. 844-856. https://doi.org/10.1021/ci4005805

APA

McDonagh, J., Nath, N., De Ferrari, L., van Mourik, T., & Mitchell, J. B. O. (2014). Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules. Journal of Chemical Information and Modeling, 54(3), 844-856. https://doi.org/10.1021/ci4005805

Vancouver

McDonagh J, Nath N, De Ferrari L, van Mourik T, Mitchell JBO. Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules. Journal of Chemical Information and Modeling. 2014 Feb 24;54(3):844-856. https://doi.org/10.1021/ci4005805

Author

McDonagh, James ; Nath, Neetika ; De Ferrari, Luna ; van Mourik, Tanja ; Mitchell, John B. O. / Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules. In: Journal of Chemical Information and Modeling. 2014 ; Vol. 54, No. 3. pp. 844-856.

BibTeX

@article{6806732435dd4bfca02824c2dc193c00,
title = "Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules",
abstract = "We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantitative structure–property relationship (QSPR) models. We also develop machine learning models where the theoretical energies and cheminformatics descriptors are used as combined input. These models are used to predict solvation free energy. While direct theoretical calculation does not give accurate results in this approach, machine learning is able to give predictions with a root mean squared error (RMSE) of ~1.1 log S units in a 10-fold cross-validation for our Drug-Like-Solubility-100 (DLS-100) dataset of 100 druglike molecules. We find that a model built using energy terms from our theoretical methodology as descriptors is marginally less predictive than one built on Chemistry Development Kit (CDK) descriptors. Combining both sets of descriptors allows a further but very modest improvement in the predictions. However, in some cases, this is a statistically significant enhancement. These results suggest that there is little complementarity between the chemical information provided by these two sets of descriptors, despite their different sources and methods of calculation. Our machine learning models are also able to predict the well-known Solubility Challenge dataset with an RMSE value of 0.9–1.0 log S units.",
keywords = "Cheminformatics, Chemical theory, Druglike molecules, Quantitative structure–property relationship (QSPR) models, Machine learning models, Chemistry Development Kit (CDK) descriptors",
author = "James McDonagh and Neetika Nath and {De Ferrari}, Luna and {van Mourik}, Tanja and Mitchell, {John B. O.}",
year = "2014",
month = feb,
day = "24",
doi = "10.1021/ci4005805",
language = "English",
volume = "54",
pages = "844--856",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "3",

}

RIS (suitable for import to EndNote)

TY - JOUR

T1 - Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules

AU - McDonagh, James

AU - Nath, Neetika

AU - De Ferrari, Luna

AU - van Mourik, Tanja

AU - Mitchell, John B. O.

PY - 2014/2/24

Y1 - 2014/2/24

N2 - We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantitative structure–property relationship (QSPR) models. We also develop machine learning models where the theoretical energies and cheminformatics descriptors are used as combined input. These models are used to predict solvation free energy. While direct theoretical calculation does not give accurate results in this approach, machine learning is able to give predictions with a root mean squared error (RMSE) of ~1.1 log S units in a 10-fold cross-validation for our Drug-Like-Solubility-100 (DLS-100) dataset of 100 druglike molecules. We find that a model built using energy terms from our theoretical methodology as descriptors is marginally less predictive than one built on Chemistry Development Kit (CDK) descriptors. Combining both sets of descriptors allows a further but very modest improvement in the predictions. However, in some cases, this is a statistically significant enhancement. These results suggest that there is little complementarity between the chemical information provided by these two sets of descriptors, despite their different sources and methods of calculation. Our machine learning models are also able to predict the well-known Solubility Challenge dataset with an RMSE value of 0.9–1.0 log S units.

KW - Cheminformatics

KW - Chemical theory

KW - Druglike molecules

KW - Quantitative structure–property relationship (QSPR) models

KW - machine learning models

KW - Chemistry Development Kit (CDK) descriptors

U2 - 10.1021/ci4005805

DO - 10.1021/ci4005805

M3 - Article

VL - 54

SP - 844

EP - 856

JO - Journal of Chemical Information and Modeling

JF - Journal of Chemical Information and Modeling

SN - 1549-9596

IS - 3

ER -

Related by author

  1. Are the sublimation thermodynamics of organic molecules predictable?

    McDonagh, J. L., Palmer, D. S., van Mourik, T. & Mitchell, J. B. O., 28 Nov 2016, In: Journal of Chemical Information and Modeling. 56, 11, p. 2162-2179

    Research output: Contribution to journal › Article › peer-review

  2. A review of methods for the calculation of solution free energies and the modelling of systems in solution

    Skyner, R. E., McDonagh, J. L., Groom, C. R., van Mourik, T. & Mitchell, J. B. O., 17 Mar 2015, In: Physical Chemistry Chemical Physics. 17, 9, p. 6174-6191

    Research output: Contribution to journal › Article › peer-review

  3. First-Principles Calculation of the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules

    Palmer, D. S., McDonagh, J. L., Mitchell, J. B. O., van Mourik, T. & Fedorov, M. V., Sep 2012, In: Journal of Chemical Theory and Computation. 8, 9, p. 3322-3337 16 p.

    Research output: Contribution to journal › Article › peer-review

Related by journal

  1. Are the sublimation thermodynamics of organic molecules predictable?

    McDonagh, J. L., Palmer, D. S., van Mourik, T. & Mitchell, J. B. O., 28 Nov 2016, In: Journal of Chemical Information and Modeling. 56, 11, p. 2162-2179

    Research output: Contribution to journal › Article › peer-review

  2. Computational comparison of imidazoline association with the I2 binding site in human monoamine oxidases

    Basile, L., Pappalardo, M., Guccione, S., Milardi, D. & Ramsay, R. R., Apr 2014, In: Journal of Chemical Information and Modeling. 54, 4, p. 1200-1207 8 p.

    Research output: Contribution to journal › Article › peer-review

  3. Erratum: "In silico target predictions: Defining a benchmarking data set and comparison of performance of the multiclass naïve Bayes and Parzen-Rosenblatt window"

    Koutsoukas, A., Lowe, R., Kalantarmotamedi, Y., Mussa, H. Y., Klaffke, W., Mitchell, J. B. O., Glen, R. C. & Bender, A., 28 Jul 2014, In: Journal of Chemical Information and Modeling. 54, 7, p. 2180-2182 3 p.

    Research output: Contribution to journal › Comment/debate › peer-review

  4. Erratum: Computational comparison of imidazoline association with the I2 binding site in human monoamine oxidases (Journal of Chemical Information and Modeling (2014) 54:4 (1200-1207))

    Basile, L., Pappalardo, M., Guccione, S., Milardi, D. & Ramsay, R. R., 28 Jul 2014, In: Journal of Chemical Information and Modeling. 54, 7, 1 p.

    Research output: Contribution to journal › Comment/debate › peer-review

ID: 102866887
