Identifying Potential Missteps of Machine Learning in Molecular Chemistry

07 March 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning-based methods are now widely used in chemical tasks, particularly in drug design. Graph Convolutional Neural Networks (GCNNs) compete with one another in predicting chemical properties, achieving errors comparable to those of experimental measurements. However, the increasing complexity of input data structures and the trend toward using three-dimensional molecular geometries are rarely accompanied by a thorough search for accurate input conformations. In this study, we examined the stability of a state-of-the-art GCNN architecture for drug discovery and identified vulnerabilities related to the structural features of the compounds. We found that molecular weight significantly influenced the discrepancy between predicted and calculated HOMO-LUMO gap values. We demonstrated that high similarity between new molecules and the training dataset, as measured by Tanimoto indices, did not guarantee high-quality model predictions. In contrast, more dissimilar structures required less additional information in the training set for a successful active learning procedure.
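To make the similarity measure mentioned in the abstract concrete, the following is a minimal sketch of how a Tanimoto index between a new molecule and a training set might be computed. It assumes fingerprints are represented as sets of on-bit indices; the abstract does not state which fingerprint type the authors used, and the function names and toy bit sets here are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) index between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def max_similarity_to_training(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest Tanimoto index between a query fingerprint and any training fingerprint."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


# Toy fingerprints (hypothetical bit indices, not derived from real molecules).
training_fps = [{1, 2, 3, 4}, {2, 3, 5, 8}, {10, 11, 12}]
query_fp = {1, 2, 3, 9}

# Similarity of the query to its nearest neighbour in the training set.
nearest = max_similarity_to_training(query_fp, training_fps)
print(f"nearest-neighbour Tanimoto index: {nearest:.2f}")
```

A nearest-neighbour score like this is one common way to quantify how close a new molecule is to the training data; as the abstract notes, a high value alone did not ensure an accurate prediction.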

Keywords

Drug design
Graph Convolutional Neural Networks
Molecular similarity
Adversarial attacks
Vulnerability

Supplementary materials

Title: Supplementary information for the paper Identifying Potential Missteps of Machine Learning in Molecular Chemistry
Description: Supplementary information for the paper Identifying Potential Missteps of Machine Learning in Molecular Chemistry, with additional figures.
