"Amide - amine + alcohol = carboxylic acid." Chemical reactions as linear algebraic analogies in graph neural networks.

19 July 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

In deep learning methods, especially in the context of chemistry, there is an increasing urgency to uncover the hidden learning mechanisms often dubbed the "black box." In this work, we show that graph models trained on computational chemical data behave similarly to natural language processing (NLP) models trained on text data. Crucially, we show that atom-embeddings, a.k.a. atom-parsed graph neural activation patterns, exhibit arithmetic properties that represent valid reaction formulas. This is very similar to how word-embeddings can be combined to form word analogies that preserve the semantic meaning behind the words, as in the famous example "King" − "Man" + "Woman" = "Queen." For instance, we show that the reaction from an alcohol to a carbonyl is represented by a constant vector in the embedding space, implicitly representing "−H₂," independent of the particular alcohol reactant and carbonyl product. This reveals a highly structured vector space, wherein directions in the embedding space correspond to chemical changes (e.g., the oxidation direction), and distinct chemical changes are orthogonal. In contrast to natural language processing, we can explain the observed chemical analogies using algebraic manipulations on the local chemical composition that surrounds each atom-embedding. Furthermore, the observations find applications in transfer learning, for instance in the formal structure and prediction of atomistic properties such as ¹H-NMR and ¹³C-NMR. This work is in line with the recent push for interpretable explanations of graph neural network models of chemistry, and it uncovers a latent model of chemistry that is highly structured, consistent, and analogous to chemical syntax.
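As a minimal sketch of the vector arithmetic described in the abstract, the Python snippet below builds a toy embedding space under one loudly stated assumption: each carbon's embedding is a fixed linear map W applied to the counts of its bonded neighbours. W, the feature labels, and the helpers embed and cosine are hypothetical; no trained graph neural network and no data from this work are used. In this picture the alcohol-to-carbonyl displacement is the same for a primary and a secondary alcohol because the carbon's local composition changes identically (one H neighbour lost, the C-O single bond becomes C=O), and the title analogy is resolved by nearest-neighbour search with cosine similarity, the usual convention for word analogies.

```python
import numpy as np

# Hypothetical local-composition features of a carbon atom: counts of its
# bonded neighbours, split by bond type. Illustrative labels only.
FEATURES = ["H", "C", "N", "O_single", "O_double"]
DIM = 16  # hypothetical embedding width

# ASSUMPTION: a toy linear map stands in for a trained GNN; each atom
# embedding is W @ (composition counts). W is random, not learned.
rng = np.random.default_rng(0)
W = rng.normal(size=(DIM, len(FEATURES)))


def embed(counts: dict) -> np.ndarray:
    """Toy atom embedding from a {feature: count} dict (hypothetical)."""
    c = np.array([counts.get(f, 0) for f in FEATURES], dtype=float)
    return W @ c


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# First-neighbour environments of selected carbons in small molecules.
ENV = {
    "ethanol C1": {"H": 2, "C": 1, "O_single": 1},
    "2-propanol C2": {"H": 1, "C": 2, "O_single": 1},
    "acetaldehyde C1": {"H": 1, "C": 1, "O_double": 1},
    "acetone C2": {"C": 2, "O_double": 1},
    "amine C (CH3NH2)": {"H": 3, "N": 1},
    "alcohol C (CH3OH)": {"H": 3, "O_single": 1},
    "amide C (CH3CONH2)": {"C": 1, "N": 1, "O_double": 1},
    "carboxylic acid C (CH3COOH)": {"C": 1, "O_single": 1, "O_double": 1},
}
E = {name: embed(env) for name, env in ENV.items()}

# 1) Oxidation (alcohol -> carbonyl) as a displacement vector, for a
#    primary and a secondary alcohol.
d_primary = E["acetaldehyde C1"] - E["ethanol C1"]
d_secondary = E["acetone C2"] - E["2-propanol C2"]
print("cosine of the two oxidation vectors:",
      round(cosine(d_primary, d_secondary), 6))

# 2) Analogy: amide - amine + alcohol, answered by nearest neighbour
#    (cosine similarity), excluding the three query terms as for word analogies.
query = E["amide C (CH3CONH2)"] - E["amine C (CH3NH2)"] + E["alcohol C (CH3OH)"]
exclude = {"amide C (CH3CONH2)", "amine C (CH3NH2)", "alcohol C (CH3OH)"}
best = max((n for n in E if n not in exclude), key=lambda n: cosine(query, E[n]))
print("amide - amine + alcohol ~", best)
```

Because the toy map is linear by construction, the two oxidation vectors coincide and the analogy lands exactly on the carboxylic-acid carbon. In a trained graph neural network such relations can at best hold approximately, so they have to be measured on the learned embeddings rather than assumed.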

Keywords

Artificial Intelligence
Neural Networks
Graph Neural Networks
Deep Learning
Computational Chemistry
Reaction Chemistry
Natural Language Processing
Electronic Structure
Chemical Properties
Latent Model
Explainable AI
Functional Groups
Organic Chemistry
Organic Synthesis
Organic Reactions
Transfer Learning
Machine Learning
Interpretable AI
Nuclear Magnetic Resonance
Density Functional Theory
Linear Analogies
