Learning Machine Reasoning for Bioactivity Prediction of Chemicals

08 May 2020, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

We describe a method for learning higher-level vector representations of the interactions between molecular features and biology. We call these representations reason vectors. In contrast to high-dimensional chemical fingerprints, reason vectors are much simpler, with only about five dimensions. They allow abstract reasoning about the bioactivity of chemicals (or its absence), uncover causal factors in the interactions between chemical features, and generalize beyond specific chemical classes or bioactivities. These qualities enable powerful similarity searches that are vague and conceptual in nature. The methodology can handle novel combinations of features in query molecules and can evaluate chemical classes that are entirely absent from the training data. The method performs a similarity-based nearest-neighbor search of a reference database of biologically tested chemicals, using a series of substructures obtained by stepwise reconstruction of the test molecule. A data-driven continuous representation of molecular fragments was used for the molecular similarity computations. The technique was inspired by the ability of humans to learn and generalize complex concepts by interacting with the physical world. We also show that predicting the activity of chemicals from the abstract reason vectors is straightforward compared with modeling in the raw chemistry space, and that it applies to both binary and continuous activity outcomes. Apart from the unsupervised training used to construct continuous molecular fingerprints, the methodology involves no gradient-based optimization or statistical fitting.
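The abstract describes the core search step only at a high level. A minimal Python sketch of a stepwise nearest-neighbor search follows, assuming that each reconstruction step of the query molecule has already been encoded as a continuous fingerprint vector and that cosine similarity is the similarity measure. The function names, the toy three-dimensional vectors, and the binary activity labels are hypothetical illustrations. The sketch does not reproduce how the authors derive the roughly five-dimensional reason vectors, which the abstract does not specify.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical reference database: chemical id -> (continuous fingerprint, activity label)
Reference = Dict[str, Tuple[Vector, int]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def nearest_neighbor(query: Vector, reference: Reference) -> Tuple[str, float, int]:
    """Return (id, similarity, activity label) of the most similar reference chemical."""
    best_id, best_sim, best_label = "", -1.0, 0  # cosine is >= -1, so any entry replaces this
    for ref_id, (vec, label) in reference.items():
        sim = cosine(query, vec)
        if sim > best_sim:
            best_id, best_sim, best_label = ref_id, sim, label
    return best_id, best_sim, best_label


def stepwise_search(substructure_vecs: List[Vector], reference: Reference) -> List[Tuple[str, float, int]]:
    """Run one nearest-neighbor search per reconstruction step of the query molecule."""
    return [nearest_neighbor(v, reference) for v in substructure_vecs]


if __name__ == "__main__":
    # Toy data only: two reference chemicals and two reconstruction steps of a query.
    reference = {"ref_A": ([1.0, 0.0, 0.0], 1), "ref_B": ([0.0, 1.0, 0.0], 0)}
    steps = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0]]
    hits = stepwise_search(steps, reference)
    print([(ref_id, label) for ref_id, _, label in hits])  # [('ref_A', 1), ('ref_B', 0)]
```

Each step yields the identity, similarity, and activity of its closest tested chemical, so the output is a trace of how the nearest biological evidence changes as the query molecule is rebuilt fragment by fragment.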

Keywords

reason vectors
QSAR
machine reasoning
continuous representation
artificial intelligence
chemical fingerprints
abstract representations
drug discovery
computational toxicology
