Abstract
Molecular fingerprints are essential to many cheminformatics approaches, such as similarity-based virtual screening. This work introduces the concept of neural (network) fingerprints for similarity search, in which the activation of the last hidden layer of a trained neural network serves as the molecular fingerprint. The performance of neural fingerprints from five different neural network architectures was analyzed and compared to the well-established Extended Connectivity Fingerprint (ECFP) and an autoencoder-based fingerprint, using a published compound dataset with known bioactivity on 160 different kinase targets. We expect neural networks to enrich the fingerprint by combining information about the molecular space of already known bioactive compounds with information on the molecular structure of the query. The results show that neural fingerprints can indeed greatly improve the performance of similarity searches. Most importantly, the neural fingerprint performs well even for kinase targets that were not included in the training. Surprisingly, although Graph Neural Networks (GNNs) are thought to offer an advantageous alternative, the best-performing neural fingerprints were based on traditional fully connected layers using ECFP4 as input. The best-performing kinase-specific neural fingerprint will be provided for public use.
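The core idea, reading the fingerprint off the last hidden layer and ranking library compounds by similarity to the query, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the function names are hypothetical, the ReLU activation and the cosine similarity measure are placeholders (the abstract does not state which activation or similarity metric was used), and the layer weights would in practice come from a network trained on bioactivity data with an ECFP-like bit vector as input.

```python
import math


def relu(values):
    """Element-wise ReLU (assumed activation; not specified in the abstract)."""
    return [max(0.0, x) for x in values]


def dense(values, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, values)) + b
        for row, b in zip(weights, bias)
    ]


def neural_fingerprint(bits, hidden_layers):
    """Return the activation of the last hidden layer for one input bit vector.

    `hidden_layers` is a list of (weights, bias) pairs for the hidden layers
    only. The output layer of the trained network is deliberately left out,
    because the fingerprint is read off before it.
    """
    activation = [float(b) for b in bits]
    for weights, bias in hidden_layers:
        activation = relu(dense(activation, weights, bias))
    return activation


def cosine_similarity(a, b):
    """Cosine similarity between two real-valued fingerprints (assumed metric)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_by_similarity(query_bits, library_bits, hidden_layers):
    """Return (similarity, library index) pairs, most similar compound first."""
    query_fp = neural_fingerprint(query_bits, hidden_layers)
    scored = [
        (cosine_similarity(query_fp, neural_fingerprint(bits, hidden_layers)), idx)
        for idx, bits in enumerate(library_bits)
    ]
    return sorted(scored, reverse=True)
```

In this sketch, a similarity search amounts to calling `rank_by_similarity` with the query's input bit vector, the bit vectors of the screening library, and the hidden-layer weights of the trained network. The top-ranked indices are the candidate hits.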