Abstract
Modern QSAR approaches are widely used in drug discovery to screen potentially
bioactive molecules before experimental testing. Most models that predict the
bioactivity of compounds are based on molecular descriptors derived from the 2D
structure and thus lose explicit information about the spatial structure of molecules,
which is important for protein-ligand recognition. The major problem in
constructing models from 3D descriptors is the choice of a probable bioactive conformation,
which affects predictive performance. The multi-instance (MI) learning approach,
which considers multiple conformations during model training, can be a reasonable
solution to this problem. Here, we compared MI-QSAR with the classical
single-instance QSAR (SI-QSAR) approach, in which each molecule was encoded by
either 2D descriptors or 3D descriptors derived from its single lowest-energy conformation.
The calculations were carried out on a sample of 175 datasets extracted from
the ChEMBL23 database. It was demonstrated that (i) MI-QSAR outperforms SI-QSAR in many cases and (ii) MI algorithms can automatically
identify plausible bioactive conformations. An instance-attention-based network can be used to select the most important conformer, which was shown to correspond to the PDB conformer in 50-84% of molecules.
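The instance-attention idea can be illustrated with a minimal forward-pass sketch. It assumes that each molecule is a "bag" of conformers, each encoded by a fixed-length 3D descriptor vector. The class `AttentionMIRegressor`, its layer sizes, and the random weights and descriptors are hypothetical. The pooling follows the generic attention-based MI scheme, in which softmax-normalized instance scores weight a bag embedding, and is not necessarily the exact architecture used in this work. The conformer with the largest attention weight is taken as the model's candidate for the most important conformer.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1D array."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


class AttentionMIRegressor:
    """Hypothetical attention-based multi-instance regressor (forward pass only).

    Each molecule is a bag of conformers; each conformer (instance) is
    described by a 3D descriptor vector of length n_features.
    Weights are random here; a real model would fit them to measured activities.
    """

    def __init__(self, n_features: int, n_hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Instance encoder: h_i = relu(W_enc x_i)
        self.W_enc = rng.normal(scale=0.1, size=(n_hidden, n_features))
        # Attention scorer: a_i proportional to exp(w^T tanh(V h_i))
        self.V = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        self.w = rng.normal(scale=0.1, size=n_hidden)
        # Bag-level linear output: y = u^T z + b
        self.u = rng.normal(scale=0.1, size=n_hidden)
        self.b = 0.0

    def forward(self, bag: np.ndarray):
        """Return (predicted activity, attention weight per conformer)."""
        h = np.maximum(0.0, bag @ self.W_enc.T)   # (n_conf, n_hidden)
        scores = np.tanh(h @ self.V.T) @ self.w   # (n_conf,)
        attn = softmax(scores)                    # weights sum to 1 over conformers
        z = attn @ h                              # attention-weighted bag embedding
        return float(z @ self.u + self.b), attn


# Toy bag: 5 conformers of one molecule, 8 random (hypothetical) descriptors each
rng = np.random.default_rng(42)
conformers = rng.normal(size=(5, 8))

model = AttentionMIRegressor(n_features=8)
activity, attn = model.forward(conformers)
key_conformer = int(np.argmax(attn))  # candidate "most important" conformer
print(f"predicted activity: {activity:.3f}")
print("attention weights:", np.round(attn, 3))
print("most important conformer index:", key_conformer)
```

By contrast, an SI-QSAR baseline would pass only the descriptor vector of the lowest-energy conformer to a conventional regressor, so no conformer weighting is learned.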