Abstract
In this work, two versions of the ChemBERTa-2 large language model, pre-trained with two different methods, were fine-tuned to predict inhibition of HIV replication. The best model achieved an AUROC of 0.793. Changes in the distributions of molecular embeddings before and after fine-tuning reveal the models’ enhanced ability to differentiate molecules that are active against HIV from those that are inactive.
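As a minimal illustrative sketch of the workflow summarized above, the code below fine-tunes a ChemBERTa-2 checkpoint for binary HIV-inhibition classification and scores it with AUROC. The checkpoint name, hyperparameters, and helper names are assumptions for illustration, not the exact configuration used in this work.

```python
# Sketch: fine-tune a ChemBERTa-2 checkpoint for binary activity prediction.
# Checkpoint name and hyperparameters are assumed, not the paper's settings.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import roc_auc_score

CHECKPOINT = "DeepChem/ChemBERTa-77M-MLM"  # assumed MLM-pretrained variant


class SmilesDataset(Dataset):
    """Wraps SMILES strings and binary activity labels."""

    def __init__(self, smiles, labels, tokenizer, max_len=128):
        self.enc = tokenizer(smiles, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item


def fine_tune(train_smiles, train_labels, val_smiles, val_labels, epochs=3):
    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForSequenceClassification.from_pretrained(
        CHECKPOINT, num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    train_dl = DataLoader(SmilesDataset(train_smiles, train_labels, tokenizer),
                          batch_size=32, shuffle=True)
    val_dl = DataLoader(SmilesDataset(val_smiles, val_labels, tokenizer),
                        batch_size=64)

    for _ in range(epochs):
        model.train()
        for batch in train_dl:
            optimizer.zero_grad()
            out = model(**batch)      # loss is computed from the labels
            out.loss.backward()
            optimizer.step()

    # Evaluate with AUROC, the metric reported in the abstract.
    model.eval()
    probs, targets = [], []
    with torch.no_grad():
        for batch in val_dl:
            labels = batch.pop("labels")
            logits = model(**batch).logits
            probs.extend(torch.softmax(logits, dim=-1)[:, 1].tolist())
            targets.extend(labels.tolist())
    return model, roc_auc_score(targets, probs)
```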