Keep looking at the negative side: improved detection of drug-induced liver injury with non-hepatotoxicant data oversampling

16 April 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Drug-induced liver injury (DILI) presents a critical challenge in drug development, often leading to the withdrawal of promising therapeutic candidates. Traditional predictive models for DILI, typically relying on molecular descriptors and pharmacokinetic properties, are insufficient due to the complex and multifactorial nature of liver toxicity. This complexity stems from overlapping biological stress responses activated by both hepatotoxic and non-hepatotoxic compounds, making it difficult to distinguish between them accurately. Additionally, the scarcity of DILI-positive compounds in available datasets results in significant class imbalance, further limiting the efficacy of conventional predictive models. Addressing these challenges requires novel approaches incorporating molecular and bioactivity data to enhance predictive power. In this study, we developed a custom oversampling strategy tailored to handle DILI's biological complexity and class imbalance. We integrated stress pathway activations, particularly focusing on the oxidative, unfolded protein, DNA damage, heat shock, and cytokine signalling stress responses, with molecular descriptors and bioactivity profiles to improve model performance. The custom oversampling technique demonstrated improved specificity and overall predictive accuracy, mitigating the effects of class imbalance without overfitting. Despite these advances, significant challenges remain in refining predictive models, particularly in identifying the most informative biological markers and optimising experimental protocols for better data acquisition. Our results suggest that while incorporating diverse data types and novel oversampling strategies improves DILI prediction, further efforts are required to create robust, generalisable models capable of reliably predicting hepatotoxicity in the drug development process.
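The abstract does not describe how the custom oversampling strategy works, so the following is only a minimal sketch of generic random oversampling, the baseline idea of duplicating minority-class samples until the classes are balanced. It is not the authors' method. The function name `oversample_minority`, the two-feature rows, and the toy labels are all hypothetical and invented for illustration.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size.

    Generic random oversampling, shown as an illustration only; it is NOT the
    custom strategy developed in the manuscript.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (_, n_major), (minority, n_minor) = counts.most_common(2)

    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    # Draw minority rows with replacement to close the gap between the classes.
    extra = [rng.choice(minority_idx) for _ in range(n_major - n_minor)]

    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: 1 = hepatotoxic, 0 = non-hepatotoxic.
rows = [[0.1, 1.2], [0.4, 0.9], [0.3, 1.1], [0.8, 0.2], [0.7, 0.3], [0.9, 0.1]]
labels = [1, 1, 1, 1, 0, 0]

balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(Counter(balanced_labels))
```

In practice, resampling of this kind is applied to the training split only, so that duplicated samples never leak into the validation or test data.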

Keywords

Drug-induced liver injury (DILI)
hepatotoxicity prediction
high content screening
stress pathway biomarkers
class imbalance
molecular descriptors
bioactivity spectra
predictive modelling
custom oversampling

Supplementary materials

Supplementary File 1: File containing the summarised high-content-screening data, molecular descriptors, and labels this manuscript is based upon.

Supplementary Tables: Supplementary Tables referred to in the manuscript.
