Abstract
Drug-induced liver injury (DILI) is a critical challenge in drug development and often leads to the withdrawal of promising therapeutic candidates. Traditional predictive models for DILI, which typically rely on molecular descriptors and pharmacokinetic properties, are insufficient because liver toxicity is complex and multifactorial. This complexity stems from overlapping biological stress responses that are activated by both hepatotoxic and non-hepatotoxic compounds, which makes the two groups difficult to distinguish accurately. In addition, the scarcity of DILI-positive compounds in available datasets produces a significant class imbalance that further limits the efficacy of conventional predictive models. Addressing these challenges requires novel approaches that combine molecular and bioactivity data to enhance predictive power. In this study, we developed a custom oversampling strategy tailored to DILI's biological complexity and class imbalance. We integrated stress pathway activations, focusing on the oxidative stress, unfolded protein, DNA damage, heat shock, and cytokine signalling responses, with molecular descriptors and bioactivity profiles to improve model performance. The custom oversampling technique improved specificity and overall predictive accuracy, mitigating the effects of class imbalance without overfitting. Despite these advances, significant challenges remain in refining predictive models, particularly in identifying the most informative biological markers and optimising experimental protocols for better data acquisition. Our results suggest that incorporating diverse data types and novel oversampling strategies improves DILI prediction, but that further effort is required to create robust, generalisable models capable of reliably predicting hepatotoxicity during drug development.
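The abstract does not specify how the custom oversampling strategy works, so the sketch below illustrates only the general idea of correcting class imbalance, using plain random oversampling of the minority class. It is not the authors' method. The function name `random_oversample`, the toy feature matrix, and the label encoding (1 for DILI-positive, 0 otherwise) are all assumptions made for illustration.

```python
import numpy as np


def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Hypothetical helper for illustration only; this is generic random
    oversampling, not the custom strategy described in the abstract.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)

    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)

    # Number of extra minority samples needed to match the majority class.
    n_extra = len(majority_idx) - len(minority_idx)
    if n_extra <= 0 or len(minority_idx) == 0:
        return X, y  # already balanced, or nothing to sample from

    # Sample minority rows with replacement and append them to the original data.
    extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra_idx])
    return X[keep], y[keep]


# Toy data (made up): 8 non-DILI (0) and 2 DILI-positive (1) compounds, 3 features each.
X_toy = np.arange(30, dtype=float).reshape(10, 3)
y_toy = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

X_bal, y_bal = random_oversample(X_toy, y_toy)
print(np.bincount(y_bal))  # [8 8]
```

In practice, resampling of this kind is usually applied to the training split only, so that duplicated minority samples do not leak into the data used for evaluation.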
Supplementary materials
Title
Supplementary File 1
Description
File containing the summarised high-content-screening data, molecular descriptors, and labels this manuscript is based upon.
Title
Supplementary Tables
Description
Supplementary Tables referred to in the manuscript.
Supplementary weblinks
Title
Data and results archive
Description
Zenodo deposition of the data required to obtain the results presented in the manuscript, together with the obtained results.
Title
Python code used to obtain the results from the archived data
Description
Link to the GitHub repository hosting the code that generated the results presented in the manuscript.