Automatic Annotation of Sites of Metabolism from Biotransformation Data

02 April 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Computational models predicting the sites of metabolism (SOMs) of small organic molecules have become invaluable tools for studying and optimizing the metabolic properties of xenobiotics. However, the performance of SOM predictors has shown signs of plateauing in recent years, primarily due to the limited availability of training data. Vast amounts of biotransformation data exist in the form of substrate-metabolite pairs, but their potential for SOM prediction remains largely untapped because these pairs lack SOM annotations. Annotating SOMs requires expert knowledge and is highly time-consuming. To address this challenge, we introduce AutoSOM, the first open-source tool that automatically extracts SOMs by mapping the structural differences between substrates and metabolites using transformation rules. AutoSOM is both fast and highly accurate: it annotates a diverse validation set of more than 5,000 reactions within minutes, with a labeling accuracy above 90%. Moreover, its annotation process is fully transparent and interpretable, which we hope will facilitate its adoption in high-stakes downstream applications such as drug discovery campaigns and regulatory assessments. Beyond accelerating annotation, AutoSOM enables standardized and consistent SOM labeling across institutions without requiring direct data sharing. This capability lays the foundation for federated learning approaches in metabolism prediction, fostering collaborative model improvement while preserving data confidentiality.

Keywords

Metabolism
Sites of metabolism (SOMs)
Automatic annotation
Biotransformation
Computational metabolism prediction
Annotation of metabolites
Xenobiotic metabolism
Open-source software

Supplementary materials

Supporting information file 1: PDF document detailing the software used in this work, the composition of the data set, the data preprocessing steps, and a discussion of representative examples of annotated substrate-metabolite pairs

Supporting information file 2: CSV file containing the complete list of evaluated MetaTrans substrate-metabolite pairs
