Current Opportunities and Limitations in Predicting Micropollutant Removal in Wastewater Treatment based on Molecular Structure

18 April 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Models to predict the fate of micropollutants are increasingly needed for alternatives assessment and safe-by-design efforts. In this study, we focus on predicting transformation in wastewater treatment plants (WWTPs), as they are considered to be the main barrier preventing micropollutants from entering receiving water bodies. STPWIN, the state-of-the-art model to predict removal of organic substances in WWTPs, requires first-order degradation rate constants, which are available for only a very limited number of chemicals; this is a major limitation. We propose a workaround that uses data from field-scale monitoring to train structure-activity models that predict removal in conventional treatment directly from chemical structure. This strategy is only possible due to advancements in high-resolution mass spectrometry and machine learning, which have allowed us to build and validate over 40 different machine learning models using data for over 1000 chemicals in over 50 WWTPs. We systematically evaluated the influence of data quality on model performance and concluded that substances with very high variability in removal across different WWTPs were detrimental if given high importance during training. The best predictions were achieved using substructure-based fingerprints (i.e., MACCS) and random forests. These predictions proved more reliable than those of existing process-based models that are widely used in EU and US regulatory contexts, especially for molecules for which no experimental biotransformation kinetic data are available. This suggests that our model could be an important and novel contribution to the toolbox of in silico models used for alternatives assessment, when evaluating new molecules in industrial research and development, or even for exposure modeling in a risk assessment context.
We have established a benchmark model, which is publicly available along with the training data and the scripts necessary to reproduce the data curation process (renkulab.io/projects/fenner-labs/projects/pepper). We anticipate that this benchmark and the transparently curated data set that we provide will facilitate further developments in the field.
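
The observation that highly variable substances hurt training when given high importance can be illustrated with a minimal sketch. It is not the authors' curation procedure: the input layout, the function name `aggregate_removal`, the use of the median as the training target, and the weighting formula are all assumptions made for illustration, and the numbers are invented.

```python
from statistics import median, pstdev


def aggregate_removal(observations):
    """Collapse per-plant removal values into one target and one weight per substance.

    observations: dict mapping a substance ID to a list of removal
    fractions (0..1), one value per WWTP. This layout is hypothetical.
    Returns a dict mapping substance ID to (target, weight).
    """
    summary = {}
    for substance, removals in observations.items():
        target = median(removals)   # robust central value across plants
        spread = pstdev(removals)   # variability of removal across plants
        # Assumed weighting scheme: the larger the spread, the smaller the weight.
        weight = 1.0 / (1.0 + 10.0 * spread)
        summary[substance] = (target, weight)
    return summary


# Invented illustrative values, not data from the study.
example = {
    "substance_A": [0.90, 0.92, 0.88],  # consistent removal across plants
    "substance_B": [0.10, 0.95, 0.50],  # highly variable removal
}
summary = aggregate_removal(example)
for name, (target, weight) in summary.items():
    print(f"{name}: target={target:.2f}, weight={weight:.2f}")
```

Weights of this kind could then be handed to a learner that supports per-sample weighting, for example through the `sample_weight` argument of scikit-learn's `RandomForestRegressor.fit`, so that substances with inconsistent removal contribute less to the fit.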

Keywords

Micropollutants
Wastewater
Predict Breakthrough

Supplementary materials

Title: Supporting Information
Description: Additional text and figures supporting the main manuscript. The different sections cover a deeper discussion of some of the points raised in the main manuscript.
