Abstract
Models to predict the fate of micropollutants are increasingly needed for alternatives assessment and safe-by-design efforts. In this study, we focus on predicting transformation in wastewater treatment plants (WWTPs), as they are considered the main barrier preventing micropollutants from entering receiving water bodies. STPWIN, the state-of-the-art model to predict removal of organic substances in WWTPs, requires first-order degradation rate constants, which are available for only a very limited number of chemicals; this is a major limitation. Here, we propose a workaround: using data from field-scale monitoring to train structure-activity models that predict removal in conventional treatment directly from chemical structure. This strategy is only possible due to advancements in high-resolution mass spectrometry and machine learning, which have allowed us to build and validate over 40 different machine learning models using data for over 1000 chemicals in over 50 WWTPs. We systematically evaluated the influence of data quality on model performance and concluded that substances with very high variability in removal across different WWTPs were detrimental to model performance if given high importance during training. The best predictions were achieved using substructure-based fingerprints (i.e., MACCS) and random forests. These predictions proved more reliable than those of existing process-based models that are widely used in EU and US regulatory contexts, especially for molecules for which no experimental biotransformation kinetic data are available. This suggests that our model could be an important and novel contribution to the toolbox of in silico models used for alternatives assessment, for evaluating new molecules in industrial research and development, or even for exposure modeling in a risk assessment context.
We have established a benchmark model, which is publicly available along with the training data and the scripts necessary to reproduce the data curation process (renkulab.io/projects/fenner-labs/projects/pepper). We anticipate that this benchmark and the transparently curated data set that we provide will facilitate further developments in the field.
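The data-quality finding above (substances with highly variable removal across WWTPs hurt training when given high importance) can be illustrated with a minimal sketch. It is not the authors' curation pipeline: the function name `summarize_removal`, the example removal fractions, and the weighting formula `1 / (1 + std)` are all hypothetical and chosen only to show the idea of down-weighting inconsistent substances.

```python
from statistics import mean, pstdev


def summarize_removal(observations):
    """Aggregate per-WWTP removal fractions into one record per substance.

    observations maps a substance name to a list of removal fractions
    (0 = no removal, 1 = complete removal), one value per WWTP.
    The weighting rule below is an illustrative assumption, not the
    scheme used in the study.
    """
    summary = {}
    for substance, removals in observations.items():
        spread = pstdev(removals)  # variability across WWTPs
        summary[substance] = {
            "mean": mean(removals),          # candidate training target
            "std": spread,
            "weight": 1.0 / (1.0 + spread),  # hypothetical sample weight
        }
    return summary


# Made-up example data: one consistent and one highly variable substance.
example = {
    "substance_A": [0.9, 0.9, 0.9],
    "substance_B": [0.1, 0.9],
}
result = summarize_removal(example)
for name, stats in result.items():
    print(name, stats)
```

Weights of this kind could, for example, be passed as per-sample weights when fitting a regression model, so that substances with inconsistent field behavior influence the fit less.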
Supplementary materials
Title
Supporting Information
Description
Additional text and figures supporting the main manuscript. The different sections provide a deeper discussion of some of the points raised in the main manuscript.
Supplementary weblinks
Title
PEPPER_github_repo
Description
All code and data relevant to this work are presented in this repository. We expect that this repository, in addition to the Renku project (https://renkulab.io/projects/fenner-labs/projects/pepper), will provide enough detail for other researchers to reproduce and use our work as a benchmark.