Abstract
Food, water, air, and soil are regularly contaminated with naturally and artificially occurring forms of arsenic, of which arsonic acid derivatives RAsO(OH)2 are the major pentavalent compounds present in aqueous media. At a given pH, the ionization state of these derivatives affects their lipophilicity, solubility, protein binding, and ability to cross plasma membranes, potentially increasing their toxicity. Knowing their pKa values not only characterizes them but also helps in designing bioremediation strategies. Numerous challenges are associated with predicting pKa, and existing models are limited to specific chemical spaces. To develop a pKa model for arsonic acids, we compare machine learning (ML) methods based on Support Vector Machines with three DFT-based models: correlation to the maximum surface electrostatic potential (VS,max) at the ωB97XD/cc-pVTZ level of theory; correlation to carboxylate atomic charges in conjunction with a density-based solvation model (SMD) at the M06L/6-311G(d,p) level of theory; and the scaled solvent-accessible surface approach. The last of these yielded high mean unsigned errors for the predicted pKa values and is therefore not an efficient method for calculating the pKa values of arsonic acids, in contrast with reported data for carboxylic acids, aliphatic amines, and thiols. The highest agreement was obtained with the atomic charges calculated on the conjugate arsonate base. The ML-based and VS,max models rank second and third, respectively, in terms of prediction performance.
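The descriptor-correlation models and the mean unsigned error used to rank them can be illustrated with a minimal sketch. It assumes a simple linear relationship between pKa and a single computed descriptor (for example, an atomic charge on the conjugate base or VS,max); the abstract does not state the functional form, and the function names and all numerical values below are hypothetical placeholders, not results from this work.

```python
import numpy as np


def fit_linear_pka(descriptor, pka_exp):
    """Least-squares fit of pKa = slope * descriptor + intercept.

    Assumption: a linear one-descriptor correlation; the actual model
    form used in the study may differ.
    """
    slope, intercept = np.polyfit(descriptor, pka_exp, deg=1)
    return float(slope), float(intercept)


def predict_pka(descriptor, slope, intercept):
    """Apply the fitted linear correlation to descriptor values."""
    return slope * np.asarray(descriptor, dtype=float) + intercept


def mean_unsigned_error(pka_pred, pka_exp):
    """Mean unsigned (absolute) error between predicted and reference pKa."""
    diff = np.asarray(pka_pred, dtype=float) - np.asarray(pka_exp, dtype=float)
    return float(np.mean(np.abs(diff)))


# Hypothetical placeholder data, for illustration only (not measured values).
charges = np.array([-1.10, -1.05, -1.02, -0.98, -0.95])
pka_ref = np.array([4.3, 4.0, 3.9, 3.6, 3.4])

slope, intercept = fit_linear_pka(charges, pka_ref)
mue = mean_unsigned_error(predict_pka(charges, slope, intercept), pka_ref)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, MUE={mue:.2f} pKa units")
```

A lower MUE on compounds held out from the fit indicates a better-performing correlation; evaluating on the fitting set alone, as in this toy example, overstates accuracy.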
Supplementary materials
Title
ML descriptors
Description
Spreadsheet containing all Machine Learning descriptors and results as included in the supporting information pdf file
Title
Supporting Information
Description
Supporting information regarding VS,max calculations, charges, and DFT-optimized coordinates.