A Subgraph Isomorphic Decision Tree to Predict Radical Thermochemistry with Bounded Uncertainty Estimation

31 January 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Detailed chemical kinetic models offer valuable mechanistic insights into industrial applications. Automatic generation of a reliable kinetic model requires fast and accurate estimation of radical thermochemistry. Kineticists often prefer hydrogen bond increment (HBI) corrections, which map a closed-shell molecule to the corresponding radical, because they are interpretable, physically meaningful, and, as relative quantities, facilitate error cancellation. Because data are limited, HBI corrections are typically estimated with tree estimators, which currently rely on expert knowledge and manual construction and are therefore difficult to maintain and improve. In this work, we extend the subgraph isomorphic decision tree (SIDT) algorithm, originally developed for rate estimation, to estimate HBI corrections. We introduce a physics-aware splitting criterion, explore a bounded weighted uncertainty estimation method, and evaluate pre-pruning methods based on aleatoric uncertainty and on model variance reduction. Moreover, we compile a dataset of thermochemical parameters for 2,210 radicals containing C, O, and H, based on quantum chemical calculations from recently published works, and use it to train the SIDT model. Compared to existing empirical tree estimators, the SIDT model (1) offers an automatic approach to generating and extending tree estimators for thermochemistry, (2) achieves better accuracy and R², (3) provides significantly more realistic uncertainty estimates, and (4) has a tree structure that allows much faster descent. Overall, the SIDT estimator advances kinetic modeling by offering more accurate, reliable, and scalable predictions of radical thermochemistry.
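
As an illustration of how an HBI correction is applied, the sketch below follows the common convention in which the enthalpy correction equals the bond dissociation energy of the R-H bond and the entropy correction is simply added to the parent value. The abstract does not spell out this convention, so the formula, the function names, and the rounded methane/methyl values are assumptions chosen for illustration, not results from this work.

```python
# Minimal sketch of applying HBI corrections (assumed convention, see lead-in).
# All numbers are rounded, approximate literature values in kcal/mol.

H_ATOM_H298 = 52.1  # approximate enthalpy of formation of the H atom at 298 K


def radical_h298(parent_h298: float, hbi_h298: float) -> float:
    """Radical enthalpy of formation from the closed-shell parent.

    hbi_h298 is the HBI enthalpy correction, taken here to be the
    R-H bond dissociation energy at 298 K.
    """
    return parent_h298 + hbi_h298 - H_ATOM_H298


def radical_s298(parent_s298: float, hbi_s298: float) -> float:
    """Radical entropy: parent entropy plus the HBI entropy correction."""
    return parent_s298 + hbi_s298


# Hypothetical usage: methane (about -17.9) with a C-H correction of about 105.0
methyl_h298 = radical_h298(parent_h298=-17.9, hbi_h298=105.0)
print(f"Estimated H298 of the methyl radical: {methyl_h298:.1f} kcal/mol")
```

In a tree estimator, the value passed as `hbi_h298` would come from the node matched by the radical's local structure rather than being supplied by hand.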

Keywords

chemical kinetics
machine learning
decision tree

Supplementary materials

Supporting information: Details on dataset and additional results and analysis
