Comparative Analysis of Reinforcement Learning Algorithms for Finding Reaction Pathways: Insights from a Large Benchmark Dataset

30 December 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Identifying kinetically feasible reaction pathways that connect a reactant to its product through numerous intermediates and transition states is crucial for predicting chemical reactions and elucidating reaction mechanisms. However, as molecular systems become larger or more complex, the number of local minimum structures and transition states grows, which makes this task challenging even with advanced computational approaches. We introduced a reinforcement learning algorithm that, starting from the reactant, efficiently identifies a kinetically feasible reaction pathway between a given local minimum structure of the reactant and a given local minimum structure of the product. The performance of the algorithm was validated using a benchmark dataset of large-scale chemical reaction path networks. Several search policies were proposed that score each local minimum structure candidate found during the search using metrics based on energetic or structural similarity to the product's goal structure. The performance of the baseline greedy, random, and uniform search policies varied substantially depending on the system. In contrast, policies that balance exploration and exploitation, such as Thompson sampling, probability of improvement, and expected improvement, consistently showed stable and high performance. Furthermore, we characterized in detail how the search mechanism depends on the policy. This study also discusses potential avenues for further research, such as hierarchical reinforcement learning and multi-objective optimization, which could extend the problem setting explored here.
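To make the policy comparison concrete, the following is a minimal sketch of how such node selection policies are commonly written, not the authors' implementation. It assumes each candidate local minimum carries a hypothetical score estimate `mean` (higher is better, e.g. similarity to the goal structure) and an uncertainty `std`; the abstract does not say how these quantities are obtained, and the names `select_node`, `best`, and `c` are illustrative only.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, used by PI and EI


def probability_of_improvement(mean, std, best):
    """P(score > best) under a Gaussian belief N(mean, std^2)."""
    if std <= 0:
        return 1.0 if mean > best else 0.0
    return _STD_NORMAL.cdf((mean - best) / std)


def expected_improvement(mean, std, best):
    """E[max(score - best, 0)] under a Gaussian belief N(mean, std^2)."""
    if std <= 0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * _STD_NORMAL.cdf(z) + std * _STD_NORMAL.pdf(z)


def select_node(candidates, policy, best, c=1.0, rng=None):
    """Pick the next candidate to expand.

    candidates: list of (name, mean, std) tuples (hypothetical format).
    best: best score observed so far (assumption: higher is better).
    c: exploration weight for the UCB policy.
    """
    rng = rng or random.Random()
    if policy == "random":
        return rng.choice(candidates)[0]
    scorers = {
        # Pure exploitation: trust the current estimate.
        "greedy": lambda m, s: m,
        # Upper confidence bound: estimate plus an uncertainty bonus.
        "ucb": lambda m, s: m + c * s,
        # Thompson sampling: draw one sample per candidate from its belief.
        "thompson": lambda m, s: rng.gauss(m, s),
        "pi": lambda m, s: probability_of_improvement(m, s, best),
        "ei": lambda m, s: expected_improvement(m, s, best),
    }
    score = scorers[policy]
    name, _, _ = max(candidates, key=lambda cand: score(cand[1], cand[2]))
    return name


# Made-up example: "A" looks good and is well known, "B" is uncertain.
candidates = [("A", 0.6, 0.01), ("B", 0.5, 0.3)]
for policy in ("greedy", "pi", "ei", "ucb"):
    print(policy, select_node(candidates, policy, best=0.6))
```

With these made-up numbers, the greedy and probability-of-improvement policies choose the well-characterized candidate "A", while expected improvement and UCB choose the uncertain candidate "B", which illustrates how the acquisition rule shifts the balance between exploitation and exploration.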

Keywords

Reinforcement Learning Algorithms
Reaction Pathways

Supplementary materials

Title
Supporting Information for "Comparative Analysis of Reinforcement Learning Algorithms for Finding Reaction Pathways: Insights from a Large Benchmark Dataset"
Description
Supporting information is available for the results of the structural-similarity-based and energy-difference-based searches, applied to both the Passerini reaction and Strecker reaction datasets: the average of the ϵ hyperparameter used in the duplicate-path criteria as a function of iteration step for different node selection policies (Figure S1), and the goal-finding rates as a function of iteration step for UCB variants with several values of the C parameter (Figure S2).
