Abstract
Rapid generation and evaluation of diverse synthesis pathways play a critical role in exploring a
broader chemical space and identifying potent drug candidates. Drug discovery often relies on labor-intensive
manual processes for retrosynthetic route finding, resulting in challenges related to scalability
and reproducibility. Autonomous chemical synthesis platforms, such as ASPIRE, aim to address this bottleneck
by developing high-throughput synthesis capabilities. While AI/ML-based predictive methods
exist that can generate synthesis routes rapidly, evidence-based synthesis route search, often relying
on knowledge graphs, poses its own scalability challenges. In this study, we present a comprehensive
benchmarking framework and analysis applied to the ASPIRE Integrated Computational Platform
(AICP) that led to a breakthrough in high-throughput synthesis planning. Our strategy encompasses
query optimization and domain-driven data engineering techniques, which worked in concert
to reduce the synthesis route finding time by orders of magnitude. As a result, AICP is equipped with
a high-throughput, evidence-based, computer-assisted synthesis planning method that can
automatically identify viable synthesis routes to 2000 target molecules within approximately 40 minutes.
Complementing existing retrosynthetic approaches and using a knowledge graph of 1.2M chemical
reactions, AICP represents a significant advancement towards automating high-throughput synthesis in
drug discovery, thus paving the way for more efficient drug candidate identification and development.
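To make the idea of evidence-based route search concrete, the following is a minimal sketch of a depth-limited search over a toy reaction knowledge graph. It is not the AICP implementation: the reaction identifiers, molecule names, the `PURCHASABLE` set, and the `find_route` function are all hypothetical placeholders chosen for illustration, and the abstract does not describe the actual query strategy at this level of detail.

```python
# Hypothetical toy knowledge graph: each recorded reaction links one product
# to the reactants it was made from. All names are illustrative placeholders.
REACTIONS = {
    "rxn1": {"product": "target", "reactants": ["A", "B"]},
    "rxn2": {"product": "A", "reactants": ["C"]},
    "rxn3": {"product": "B", "reactants": ["D", "E"]},
}
# Assumed set of starting materials that need no further synthesis.
PURCHASABLE = {"C", "D", "E"}


def find_route(target, reactions, purchasable, max_depth=5):
    """Return reaction ids forming a route to `target`, or None if none is found."""
    # Index reactions by product so each lookup is a dictionary access.
    by_product = {}
    for rid, rxn in reactions.items():
        by_product.setdefault(rxn["product"], []).append(rid)

    def solve(mol, depth, seen):
        if mol in purchasable:
            return []  # nothing to synthesize
        if depth == 0 or mol in seen:
            return None  # depth limit reached or cycle detected
        for rid in by_product.get(mol, []):
            route = [rid]
            for reactant in reactions[rid]["reactants"]:
                sub = solve(reactant, depth - 1, seen | {mol})
                if sub is None:
                    break  # this reaction cannot be completed
                route.extend(sub)
            else:
                return route  # every reactant was resolved
        return None

    return solve(target, max_depth, frozenset())


print(find_route("target", REACTIONS, PURCHASABLE))
```

Only reactions already recorded in the graph are used, which is what distinguishes this style of search from predictive route generation. For scale, the figures reported above (2000 targets in approximately 40 minutes) correspond to an average of roughly 1.2 seconds per target.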
Supplementary materials
Title
Supplementary Information - Path Toward High-Throughput Synthesis Planning via Performance Benchmarking
Description
Supplementary Information for the manuscript "Path Toward High-Throughput Synthesis Planning via Performance Benchmarking"
Supplementary weblinks
Title
Dataset Associated with the Manuscript "Path Toward High-Throughput Synthesis Planning via Performance Benchmarking"
Description
Input data and output data resulting from the benchmarking experiments. The input data include various configurations of the reaction knowledge graph of the ASPIRE Integrated Computational Platform (AICP). The AICP reaction knowledge graph was derived from the USPTO and SAVI reaction datasets.