Abstract
Leveraging the growing volume of chemical reaction data can enhance synthesis planning and improve success rates. However, machine learning tools for retrosynthesis planning and forward reaction prediction depend on readily available, high-quality data in a structured format. Although some public and licensed reaction databases exist, they frequently lack essential information about reaction conditions. To address this gap and to promote findable, accessible, interoperable, and reusable (FAIR) data reporting and sharing, we introduce the Simple User-Friendly Reaction Format (SURF). SURF standardizes the documentation of reaction data in a structured tabular format that requires only a basic understanding of spreadsheets. It lets chemists record the synthesis of molecules in a form that is both human- and machine-readable, which makes the data easier to share and to integrate directly into machine-learning pipelines. SURF files are designed to be interoperable: they can be easily imported into relational databases and converted into other formats, complementing existing initiatives such as the Open Reaction Database (ORD) and the Unified Data Model (UDM). At Roche, SURF plays a crucial role in democratizing FAIR reaction data sharing and expediting the chemical synthesis process.
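The point that a tabular reaction file can be loaded straight into a relational database can be illustrated with a minimal sketch. It assumes a SURF-like layout of one row per reaction and one tab-separated column per attribute; the column names (`rxn_id`, `startingmat_1_smiles`, `solvent_1_name`, `temperature_deg_c`, ...) and the two example rows are hypothetical placeholders, not taken from the SURF specification, and the loader `load_reactions` is likewise hypothetical rather than the authors' interoperability code. Only the Python standard library (`csv`, `sqlite3`) is used.

```python
import csv
import io
import sqlite3

# HYPOTHETICAL SURF-like table: one row per reaction, one column per attribute.
# Column names and values are illustrative placeholders, NOT the SURF spec.
SURF_LIKE_TSV = (
    "rxn_id\tstartingmat_1_smiles\tproduct_1_smiles\tsolvent_1_name\ttemperature_deg_c\n"
    "R1\tOC(=O)c1ccccc1\tNC(=O)c1ccccc1\tDMF\t25\n"
    "R2\tBrc1ccccc1\tc1ccc(-c2ccccc2)cc1\tdioxane\t90\n"
)


def load_reactions(tsv_text: str) -> sqlite3.Connection:
    """Load a tab-separated reaction table into an in-memory SQLite table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns = reader.fieldnames  # taken from the header row
    conn = sqlite3.connect(":memory:")
    # Store every column as TEXT so the loader stays schema-agnostic.
    col_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    conn.execute(f"CREATE TABLE reactions ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO reactions VALUES ({placeholders})",
        ([row[name] for name in columns] for row in reader),
    )
    conn.commit()
    return conn


conn = load_reactions(SURF_LIKE_TSV)
# Reaction conditions become queryable like any other relational data.
rows = conn.execute(
    "SELECT rxn_id, solvent_1_name, CAST(temperature_deg_c AS REAL) "
    "FROM reactions WHERE CAST(temperature_deg_c AS REAL) > 50"
).fetchall()
print(rows)  # [('R2', 'dioxane', 90.0)]
```

Storing every column as text keeps this sketch independent of any particular column set; a real pipeline would instead declare types from the actual SURF column definitions.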
Supplementary weblinks
Code for SURF interoperability: example SURF reaction data files and program code for seamless interoperability of SURF files with other data formats such as ORD and UDM.