Open Macromolecular Genome: Generative Design of Synthetically Accessible Polymers

24 January 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

A grand challenge in polymer science lies in the predictive design of new polymeric materials with targeted functionality. However, de novo design of functional polymers is difficult due to the vast chemical space and an incomplete understanding of structure-property relations. Recent advances in deep generative modeling have facilitated the efficient exploration of molecular design space, but data sparsity in polymer science remains a major obstacle to progress. In this work, we introduce a vast polymer database known as the Open Macromolecular Genome (OMG), which contains synthesizable polymer chemistries compatible with known polymerization reactions and commercially available reactants selected for synthetic feasibility. The OMG is used in concert with a synthetically aware generative model known as Molecule Chef to identify property-optimized constitutional repeating units (CRUs), constituent reactants, and reaction pathways of polymers, thereby advancing polymer design into the realm of synthetic relevance. As a proof-of-principle demonstration, we show that polymers with targeted octanol-water solubilities are readily generated together with monomer reactant building blocks and associated polymerization reactions. Suggested reactants are further integrated with Reaxys polymerization data to provide hypothetical reaction conditions (e.g., temperature, catalysts, solvents). Broadly, the OMG is a polymer design approach capable of enabling data-intensive generative models for synthetic polymer design. Overall, this work represents a significant advance in enabling the property-targeted design of synthetic polymers subject to practical synthetic constraints.

Supplementary materials

Title: Supplementary Materials
Description: OMG Reactant Composition and Downselection, OMG CRU Composition, LogP Normalization of CRUs, Generative Model Hyperparameters, Training Data Compositions, Explained Variance of PolyInfo Reactants and PolyInfo CRUs from PCA, Training Results of Generative Models, Gaussian Prior Generation.
