Abstract
The architectural, compositional, and chemical complexities of polymers are fundamentally important to their properties; however, these same factors complicate effective prediction. Machine learning offers a promising approach for predicting polymer properties, but model transferability remains a major challenge, particularly when data is scarce due to high acquisition costs or the growth of the parameter space. Here, we examine whether integration with polymer physics theory effectively enhances the transferability of machine learning models to predict properties of architecturally and compositionally diverse polymers. To do so, we first generate ToPoRg-18k, a dataset reporting the moments of the distribution of squared radius of gyration for 18,450 polymers with diverse architectures, molecular weights, compositions, and chemical patterns. We then systematically assess the performance of several different models on a series of transferability tasks, such as predicting properties of high molecular weight systems from smaller ones or predicting properties of copolymers from homopolymers. We find that a tandem model, GC-GNN, which combines a graph neural network with a fittable model based on ideal Gaussian chain theory, surpasses both standalone polymer-physics and graph neural network models in predictive accuracy and transferability. We also demonstrate that predictive transferability varies with polymer architecture due to deviations from the ideal Gaussian chain assumption. Furthermore, the integration with theory endows GC-GNN with additional interpretability, as its learned coefficients correlate strongly with polymer solvophobicity. Overall, this study illustrates the utility of combining polymer physics with data-driven models to improve predictive transferability for architecturally diverse copolymers, showcasing an extension of physics-informed machine learning for macromolecules.
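For orientation, the ideal Gaussian chain picture mentioned above gives a closed-form mean squared radius of gyration for any connected bead-spring architecture: with N beads, root-mean-square bond length b, and nonzero eigenvalues λ_k of the graph Laplacian (Kirchhoff matrix), ⟨Rg²⟩ = (b²/N) Σ 1/λ_k. The sketch below illustrates that textbook result only; it is not the fittable baseline used in GC-GNN, and the function name `ideal_rg2`, the bond-list input, and the two toy architectures are assumptions made for this example.

```python
import numpy as np


def ideal_rg2(edges, n_beads, b=1.0):
    """Mean squared radius of gyration of an ideal Gaussian polymer graph.

    edges   : list of (i, j) bonded bead pairs (hypothetical input format)
    n_beads : number of beads N
    b       : root-mean-square bond length
    """
    # Build the graph Laplacian (Kirchhoff matrix) from the bond list.
    lap = np.zeros((n_beads, n_beads))
    for i, j in edges:
        lap[i, i] += 1.0
        lap[j, j] += 1.0
        lap[i, j] -= 1.0
        lap[j, i] -= 1.0
    # eigvalsh returns ascending eigenvalues; drop the single zero mode
    # (rigid translation). This assumes the molecule is connected.
    eigvals = np.linalg.eigvalsh(lap)[1:]
    return b**2 / n_beads * np.sum(1.0 / eigvals)


# Two toy architectures with the same number of beads.
linear = [(i, i + 1) for i in range(9)]   # 10-bead linear chain
star = [(0, i) for i in range(1, 10)]     # 9 one-bead arms on a central bead
rg2_linear = ideal_rg2(linear, 10)        # (N^2 - 1) / (6 N) = 1.65
rg2_star = ideal_rg2(star, 10)            # 0.81
```

At equal bead count the star is more compact than the linear chain, which is the kind of architecture dependence a purely theoretical estimate captures before any learned correction is applied.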
Supplementary materials
Title
Supplemental Information
Description
Baseline model derivation; mean and standard deviation of squared radii of gyration; simulated and theoretical mean and standard deviation of squared radius of gyration; standard deviation transferability across architecture classes; model interpretability; data collection and decorrelation time for squared radius of gyration.
Supplementary weblinks
Title
ToPoRg-18k: dataset of single-chain radii of gyration distribution for 18,450 architecturally diverse and chemically patterned coarse-grained polymers
Description
This distribution provides access to 18,450 configurations of coarse-grained polymers. The data is provided both as a serialized object created with the Python 'pickle' module and in CSV format. The data was compiled using Python version 3.8.
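A minimal sketch of reading either format with the Python standard library follows. The file names, the field names `polymer_id` and `mean_rg2`, and the helper `load_records` are hypothetical stand-ins, since the actual file layout and schema are not described here; the demo writes a tiny placeholder record and reads it back rather than touching the real dataset.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def load_records(path):
    """Load records from a .csv file or, for any other suffix, a pickle file."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        with path.open(newline="") as fh:
            return list(csv.DictReader(fh))  # all values come back as strings
    with path.open("rb") as fh:
        return pickle.load(fh)  # unpickle files from trusted sources only


# Stand-in data with hypothetical field names, written and then read back.
records = [{"polymer_id": "demo-0", "mean_rg2": 1.65}]
with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "demo.pkl"
    csv_path = Path(tmp) / "demo.csv"
    with pkl_path.open("wb") as fh:
        pickle.dump(records, fh)
    with csv_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
    from_pickle = load_records(pkl_path)
    from_csv = load_records(csv_path)
```

The pickle route preserves Python types, while the CSV route returns strings that need explicit conversion. A pickle written under Python 3.8 may also depend on the versions of any third-party libraries whose objects it contains, so the CSV file is likely the more portable entry point.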