Abstract
Machine learning techniques are being applied to quantify structure-property relationships for a wide variety of materials, where the proper representation of materials plays a key role. Although algorithms for representation learning have been studied extensively, their applications to domain-specific areas, such as polymers, remain limited, largely due to the lack of benchmark databases. In this work, we investigate different types of polymer representations, including Morgan fingerprint (MF), molecular embedding (ME) and molecular graph (MG), based on a benchmark database drawn from a subset of PolyInfo. We evaluate the quality of the different polymer representations by quantifying the relationships between the representations and polymer properties, including density, melting temperature and glass transition temperature. Different representation learning schemes, namely supervised learning (SL), semi-supervised learning (SSL) and transfer learning (TL), are investigated. It is found that ME outperforms the other representations for structure-property relationship quantification in all cases studied, and MG is shown to be markedly inferior to ME and MF, likely because of the relatively small volume of training data available. For MEs, it is found that the similarities between substructure MEs are estimated differently under the different learning schemes (SL, SSL and TL), which leads to different performance scores in structure-property relationship quantification. Several ME mixtures are shown to outperform the single MEs in the corresponding regression tasks, and this is attributed to the information gained by mixing different MEs.