Intelligent Molecular Identification for High Performance Organosulfide Capture Using Active Machine Learning Algorithm

26 November 2021, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning and computer-aided approaches significantly accelerate molecular design and discovery in scientific and industrial fields that increasingly rely on data science for efficiency. The typical method is supervised learning, which requires large labeled datasets. Semi-supervised machine learning approaches can exploit unlabeled data to improve modeling performance, but they are limited by the accumulation of prediction errors. Here, to screen solvents for the removal of methyl mercaptan (MeSH), an organosulfur impurity in natural gas, we constructed a computational framework that integrates molecular similarity search with active learning, named molecular active selection machine learning (MASML). This framework identifies the optimal set of molecules through molecular similarity search and iterative addition to the training dataset. Among the 126,068 compounds in the initial dataset, three molecules were identified as promising for MeSH capture: benzylamine (BZA), p-methoxybenzylamine (PZM), and N,N-diethyltrimethylenediamine (DEAPA). Further experiments confirmed the effectiveness of the framework for efficient molecular design and identification for MeSH capture; DEAPA shows a Henry's law constant 89.4% lower than that of methyl diethanolamine (MDEA).
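
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of combining similarity search with iterative addition to a training set. It is not the authors' MASML code. Everything in it is an assumption made for illustration: molecules are represented as sets of integer fingerprint bits, similarity is the Tanimoto coefficient, `oracle` is a hypothetical labeling step (for example a simulation or a measurement) that returns a property where lower is better, and the names `tanimoto` and `active_selection` are invented here.

```python
from typing import Callable, Dict, FrozenSet

Fingerprint = FrozenSet[int]  # assumed representation: set of "on" fingerprint bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |a & b| / |a | b| between two bit sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def active_selection(
    pool: Dict[str, Fingerprint],
    seeds: Dict[str, float],
    oracle: Callable[[str], float],
    rounds: int,
    batch: int,
    top_k: int = 1,
) -> Dict[str, float]:
    """Greedy, similarity-driven selection loop (illustrative only).

    Each round: take the top_k best labeled molecules (lowest value),
    rank unlabeled molecules by their maximum similarity to them,
    label the `batch` most similar ones with the oracle, and add
    them to the labeled (training) set.
    """
    labeled = dict(seeds)
    for _ in range(rounds):
        unlabeled = [name for name in pool if name not in labeled]
        if not unlabeled:
            break
        best = sorted(labeled, key=lambda name: labeled[name])[:top_k]

        def score(name: str) -> float:
            return max(tanimoto(pool[name], pool[ref]) for ref in best)

        # Ties are broken by name so the selection is deterministic.
        picks = sorted(unlabeled, key=lambda name: (-score(name), name))[:batch]
        for name in picks:
            labeled[name] = oracle(name)
    return labeled


if __name__ == "__main__":
    # Toy, made-up fingerprints and property values (not real data).
    toy_pool = {
        "mol_a": frozenset({1, 2, 3}),
        "mol_b": frozenset({1, 2, 4}),
        "mol_c": frozenset({1, 2, 3, 5}),
        "mol_d": frozenset({7, 8, 9}),
        "mol_e": frozenset({1, 2, 3, 5, 6}),
    }
    toy_values = {"mol_a": 5.0, "mol_b": 6.0, "mol_c": 3.0, "mol_d": 9.0, "mol_e": 1.0}
    result = active_selection(
        toy_pool, {"mol_a": 5.0}, toy_values.__getitem__, rounds=2, batch=1
    )
    print(min(result, key=lambda name: result[name]))
```

In this toy run only two of the four unlabeled molecules are ever labeled, which is the point of such a loop: the expensive labeling step is spent on candidates that resemble the current best ones. A fuller workflow would typically also retrain a predictive model on the growing labeled set each round; that step is omitted here.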

Supplementary materials

Title: Supporting Information - Intelligent Molecular Identification for High Performance Organosulfide Capture Using Active Machine Learning Algorithm
Description: 1. Methods 2. Tables 3. Figures 4. References
