Learning Advance: Robotics-LLM Guided Hypotheses Generation for the Discovery of Chemical Knowledge

02 April 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

We present a framework, which we call "Learning Advance", for generating and validating hypotheses in the discovery of chemical knowledge, demonstrated on the optimization of solubility in amphiphile/water systems. The workflow begins with an initial hypothesis: that incorporating common hydrotropic additives, such as sugars or urea, raises solubility limits. To test this hypothesis, we use grid search and Latin hypercube sampling to design experimental combinations of additive weight percentages. High-throughput robotic systems automate the experiments, and a YOLO-based image analysis workflow determines the degree of solubilization. The experimental data are transformed into a chemical feature space to train a Gaussian Process Regression (GPR) model, which drives a Bayesian optimization (BO) algorithm that identifies optimal additive combinations. When BO plateaus, the "Learning Advance" approach passes all accumulated data to AI analysis: we extract correlations between the target property and the chemical features, and LLM tools use these correlations to generate a new hypothesis grounded in the observed data. This hypothesis is then validated experimentally, creating a continuous cycle of discovery. The framework demonstrates how integrating BO with AI-driven hypothesis generation enables progress beyond conventional optimization limits, establishing a promising approach for advancing scientific knowledge discovery in materials science and chemistry.
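To make two steps of the workflow concrete, the following is a minimal sketch of Latin hypercube sampling over additive weight percentages, together with one possible test for when an optimization run has plateaued. It is not the authors' implementation. The additive ranges in ADDITIVE_BOUNDS, the function names, and the plateau rule (improvement below a tolerance over a fixed number of rounds) are all assumptions made for illustration; the abstract does not specify any of them.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points so each dimension has one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of n_samples equal-width strata of [0, 1).
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so the stratum order is independent between dimensions.
        rng.shuffle(strata)
        columns.append([low + u * (high - low) for u in strata])
    # Transpose per-dimension columns into a list of sample points.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def has_plateaued(best_so_far, window=3, tol=1e-3):
    """Assumed plateau rule: the running best (being maximized) improved by
    less than tol over the last `window` rounds."""
    if len(best_so_far) <= window:
        return False
    return best_so_far[-1] - best_so_far[-1 - window] < tol


# Hypothetical ranges in wt% for two additives (illustrative, not from the paper).
ADDITIVE_BOUNDS = [(0.0, 10.0), (0.0, 5.0)]
design = latin_hypercube(8, ADDITIVE_BOUNDS, seed=0)
for point in design:
    print(["%.2f" % value for value in point])
```

Unlike a regular grid, this design needs only as many experiments as there are strata per dimension, which is one reason such sampling is often paired with robotic screening. In the workflow described above, a check in the spirit of has_plateaued would mark the point at which the accumulated data are handed over for hypothesis generation.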
