Abstract
In this study, we introduce the Single-point Chemical Language Model (SpCLM), a framework for molecular design and optimization. By building on the transformer architecture and focusing on single-point molecular optimization, the model improves pharmacological properties in a way that is consistent with the practical experience of medicinal chemists. After optimization, generating only a few hundred compounds yields structurally refined molecules that agree closely with experimental activity data, including measured binding affinities and functional outcomes. Using single-point optimization strategies, SpCLM predicts 60%-80% of the active compounds in an independent test set from a small pool of generated molecules, typically numbering in the hundreds. These results highlight the potential of the model as a tool for drug design that enables precise, data-driven modifications to improve the activity and selectivity of lead compounds. The approach reduces the need for extensive experimental screening, and with it the associated time and resource costs in pharmaceutical research and development.