Abstract
Molecular string representations are a key asset in cheminformatics and
are becoming increasingly relevant to the general chemical community,
due to the steadily growing impact of Big Data and Machine Learning.
Among the string representations proposed so far, SMILES (Simplified
Molecular Input Line Entry System) is arguably the de facto standard
today. Despite their convenience for storing unique molecular structures
in databases, SMILES strings are hard for most chemists to read: without
specific training, it is difficult to tell which molecule a given SMILES
string describes.
To mitigate this, we propose the HumanSMILES
algorithm, a simple procedure that translates a SMILES string into a
more interpretable name inspired by the abbreviations and names commonly
used in organic chemistry. Human-Readable SMILES can describe linear
structures and general non-fused cyclic structures, using a set of
naming rules that combines automated processing with chemical knowledge.
The code is available open-source, together with a web application.
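A naming procedure restricted to linear and non-fused cyclic structures must first decide whether a structure contains rings at all. The sketch below shows one way to make that decision. It is not the HumanSMILES implementation. It assumes the regular expression commonly used to tokenize SMILES for machine learning, and the function names `tokenize` and `ring_closures` are hypothetical. In SMILES, rings are written with ring-bond labels (digits, or `%nn` for labels above 9) outside square brackets, and each label appears twice. Counting label pairs therefore gives the number of ring closures.

```python
import re

# Tokenizer pattern of the kind widely used for SMILES in ML work
# (assumption: HumanSMILES may tokenize its input differently).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> list:
    """Split a SMILES string into atom, bond, branch and ring-label tokens (hypothetical helper)."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Fail loudly instead of silently skipping characters the pattern does not know.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not fully tokenize {smiles!r}")
    return tokens


def ring_closures(tokens: list) -> int:
    """Count ring-closure bonds (hypothetical helper).

    Bracket atoms such as [NH4+] are single tokens, so any digit token left
    here is a ring-bond label; each label is written once to open and once
    to close its ring bond.
    """
    labels = [t for t in tokens if t.isdigit() or t.startswith("%")]
    return len(labels) // 2


if __name__ == "__main__":
    for smi in ["CC(=O)O", "c1ccccc1Cl", "c1ccc2ccccc2c1"]:
        n = ring_closures(tokenize(smi))
        print(smi, "acyclic" if n == 0 else f"{n} ring closure(s)")
```

Acetic acid (`CC(=O)O`) has no ring labels and is reported as acyclic, chlorobenzene has one ring closure, and naphthalene has two. Telling fused from non-fused ring systems, and then choosing readable names, requires the chemical rules the abstract refers to and is beyond this sketch.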
Supplementary materials
HumanSMILES SI