Abstract
Molecular structure elucidation is a crucial but fundamentally challenging step in the characterization of materials, given the large number of possible structures. Here, we introduce Spectro, a multi-modal approach for molecular structure elucidation that combines $\CNMR$ and $\HNMR$ NMR data with infrared (IR) spectra. Spectro translates embedded representations of the spectra into molecular structures expressed in the SELFIES notation. For the IR data, we used a vision model that was pretrained to detect relevant functional-group peaks in the IR spectra, achieving an F1 score of 91\%. For the NMR data, we used LLM2Vec, treating the NMR spectra as text. Integrating these spectroscopic techniques allows Spectro to achieve an overall test accuracy of 93\% when trained jointly with the IR vision model, and 82\% when trained with fixed embeddings. Our approach demonstrates the potential of multi-modal learning for tackling complex molecular characterization tasks.