Abstract
Solvatochromism occurs in both homogeneous solvents and more complex biological environments such as proteins. While in both cases the solvatochromic effects report on the surroundings of the chromophore, their interpretation in proteins becomes more complicated, not only because of structural effects induced by the protein pocket, but also because the protein environment is highly anisotropic. This is particularly evident for highly conjugated and flexible molecules such as carotenoids, whose excitation energy is strongly dependent on both the geometry and the electrostatics of the environment. Here we introduce a machine learning (ML) strategy trained on QM/MM calculations of geometrical and electrochromic contributions to carotenoids' excitation energies. We employ this strategy to compare the solvatochromism in protein and solvent environments. Despite the important specificities of the protein, ML models trained on solvents can faithfully predict excitation energies in the protein environment, demonstrating the robustness of the chosen descriptors.
Supplementary materials
Title
Supporting Information: Predicting Solvatochromism of Chromophores in Proteins through QM/MM and Machine Learning
Description
Details on the semiempirical CI calculations; details of the scan over the MM charges; details on the active learning strategy; details on the definition of potential imbalance and the potential imbalance in MeOH, MeOH+q, and LHCII; performance of the linear and polynomial kernels in MeOH+q and LHCII; additional analysis of the UMAP projection of the environments; learning curves for the ML models of the solvatochromic shift.