Abstract
Gold nanoparticles (AuNPs) are widely used functional nanomaterials whose properties can be tuned through their shape and size. A comprehensive dataset of AuNP syntheses is therefore useful for understanding how shape and size are controlled. Here, we employed search-based algorithms and fine-tuned the Llama-2 large language model to extract 492 multi-sourced, seed-mediated AuNP synthesis recipes from the literature. Using this dataset, which we share online, we verified that the type of seed capping agent, such as CTAB or citrate, plays a crucial role in determining AuNP morphology, in agreement with established findings in the field. We also observed a weak correlation between the final gold nanorod (AuNR) aspect ratio and silver concentration, although the large variance reduces the significance of this relationship. Overall, our work demonstrates the value of literature-based datasets for advancing knowledge of nanomaterial synthesis, supporting further exploration and better reproducibility.