Abstract
Computer simulations are foundational to theoretical chemistry. Quantum-mechanical (QM) methods provide the highest accuracy for simulating molecules but have difficulty scaling to large systems. Empirical interatomic potentials (classical force fields) are scalable, but lack transferability to new systems and are hard to systematically improve. Automated, data-driven machine learning is close to achieving the best of both approaches. Here we use transfer learning to retrain a general-purpose neural network potential, ANI-1x, on a dataset of gold-standard QM calculations (CCSD(T)/CBS level) that is relatively small but designed to optimally span chemical space. The resulting potential, ANI-1ccx, approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. ANI-1ccx is broadly applicable to materials science, biology, and chemistry, and is billions of times faster than the parent CCSD(T)/CBS calculations.
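As a rough illustration of the transfer-learning idea summarized above, the following toy sketch pretrains a small network on many samples of a cheap surrogate function, then freezes the hidden layer and refits only the output layer on a handful of samples from a slightly different "high-accuracy" function. It is not the ANI-1x architecture or the training procedure used in this work: the two target functions, the network size, the learning rates, and the choice of which layer to freeze are all assumptions made purely for illustration.

```python
import math
import random

random.seed(0)
H = 8  # number of hidden units (arbitrary choice for this sketch)

def init_params():
    # One hidden tanh layer with scalar input and scalar output.
    return {
        "w1": [random.uniform(-1, 1) for _ in range(H)],
        "b1": [random.uniform(-1, 1) for _ in range(H)],
        "w2": [random.uniform(-0.5, 0.5) for _ in range(H)],
        "b2": 0.0,
    }

def hidden(p, x):
    return [math.tanh(p["w1"][j] * x + p["b1"][j]) for j in range(H)]

def predict(p, x):
    h = hidden(p, x)
    return sum(p["w2"][j] * h[j] for j in range(H)) + p["b2"]

def mse(p, data):
    return sum((predict(p, x) - y) ** 2 for x, y in data) / len(data)

def train(p, data, lr, steps, freeze_hidden):
    """Full-batch gradient descent on the mean squared error.

    With freeze_hidden=True only the output layer (w2, b2) is updated,
    so the features learned during pretraining are reused unchanged.
    """
    n = len(data)
    for _ in range(steps):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * H, [0.0] * H, [0.0] * H, 0.0
        for x, y in data:
            h = hidden(p, x)
            err = sum(p["w2"][j] * h[j] for j in range(H)) + p["b2"] - y
            d = 2.0 * err / n  # derivative of the MSE w.r.t. the prediction
            g_b2 += d
            for j in range(H):
                g_w2[j] += d * h[j]
                if not freeze_hidden:
                    back = d * p["w2"][j] * (1.0 - h[j] ** 2)
                    g_w1[j] += back * x
                    g_b1[j] += back
        for j in range(H):
            p["w2"][j] -= lr * g_w2[j]
            if not freeze_hidden:
                p["w1"][j] -= lr * g_w1[j]
                p["b1"][j] -= lr * g_b1[j]
        p["b2"] -= lr * g_b2
    return p

def grid(n):
    # n evenly spaced points on [-2, 2]
    return [-2.0 + 4.0 * i / (n - 1) for i in range(n)]

# Hypothetical stand-ins: a large "cheap" dataset and a small "accurate" one.
low_fidelity = [(x, math.sin(x)) for x in grid(41)]
high_fidelity = [(x, math.sin(x) + 0.1 * x * x) for x in grid(7)]

# Stage 1: pretrain all parameters on the large, cheap dataset.
params = train(init_params(), low_fidelity, lr=0.01, steps=1000,
               freeze_hidden=False)

# Stage 2: freeze the hidden layer and fine-tune on the small dataset.
frozen_w1, frozen_b1 = list(params["w1"]), list(params["b1"])
loss_before = mse(params, high_fidelity)
train(params, high_fidelity, lr=0.05, steps=200, freeze_hidden=True)
loss_after = mse(params, high_fidelity)

print("high-fidelity MSE before fine-tuning:", loss_before)
print("high-fidelity MSE after fine-tuning: ", loss_after)
```

Freezing the hidden layer keeps the number of retrained parameters small relative to the seven high-accuracy samples, which is the usual motivation for reusing pretrained features when the accurate data are scarce.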