Abstract
Chemical reactions can be connected into large networks such as knowledge graphs. Prior work has used
such networks to draw meaningful conclusions about the structures and properties of the organic
chemistry they contain. However, this research has focused on public sources of organic chemistry, which
may lack the intricate details of the synthesis routes used in in-house drug discovery. In this work,
we extend previous analyses to also include an in-house electronic lab notebook (ELN), so that
important differences between the network architectures can be investigated. Three chemical reaction
knowledge graphs were constructed from US Patent and Trademark Office (USPTO) data, Reaxys, and an
in-house ELN, respectively, and then compared. We found that the Reaxys knowledge graph is the most
interconnected, whereas the USPTO and ELN knowledge graphs appear to be organized more around a few
central nodes. These differences might be attributed to the different origins of the data in the three
sources.