Abstract
Pesticides benefit agriculture by increasing crop yield, quality, and security. However, pesticides may inadvertently harm bees, which are agriculturally and ecologically vital as pollinators. The development of new pesticides, driven by pest resistance to incumbent pesticides and by demands to reduce their negative environmental impacts, necessitates assessments of pesticide toxicity to bees.
We leverage a data set of 382 molecules, labeled using honey bee toxicity experiments, to train a classifier that predicts the toxicity of a new pesticide molecule to honey bees. Traditionally, the first step of a molecular machine learning task is to explicitly convert molecules into feature vector representations for input to the classifier. Instead, we (i) adopt the fixed-length random walk graph kernel to express the similarity between any two molecular graphs and (ii) use the kernel trick to train a support vector machine (SVM) to classify the bee toxicity of pesticides represented as molecular graphs. We assess the performance of the graph-kernel-SVM classifier under different walk lengths used to describe the molecular graphs. The optimal classifier, with walk length 4, achieves a mean (over 100 runs) accuracy, precision, recall, and F1 score of 0.82, 0.69, 0.74, and 0.71, respectively, on the test data set.
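To make the kernel idea concrete, the following is a minimal sketch of a fixed-length random walk graph kernel, not the implementation used in this work. It assumes that a molecule is given as a list of atom labels plus a dictionary of undirected bonds, that walk length counts the bonds traversed, and that two walks match when their atom and bond label sequences are identical. The function names (`product_graph`, `walk_kernel`, `gram_matrix`) and the three toy molecules are hypothetical and chosen for illustration only.

```python
from itertools import product


def product_graph(g1, g2):
    """Direct product graph of two labeled molecular graphs.

    A node is a pair (u, v) of atoms with the same label; two nodes are
    joined when both molecules have a bond of the same type between them.
    Each graph is (atoms, bonds), with bonds = {(i, j): bond_label} and
    every undirected bond stored once (an assumption of this sketch).
    """
    atoms1, bonds1 = g1
    atoms2, bonds2 = g2
    nodes = [(u, v)
             for u, v in product(range(len(atoms1)), range(len(atoms2)))
             if atoms1[u] == atoms2[v]]
    nbrs = {n: [] for n in nodes}
    for (u, u2), b1 in bonds1.items():
        for (v, v2), b2 in bonds2.items():
            if b1 != b2:
                continue
            # Bonds are undirected, so try both orientations of the second bond.
            for a, b in ((v, v2), (v2, v)):
                if (u, a) in nbrs and (u2, b) in nbrs:
                    nbrs[(u, a)].append((u2, b))
                    nbrs[(u2, b)].append((u, a))
    return nbrs


def walk_kernel(g1, g2, length):
    """Count pairs of label-matching walks with `length` bonds, one per graph."""
    nbrs = product_graph(g1, g2)
    counts = {n: 1 for n in nbrs}  # walks of length 0 ending at each node
    for _ in range(length):
        # A walk of length l ending at n extends a walk of length l-1
        # ending at one of n's neighbors.
        counts = {n: sum(counts[m] for m in nbrs[n]) for n in nbrs}
    return sum(counts.values())


def gram_matrix(graphs, length):
    """Kernel (Gram) matrix over a list of graphs."""
    return [[walk_kernel(a, b, length) for b in graphs] for a in graphs]


# Toy heavy-atom graphs (hypothetical examples; bond label 1 = single, 2 = double).
ethanol = (["C", "C", "O"], {(0, 1): 1, (1, 2): 1})
methanol = (["C", "O"], {(0, 1): 1})
formaldehyde = (["C", "O"], {(0, 1): 2})

K = gram_matrix([ethanol, methanol, formaldehyde], length=1)
print(K)
```

A Gram matrix computed this way can stand in for explicit feature vectors when training an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. In practice the matrix is often normalized first, and the walk length is treated as a hyperparameter to tune.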