Abstract
Predicting the complex flavor profiles of specialty Arabica coffee is a challenging task due to the subjective nature of human sensory evaluations. This study investigates the application of visible-near-infrared (vis-NIR) spectroscopy coupled with multi-label classification techniques to simultaneously predict the presence of flavors described by the Specialty Coffee Association's Flavour Wheel in unroasted green coffee beans.
Sixty lots of green coffee beans from various origins were analyzed by vis-NIR spectroscopy, yielding spectral data from 400 to 1100 nm. Flavour notes for each lot, provided by a commercial coffee roaster from its sensory evaluations, were binarized as present or absent labels. Nine flavour notes from the Flavour Wheel were modeled: Floral, Honey, Caramel, Fruits, Citrus, Berry, Cocoa, Nuts, and Spice.
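The binarization step can be illustrated with a minimal sketch. The nine label names come from the text above; the function name `binarize`, the example lots, and the choice to ignore notes outside the nine modeled labels are assumptions made only for illustration, not the study's actual preprocessing.

```python
# The nine flavour notes modeled in the study, in a fixed column order.
FLAVOUR_NOTES = ["Floral", "Honey", "Caramel", "Fruits", "Citrus",
                 "Berry", "Cocoa", "Nuts", "Spice"]


def binarize(notes_per_lot, labels=FLAVOUR_NOTES):
    """Turn a list of flavour-note lists into present/absent indicator rows.

    Assumption: notes that are not among `labels` are silently ignored.
    """
    index = {label: i for i, label in enumerate(labels)}
    rows = []
    for notes in notes_per_lot:
        row = [0] * len(labels)
        for note in notes:
            if note in index:
                row[index[note]] = 1  # mark the note as present
        rows.append(row)
    return rows


# Hypothetical tasting notes for three lots (not data from the study).
lots = [["Floral", "Citrus"], ["Cocoa", "Nuts", "Caramel"], []]
Y = binarize(lots)
```

Each row of `Y` then holds one 0/1 value per flavour note, which is the target format that multi-label classifiers expect.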
Exploratory data analysis using principal component analysis and uniform manifold approximation and projection revealed no clear clustering based on flavor notes or origin. However, potential correlations between related flavors on the Flavour Wheel were observed.
Several multi-label classification approaches were explored, including binary relevance, classifier chains with various chaining strategies, and decomposed binary classifiers. Model performance was evaluated using Hamming loss and mean balanced accuracy across all labels.
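For readers unfamiliar with the two evaluation metrics, the following sketch shows how they can be computed from binary label matrices. It assumes the usual definitions: Hamming loss as the fraction of individual label assignments that are wrong, and balanced accuracy as the mean of sensitivity and specificity for one label, averaged over labels. The function names and the toy matrices are illustrative assumptions, not the study's code or data.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (lot, label) cells where the prediction differs from the truth."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total


def balanced_accuracy(true_col, pred_col):
    """Mean of recall on the positive class and recall on the negative class."""
    pos = [p for t, p in zip(true_col, pred_col) if t == 1]
    neg = [p for t, p in zip(true_col, pred_col) if t == 0]
    recalls = []
    if pos:
        recalls.append(sum(p == 1 for p in pos) / len(pos))
    if neg:
        recalls.append(sum(p == 0 for p in neg) / len(neg))
    return sum(recalls) / len(recalls)


def mean_balanced_accuracy(y_true, y_pred):
    """Average the per-label balanced accuracy over all labels."""
    n_labels = len(y_true[0])
    scores = [
        balanced_accuracy([row[j] for row in y_true],
                          [row[j] for row in y_pred])
        for j in range(n_labels)
    ]
    return sum(scores) / n_labels


# Toy example: 4 lots, 3 labels (illustrative values only).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

hl = hamming_loss(y_true, y_pred)             # 2 wrong cells out of 12
mba = mean_balanced_accuracy(y_true, y_pred)  # (1.0 + 0.75 + 0.75) / 3
```

Balanced accuracy is reported alongside Hamming loss because a classifier that always predicts "absent" for a rare flavour note can score a low Hamming loss while having only 50% balanced accuracy on that label.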
The best results were achieved using a decomposed approach by extracting the best-performing binary models for each flavour note from the binary relevance experiments. This yielded a Hamming loss of 0.2778 and a mean balanced accuracy of 69%.
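The decomposed selection can be sketched as follows, under the assumption that each candidate model has already produced a full prediction matrix and that the per-label selection criterion is balanced accuracy. The abstract does not state the criterion, so this is an illustrative choice. The names `select_best_per_label`, `model_a`, and `model_b`, and all values, are hypothetical.

```python
def balanced_accuracy(true_col, pred_col):
    """Mean of recall on the positive class and recall on the negative class."""
    pos = [p for t, p in zip(true_col, pred_col) if t == 1]
    neg = [p for t, p in zip(true_col, pred_col) if t == 0]
    recalls = []
    if pos:
        recalls.append(sum(p == 1 for p in pos) / len(pos))
    if neg:
        recalls.append(sum(p == 0 for p in neg) / len(neg))
    return sum(recalls) / len(recalls)


def select_best_per_label(y_true, candidates, label_names):
    """For each label, keep the candidate model with the highest balanced accuracy.

    `candidates` maps a model name to its 0/1 prediction matrix
    (one row per lot, one column per label).
    """
    chosen = {}
    for j, label in enumerate(label_names):
        true_col = [row[j] for row in y_true]
        best_name, best_score = None, -1.0
        for name, preds in candidates.items():
            score = balanced_accuracy(true_col, [row[j] for row in preds])
            if score > best_score:
                best_name, best_score = name, score
        chosen[label] = (best_name, best_score)
    return chosen


# Hypothetical example with two labels and two candidate models.
labels = ["Floral", "Citrus"]
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
candidates = {
    "model_a": [[1, 0], [0, 0], [1, 0], [0, 0]],  # good on Floral only
    "model_b": [[0, 0], [0, 1], [1, 1], [1, 0]],  # good on Citrus only
}
chosen = select_best_per_label(y_true, candidates, labels)
```

Selecting and scoring on the same data gives an optimistic estimate, so in practice the selection would be made on validation folds and the combined model scored on held-out lots.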
Classifier chain methods consistently underperformed, suggesting either that errors propagate along the chain or that flavour note relationships may not directly translate to taste perception. Independently training binary classifiers achieved low Hamming loss but suffered from overfitting.
This study demonstrates the feasibility of using vis-NIR spectroscopy and the potential of multi-label classification to predict the flavor profiles of green coffee beans. With larger datasets and deeper investigations into flavor correlations, these techniques could enable efficient flavor prediction for green coffee beans in the supply chain.