A New Weighted Associative Classifier Based on Nature Language Processing and Its Application in Chemical Data Mining

10 June 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Associative classification mining (ACM), which integrates association rule mining and classification, has become a significant tool for knowledge discovery, especially in the chemical domain. Its major advantage is that it provides high accuracy together with chemically interpretable models. It can also find associations among features, whereas traditional methods such as decision trees and naïve Bayes treat the features as independent of one another. In this paper, we propose a new weighting framework for ACM based on information gain and graph theory. Combining this scheme with CBA (classification based on associations), we implement a novel classifier, IGAC (information gain and graph based associative classifier), and apply it to three chemical datasets. The generated models take into account the importance of each feature with respect to the observed class labels. The results show that IGAC not only achieves high accuracy (above 90%) but also yields models that are relatively easy to interpret with chemical knowledge. In addition, IGAC can discover meaningful rules that classical ACM cannot identify.
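
To make the information-gain part of the weighting idea concrete, the following minimal sketch computes the standard information gain of a feature with respect to class labels. It is not the authors' IGAC weighting scheme, whose details (including the graph-based component) are not given in the abstract. The feature names `has_nitro` and `has_ring` and the toy labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four compounds with two binary structural features.
labels = ["active", "active", "inactive", "inactive"]
has_nitro = [1, 1, 0, 0]  # separates the two classes perfectly
has_ring = [1, 0, 1, 0]   # carries no information about the class

weights = {
    "has_nitro": information_gain(has_nitro, labels),
    "has_ring": information_gain(has_ring, labels),
}
```

In this toy case the feature that separates the classes receives the maximum gain for a balanced two-class problem (1 bit), while the uninformative feature receives a gain of zero. A weighted associative classifier could, under this assumption, use such scores to rank or weight the features appearing in candidate rules.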

Keywords

bipartite graph
text categorization
weighted associative classification mining
