COMPACT: Classifying Stream Data Optimally Using a Modified Pruning and Controlled Tie-Threshold
Gayathiri Kathiresan1, Krishna Mohanta2, Khanaa VelumailuAsari3
1Gayathiri Kathiresan, Research Scholar, Bharath Institute of Higher Education and Research Institute of Science and Technology, Chennai (Tamil Nadu), India.
2Krishna Mohanta, Associate Professor, Kakatiya Institute of Technology and Science for Woman, Nizamabad (Telangana), India.
3Khanaa VelumailuAsari, Dean Info, Bharath Institute of Higher Education and Research Institute of Science and Technology, Chennai (Tamil Nadu), India.
Manuscript received on 05 February 2019 | Revised Manuscript received on 13 February 2019 | Manuscript published on 28 February 2019 | PP: 512-519 | Volume-8 Issue-4, February 2019 | Retrieval Number: D2787028419/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Big data mining has become important for extracting potentially useful information from continuously arriving stream data. By extracting knowledge, data mining algorithms support feasible decisions in various applications. The Very Fast Decision Tree (VFDT) classifier is a widely applied incremental decision tree. It processes new instances as they arrive, without storing them, and updates the existing tree structure. Most conventional incremental decision tree algorithms use the Hoeffding bound together with a user-defined tie-threshold to split nodes and to manage tree growth. However, when handling fluctuating and imbalanced stream data, the tree grows tremendously in size yet still suffers from misclassification, because it fails to capture the optimal attributes of the incoming stream, which degrades classification accuracy and performance. To resolve these issues, this paper extends the VFDT with a method named Classifying stream data optimally using a Modified Pruning technique And Controlled Tie-threshold (COMPACT). The COMPACT method comprises two components: an enhanced information gain measurement and a pruning method based on a tie-breaking threshold. To improve VFDT performance without affecting the handling of imbalanced data streams, the enhanced information gain measurement identifies an optimal number of attributes for a data stream. To avoid bias in the information gain, it uses an enhanced splitting metric for attribute reduction. Instead of selecting the threshold randomly, the pruning method determines the tie-breaking threshold from a number of breaking points, which ensures an optimal tree structure when handling large-scale stream datasets.
Finally, the COMPACT method is evaluated on the weather dataset to demonstrate its efficiency. The proposed method significantly outperforms the existing DTFA approach in terms of recall, Root Mean Square Error (RMSE) rate, and execution time.
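For readers unfamiliar with the baseline, the conventional VFDT split test that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard Hoeffding bound with a fixed, user-defined tie-threshold, not the COMPACT method itself. The function names, the default values of `delta` and `tie_threshold`, and the choice of `log2(num_classes)` as the range of the information gain are assumptions made for this example.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split metric, delta is the allowed
    probability of choosing the wrong attribute, and n is the number of
    instances observed at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 num_classes: int = 2, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """Conventional VFDT split decision (hypothetical helper for illustration).

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon has shrunk below the tie-threshold, meaning the two
    attributes are too close to separate and waiting longer is pointless.
    """
    # Assumption: information gain is bounded by log2(number of classes).
    value_range = math.log2(num_classes)
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > epsilon or epsilon < tie_threshold


if __name__ == "__main__":
    # Clear winner after 200 instances: the gain difference exceeds epsilon.
    print(should_split(0.30, 0.05, n=200))
    # Near tie after 200 instances: epsilon is still above the tie-threshold.
    print(should_split(0.30, 0.29, n=200))
    # Same near tie after 5000 instances: the tie-threshold forces a split.
    print(should_split(0.30, 0.29, n=5000))
```

In this baseline the tie-threshold is a fixed constant chosen by the user, which is the parameter the abstract says COMPACT determines from breaking points instead.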
Keywords: Big Data, Stream Data, VFDT Classifier, Bias, Information Gain, Threshold, Pruning, Imbalanced Data, Optimal Attributes, Decision Making.
Scope of the Article: Streaming Data