Validating the Effect of Different Discretization Methods for Redic K-Prototype Clustering Algorithm
Khyati R. Nirmal¹, K.V.V. Satyanarayana²
¹Khyati R. Nirmal, Research Scholar, Department of Computer Science Engineering, Koneru Lakshmaiah Education Foundation, Green Fields, Vaddeswaram (Andhra Pradesh), India.
²K.V.V. Satyanarayana, Department of Computer Science Engineering, Koneru Lakshmaiah Education Foundation, Green Fields, Vaddeswaram (Andhra Pradesh), India.
Manuscript received on 02 June 2019 | Revised Manuscript received on 05 June 2019 | Manuscript published on 30 June 2019 | PP: 2231-2236 | Volume-8 Issue-8, June 2019 | Retrieval Number: H7143068819/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The REDIC K-Prototype clustering algorithm is designed for mixed datasets. It selects significant initial centroids and removes the dependency on user-supplied values for the number of clusters (k) and the influence parameter (λ). Data preprocessing has been found empirically to improve the performance of data mining algorithms. In this paper, a taxonomy is built by integrating a data preprocessing technique, discretization, with the REDIC K-Prototype clustering algorithm. This taxonomy validates the performance of the algorithm on four different datasets using three performance indices. The numerical attributes of each dataset are discretized and converted to categorical attributes before clustering. Four discretization techniques are considered: Equal Width Binning, Equal Frequency Binning, Entropy Based Binning, and Binary Binning, a special case of Equal Width Binning. The results of the proposed algorithm are compared with those of standard K-Mode and K-Prototype clustering on both the original and the discretized datasets. The performance analysis shows that in 70% of cases, REDIC K-Prototype clustering with different discretization methods performs better than the standard algorithms.
Keywords: REDIC K-Prototype Clustering Algorithm; Discretization; Equal Frequency Binning; Equal Width Binning; Entropy Based Binning
Scope of the Article: Clustering
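To make the discretization step named in the abstract concrete, the following is a minimal sketch of three of the four binning techniques: equal width, equal frequency, and binary binning treated as equal width with two bins. It is an illustration under stated assumptions, not the authors' implementation. The function names, the use of NumPy, and the sample data are all hypothetical. Entropy based binning is omitted because it requires class labels to choose the cut points.

```python
import numpy as np

def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width.

    Hypothetical helper: returns integer bin labels in 0..n_bins-1.
    """
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Use only the interior edges so the maximum lands in the last bin.
    return np.digitize(values, edges[1:-1], right=False)

def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins intervals holding roughly equal counts.

    Hypothetical helper: cut points are the interior quantiles of the data.
    """
    values = np.asarray(values, dtype=float)
    probs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    cuts = np.quantile(values, probs)
    return np.digitize(values, cuts, right=True)

def binary_bins(values):
    """Binary binning, sketched here as equal-width binning with two bins."""
    return equal_width_bins(values, 2)

# Made-up numeric attribute with one outlier, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
width_labels = equal_width_bins(sample, 3)
freq_labels = equal_frequency_bins(sample, 3)
binary_labels = binary_bins(sample)
```

On skewed data such as the sample above, equal width binning places almost every value in the first bin, while equal frequency binning spreads the values evenly across bins. This difference is one reason the choice of discretization method can affect clustering quality.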