Performance Analysis of Big data Classification Techniques on Diabetes Prediction
P. Pandeeswary1, M. Janaki2
1P. Pandeeswary, M.Phil Scholar, Department of Computer Science, Dr. Umayal Ramanathan College for Women, Karaikudi.
2Dr. M. Janaki, Associate Professor, Department of Computer Science, Dr. Umayal Ramanathan College for Women, Karaikudi.
Manuscript received on 04 July 2019 | Revised Manuscript received on 09 July 2019 | Manuscript published on 30 August 2019 | PP: 533-537 | Volume-8 Issue-10, August 2019 | Retrieval Number: J88400881019/2019©BEIESP | DOI: 10.35940/ijitee.J8840.0881019
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Big data refers to extremely large data sets that are analyzed computationally to reveal patterns, trends, and predictions in order to simplify decision making. Predicting diseases has become very important, and it can be achieved with a large dataset using classification techniques. Various big data analytics tools are available for classification. Classification is a general technique used in medical analysis for data prediction. In this paper, classification algorithms such as Support Vector Machine (SVM), Naïve Bayesian, and C4.5 are discussed. The Pima Indian Diabetes Database (PIDD), an openly accessible machine learning database available from UCI, is used in the analysis of the classification algorithms to classify people as diabetes positive or diabetes negative. The objective is to find the most suitable technique for prediction. We compare the results of the three supervised learning algorithms on three criteria: computation time, accuracy rate, and error rate, using the Tanagra tool. The classification algorithms are used to predict diabetes based on the given data. Of the many classification techniques available, this study suggests a few for use in big data analysis that have the potential to significantly improve prediction. A representative confusion matrix is displayed to make the verification process faster. From the results, it is concluded that the C4.5 algorithm is best suited for predicting diabetes and can also be used in other disciplines to make better predictions.
Keywords: Big data, Classification techniques, C4.5, Diabetes, Naïve Bayesian, SVM, Tanagra.
Scope of the Article: Classification
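The three comparison criteria named in the abstract (computation time, accuracy rate, and error rate) can all be derived from a classifier's predictions and its confusion matrix. The following is a minimal Python sketch of that bookkeeping, not the authors' Tanagra workflow. The function names, the `threshold_rule` stand-in classifier, its glucose cutoff of 140, and the five toy records are all hypothetical and chosen only for illustration; they are not taken from the paper or from its results.

```python
import time


def confusion_matrix(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels, where 1 = diabetes positive."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def accuracy_and_error(tp, fn, fp, tn):
    """Accuracy is the share of correct predictions; error rate is the rest."""
    total = tp + fn + fp + tn
    return (tp + tn) / total, (fp + fn) / total


def timed_predictions(predict, rows):
    """Run a prediction function over all rows and measure elapsed seconds."""
    start = time.perf_counter()
    predictions = [predict(row) for row in rows]
    return predictions, time.perf_counter() - start


def threshold_rule(row):
    # Hypothetical stand-in classifier (NOT SVM, Naive Bayes, or C4.5):
    # predicts positive when the illustrative glucose value reaches 140.
    return 1 if row["glucose"] >= 140 else 0


# Toy records and labels, invented for illustration only.
rows = [{"glucose": 150}, {"glucose": 120}, {"glucose": 160},
        {"glucose": 100}, {"glucose": 145}]
y_true = [1, 0, 0, 0, 1]

y_pred, seconds = timed_predictions(threshold_rule, rows)
tp, fn, fp, tn = confusion_matrix(y_true, y_pred)
acc, err = accuracy_and_error(tp, fn, fp, tn)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy={acc:.2f} error={err:.2f} time={seconds:.6f}s")
```

In a comparison like the one the abstract describes, the same three measurements would be collected for each classifier on the same data so that they can be ranked side by side.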