Exploration of Neighbor Kernels and Feature Estimators for Heart Disease Prediction using Machine Learning
Rincy Merlin Mathew¹, M. Shyamala Devi², Shakila Basheer³

¹Rincy Merlin Mathew, Lecturer, Department of Computer Science, College of Science and Arts, Khamis Mushayt, King Khalid University, Abha, Asir, Saudi Arabia.
²M. Shyamala Devi, Associate Professor, Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai, Tamil Nadu, India.
³Shakila Basheer, Assistant Professor, Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Manuscript received on September 11, 2019. | Revised Manuscript received on September 20, 2019. | Manuscript published on October 10, 2019. | PP: 599-605 | Volume-8 Issue-12, October 2019. | Retrieval Number: L34721081219/2019©BEIESP | DOI: 10.35940/ijitee.L3472.1081219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: In an increasingly technological world, people continue to suffer from a wide range of diseases. Heart disease is among the most common of these and affects people of all ages. Despite technological progress, predicting and identifying heart disease remains a challenging problem, and the limited availability of patient symptom data makes prediction especially difficult. With this motivation, we use the Heart Disease Prediction dataset obtained from the UCI Machine Learning Repository to analyze and compare the parameters of several classification algorithms on the heart disease classes. The analysis is carried out in five stages. First, the dataset is explored through a correlation matrix, a feature importance analysis, the target class distribution, and the disease probability derived from the density distributions of age and sex. Second, the dataset is fitted to a K-Nearest Neighbor (KNN) classifier to evaluate performance for different numbers of neighbors, with and without Principal Component Analysis (PCA). Third, the dataset is fitted to a Support Vector classifier to evaluate performance for different kernels, with and without PCA. Fourth, the dataset is fitted to a Decision Tree classifier to evaluate performance for different numbers of features, with and without PCA. Fifth, the dataset is fitted to a Random Forest classifier to evaluate performance for different numbers of estimators, with and without PCA. The implementation is written in Python using the Spyder IDE under Anaconda Navigator. Experimental results show that the KNN classifier is most effective with 12 neighbors, scoring 0.52 before and 0.53 after applying PCA. For the Support Vector classifier, the rbf kernel is most effective, with a score of 0.519 both with and without PCA. For the Decision Tree classifier, the score is 0.47 with 7 features before applying PCA and 0.49 with 4 features after applying PCA. For the Random Forest classifier, the score is 0.53 with 500 estimators before applying PCA and 0.52 with 500 estimators after applying PCA.
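The parameter sweeps described in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code. It uses a synthetic 13-feature, 5-class dataset from `make_classification` as a stand-in for the UCI table. It standardizes features before every model and keeps 8 principal components (`N_COMPONENTS`, an assumed value) when PCA is enabled. It also interprets "combinations of features" for the Decision Tree as the `max_features` parameter, which is an assumption. The names `evaluate`, `sweep`, and `grids` and the parameter grids themselves are hypothetical illustrative choices. The sketch covers the correlation matrix, feature importances, and the four classifier sweeps, but not the age/sex density analysis. The scores it prints say nothing about the results reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the UCI heart disease table
# (13 features, 5 target classes); load the real data here instead.
X, y = make_classification(n_samples=300, n_features=13, n_informative=6,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Stage 1 (partial): feature correlation matrix and random-forest importances.
corr = np.corrcoef(X_train, rowvar=False)  # shape (13, 13)
importances = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X_train, y_train).feature_importances_

N_COMPONENTS = 8  # ASSUMPTION: number of principal components kept


def evaluate(estimator, use_pca):
    """Fit scaler (+ optional PCA) + estimator on the training split; return test accuracy."""
    steps = [StandardScaler()]
    if use_pca:
        steps.append(PCA(n_components=N_COMPONENTS, random_state=0))
    steps.append(estimator)
    model = make_pipeline(*steps).fit(X_train, y_train)
    return model.score(X_test, y_test)


def sweep(make_estimator, values):
    """Score every parameter value both without and with PCA."""
    return {(use_pca, v): evaluate(make_estimator(v), use_pca)
            for use_pca in (False, True) for v in values}


# Stages 2-5: hypothetical grids (illustrative values, not the paper's exact grids).
grids = {
    "KNN n_neighbors": (lambda k: KNeighborsClassifier(n_neighbors=k),
                        [1, 3, 5, 7, 9, 12, 15]),
    "SVC kernel": (lambda kern: SVC(kernel=kern),
                   ["linear", "poly", "rbf", "sigmoid"]),
    # max_features must not exceed N_COMPONENTS once PCA is applied.
    "DecisionTree max_features": (
        lambda m: DecisionTreeClassifier(max_features=m, random_state=0),
        list(range(1, N_COMPONENTS + 1))),
    "RandomForest n_estimators": (
        lambda n: RandomForestClassifier(n_estimators=n, random_state=0),
        [10, 100, 500]),
}

all_results = {name: sweep(make, values)
               for name, (make, values) in grids.items()}

for name, res in all_results.items():
    for use_pca in (False, True):
        # Pick the parameter value with the highest test accuracy for this setting.
        best = max((v for (p, v) in res if p == use_pca),
                   key=lambda v: res[(use_pca, v)])
        tag = "with PCA" if use_pca else "without PCA"
        print(f"{name:27s} {tag:12s} best={best!s:8s} "
              f"accuracy={res[(use_pca, best)]:.3f}")
```

To adapt the sketch to the real dataset, replace `X` and `y` with the UCI features and target. Because `evaluate` builds a fresh pipeline for every configuration, the scaler and PCA are always fitted on the training split only, so no test information leaks into the projection.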
Keywords: Machine Learning, Classification, Kernel, Feature, Neighbor, Estimator.
Scope of the Article: Classification