Density Based Feature Selection Method for Medical Datasets
Manonmani.M1, Sarojini Balakrishnan2
1Manonmani M*, Research Scholar, Department of Computer Science, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India.
2Dr. Sarojini Balakrishnan, Assistant Professor (SS), Department of Computer Science, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India.
Manuscript received on September 16, 2019. | Revised Manuscript received on 24 September, 2019. | Manuscript published on October 10, 2019. | PP: 4370-4374 | Volume-8 Issue-12, October 2019. | Retrieval Number: L3875081219/2019©BEIESP | DOI: 10.35940/ijitee.L3875.1081219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: High-dimensional data are common in the medical domain and must be processed before they can be analysed effectively. To deal with the curse of dimensionality, a feature selection process is employed in almost all data mining applications. In this research work, the Density based Feature Selection (DFS) method, which ranks features by estimating the Probability Density Function (PDF) of each feature, is applied to medical datasets that suffer from the curse of dimensionality. The DFS method is a filter-based approach that selects the most discriminatory features from the given feature set. It evaluates the importance of each feature with regard to the target class using its density function. The DFS method offers an advantage over other methods in that it uses ranking to select the most discriminatory features from the whole feature set. This research work finds the best feature subset for prediction and classification on high-dimensional medical datasets. The PDF-based DFS method is applied to three medical datasets, namely the Chronic Kidney Disease (CKD) dataset, the Breast Cancer Wisconsin Dataset and the Parkinsons Dataset. The proposed feature selection method evaluates the merit of each feature, assigns a weight to each feature and ranks the features based on their feature density. The reduced feature subset is then validated by applying three classification algorithms, namely Support Vector Machine (SVM), Gradient Boosting, and Convolutional Neural Network (CNN). The performance of the classification algorithms is evaluated using the metrics Accuracy, Sensitivity and Specificity. Experimental results indicate that the performance of SVM, Gradient Boosting, and CNN improves after the feature selection process.
Keywords: Curse of Dimensionality, Filter Method, Density Based Feature Selection, Probability Density Function, SVM, Gradient Boosting,
Scope of the Article: Classification
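The ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the code below makes its own assumptions: a binary target, class-conditional PDFs estimated with simple histograms, and a score of one minus the overlap area of the two densities (less overlap means a more discriminatory feature). The function names `density_overlap` and `rank_features`, the bin count, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def density_overlap(x_a, x_b, bins=20):
    """Overlap area between two histogram-estimated PDFs on shared bins."""
    lo = min(x_a.min(), x_b.min())
    hi = max(x_a.max(), x_b.max())
    if hi == lo:
        return 1.0  # constant feature: the classes cannot be told apart
    edges = np.linspace(lo, hi, bins + 1)
    p_a, _ = np.histogram(x_a, bins=edges, density=True)
    p_b, _ = np.histogram(x_b, bins=edges, density=True)
    width = edges[1] - edges[0]
    # Integrate the pointwise minimum of the two density estimates.
    return float(np.sum(np.minimum(p_a, p_b)) * width)


def rank_features(X, y, bins=20):
    """Return (feature indices sorted best-first, per-feature scores).

    Assumed score: 1 - overlap of the two class-conditional densities.
    """
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch assumes a binary target")
    mask = y == classes[0]
    scores = np.array([
        1.0 - density_overlap(X[mask, j], X[~mask, j], bins)
        for j in range(X.shape[1])
    ])
    order = np.argsort(-scores)  # highest score (least overlap) first
    return order, scores


# Synthetic illustration (not one of the paper's datasets):
# feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n)
informative = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n)])
noise = rng.normal(0.0, 1.0, 2 * n)
X = np.column_stack([informative, noise])

order, scores = rank_features(X, y)
print("ranking (best first):", order)
print("scores:", np.round(scores, 3))
```

In a filter-style workflow, the top-ranked columns from `order` would be kept and passed to a downstream classifier; how many features to keep is a separate choice that this sketch does not address.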