Improving Accuracy For Cancer Classification with Gene Selection
Geeta Chhabra1, Vasudha Vashisht2, Jayanthi Ranjan3
1Geeta Chhabra, Research Scholar, Amity Institute of Information Technology, Amity University, Noida (Uttar Pradesh), India.
2Vasudha Vashisht, Assistant Professor, Department of Computer Science & Engineering, Amity School of Engineering & Technology, Amity University, Noida (Uttar Pradesh), India.
3Jayanthi Ranjan, Professor, Institute of Management Technology, Ghaziabad (Uttar Pradesh), India.
Manuscript received on 05 February 2019 | Revised Manuscript received on 13 February 2019 | Manuscript published on 28 February 2019 | PP: 192-199 | Volume-8 Issue-4, February 2019 | Retrieval Number: D2686028419/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The article presents a detailed overview of different classification techniques for colon cancer prediction from gene expression data and evaluates their performance based on classification accuracy, computational time, and ability to reveal gene information. Gene selection methods are also introduced and evaluated with respect to their statistical significance to the cancer classifier. The purpose is to build a multivariate model for tumour classification with a genetic algorithm. The multivariate models were constructed using nearest centroid, k-nearest neighbours, support vector machine, maximum likelihood discriminant functions, neural networks and random forest classifiers, each combined with a genetic algorithm and applied to a publicly available colon cancer dataset. The experimental analysis shows that maximum likelihood discriminant functions (MLHD) perform better, and that accuracy is further improved by applying the forward selection method to the most frequent genes. Maximum likelihood discriminant functions are also more cost-effective and faster than neural networks (NNET), nearest centroid (Nearcent) and random forest (RF). Thus, the experiments show that classification accuracy depends on selecting the genes that contribute to the accuracy of the model. Gene selection removes irrelevant genes, which reduces the size of the data and makes the algorithm faster.
Keywords: Data Mining, Genetic Algorithm, Machine Learning Algorithms.
Scope of the Article: Machine Learning
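The abstract's idea of forward selection over the most frequent genes can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the data are synthetic, the `chromosomes` list is a hypothetical stand-in for gene subsets returned by repeated genetic algorithm runs, a nearest centroid classifier is used for brevity instead of MLHD, and accuracy is estimated with leave-one-out cross-validation. The function names are illustrative only.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class with the closest mean profile."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test sample to every centroid.
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def loocv_accuracy(X, y):
    """Leave-one-out cross-validated accuracy (an assumed evaluation scheme)."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i
        pred = nearest_centroid_predict(X[mask], y[mask], X[i:i + 1])
        hits += int(pred[0] == y[i])
    return hits / n

def forward_selection(X, y, ranked_genes):
    """Add genes in rank order and keep the prefix with the best accuracy."""
    best_acc, best_k = -1.0, 0
    for k in range(1, len(ranked_genes) + 1):
        acc = loocv_accuracy(X[:, ranked_genes[:k]], y)
        if acc > best_acc:
            best_acc, best_k = acc, k
    return [int(g) for g in ranked_genes[:best_k]], best_acc

# Synthetic expression matrix: 40 samples, 20 genes, two classes.
rng = np.random.default_rng(0)
n_genes = 20
X = rng.normal(size=(40, n_genes))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, 0] += 4.0  # genes 0 and 1 are made informative on purpose
X[y == 1, 1] += 4.0

# Hypothetical gene subsets, standing in for repeated GA runs.
chromosomes = [[0, 1, 5], [0, 3, 1], [0, 1, 7], [1, 0, 9]]
counts = np.bincount(np.concatenate(chromosomes), minlength=n_genes)
# Rank genes by how often they were selected; keep the top five.
ranked = np.argsort(-counts, kind="stable")[:5]

selected, acc = forward_selection(X, y, ranked)
print("selected genes:", selected, "LOOCV accuracy:", acc)
```

In this sketch the frequency ranking decides the order in which genes are tried, and the classifier only decides where to stop, so swapping in another classifier changes `nearest_centroid_predict` and nothing else.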