Imputation Methods for Missing Data for a Proposed VASA Dataset
1Anitha.S*, Research Scholar, Dept of Computer Applications, Alagappa University, Karaikudi, India.
2Dr.Vanitha.M, Asst. professor, Dept of Computer Applications, Alagappa University, Karaikudi, India.
Manuscript received on October 17, 2019. | Revised Manuscript received on October 29, 2019. | Manuscript published on November 10, 2019. | PP: 1950-1953 | Volume-9 Issue-1, November 2019. | Retrieval Number: A5204119119/2019©BEIESP | DOI: 10.35940/ijitee.A5204119119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Preprocessing is the preparation of raw data before the actual statistical method is applied. Data preprocessing is one of the most vital steps in the data mining process, and it deals with the preparation and transformation of the initial dataset. It is important because analyzing data that has not been properly preprocessed can lead to results that are inaccurate and meaningless. Almost every research dataset has missing data, and some method must be used to handle it during data analysis. Missing values must be accounted for in order to provide an efficient and valid analysis. Imputation of missing values is one of the processes in data cleaning. Here, four different imputation methods are compared: Mean, Singular Value Decomposition (SVD), K-Nearest Neighbors (KNN), and Bayesian Principal Component Analysis (BPCA). The comparison was performed on the real VASA dataset and was based on performance evaluation criteria such as Mean Square Error (MSE) and Root Mean Square Error (RMSE). Among the methods compared, BPCA performed best and deserves further consideration in practice.
Keywords: Data Preprocessing, Missing Data, Imputation Methods, BPCA, RMSE.
Scope of the Article: Data Mining
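
The evaluation described in the abstract can be illustrated with a minimal sketch. It covers only the simplest of the four methods, mean imputation, scored with MSE and RMSE over cells that were removed on purpose. The toy values, the `mask` of removed cells, and the function names are assumptions made for illustration; they are not taken from the VASA dataset or from the authors' implementation.

```python
import math

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def mse(truth, imputed, mask):
    """Mean Square Error over the artificially removed cells only."""
    errors = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in mask]
    return sum(errors) / len(errors)

def rmse(truth, imputed, mask):
    """Root Mean Square Error: the square root of the MSE."""
    return math.sqrt(mse(truth, imputed, mask))

# Hypothetical complete data; two cells are hidden to simulate missingness.
truth = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
mask = [(0, 1), (3, 0)]  # (row, column) positions to remove
observed = [row[:] for row in truth]
for i, j in mask:
    observed[i][j] = None

imputed = mean_impute(observed)
print("MSE:", mse(truth, imputed, mask))
print("RMSE:", rmse(truth, imputed, mask))
```

Under the same assumptions, any other imputation method could be swapped in for `mean_impute` and scored against the same `mask`, so that the methods are ranked by their error on identical hidden cells.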