Classification Models for Handling Missing Data
Jong Chan Lee

Jong Chan Lee, Department of Computer Engineering, Chungwoon Univ., Incheon, Korea.

Manuscript received on 01 January 2019 | Revised Manuscript received on 06 January 2019 | Manuscript Published on 07 April 2019 | PP: 311-315 | Volume-8 Issue-3C, January 2019 | Retrieval Number: C10690183C19/2019©BEIESP

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Incomplete data, which is easily encountered in ubiquitous environments, can severely degrade the performance of a classification system depending on the degree of information loss, so it is essential to overcome this problem. Methods/Statistical analysis: This paper proposes a data model that compensates for lost data using a probabilistic technique and assigns a weight to each event. Two existing classification models (FLDF and EBP) are extended so that they learn in accordance with the structure of this data model. Performance evaluation under varying degrees of information loss confirms that both models can serve as incomplete-data processing systems. Findings: The extended data format accommodates various kinds of lost data by attaching a probability to each attribute value and a weight, indicating importance, to each event of ordinary data. The main focus of this paper is to modify the learning structure so that this data format can be used with two different algorithms, and to verify, by feeding in damaged data, whether the models still fulfil their original purpose. Two classification algorithms were selected for this purpose: FLDF is a gradually expanding model based on Fisher's equation, which is widely used in statistics, and EBP embodies a basic idea of deep learning, in which the weights of a given model are learned iteratively. The experiments show that both models handle lost data properly. In EBP in particular, each attribute value is distributed across several input nodes, and the model is confirmed to recover the lost part well even though learning takes place in the next layer. Improvements/Applications: Experiments are carried out to confirm that the two extended models work for the given purpose. A certain percentage of events and attributes in the data set are arbitrarily selected and damaged, and the result is used as experimental data. For fairness, each experiment is run 10 times and the average of the results is reported. This paper suggests that the proposed restoration method is useful in fields where, for various reasons, damaged data must be used for learning.
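
The extended data expression and the damage procedure summarized in the abstract might be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the function names (`encode_event`, `damage`, `input_vector`), the toy attribute domains, the uniform distribution used for a lost value, and the default event weight of 1.0 are all assumptions introduced here.

```python
import random

def encode_event(values, domains, weight=1.0):
    """Encode one event as per-attribute probability vectors plus an event weight.

    values  : observed categories, with None marking a lost value
    domains : the possible categories of each attribute
    weight  : importance of the event (default of 1.0 is an assumption)
    """
    probs = []
    for value, domain in zip(values, domains):
        if value is None:
            # Assumption: a lost value is spread uniformly over its domain.
            probs.append([1.0 / len(domain)] * len(domain))
        else:
            # An observed value puts all probability mass on one category.
            probs.append([1.0 if c == value else 0.0 for c in domain])
    return {"probs": probs, "weight": weight}

def damage(events, event_rate, attr_rate, rng):
    """Return a copy of events in which a share of events lose a share of attributes."""
    damaged = [list(e) for e in events]
    n_events = round(event_rate * len(damaged))
    for i in rng.sample(range(len(damaged)), n_events):
        n_attrs = round(attr_rate * len(damaged[i]))
        for j in rng.sample(range(len(damaged[i])), n_attrs):
            damaged[i][j] = None
    return damaged

def input_vector(encoded):
    """Flatten the probability vectors so that one attribute feeds several input nodes."""
    return [p for attr in encoded["probs"] for p in attr]

# Hypothetical toy data: two categorical attributes, four events.
domains = [["sunny", "rainy", "cloudy"], ["high", "low"]]
events = [["sunny", "high"], ["rainy", "low"], ["cloudy", "high"], ["sunny", "low"]]

rng = random.Random(0)
damaged = damage(events, event_rate=0.5, attr_rate=0.5, rng=rng)
encoded = [encode_event(e, domains) for e in damaged]
vectors = [input_vector(e) for e in encoded]
```

In this sketch every event, damaged or not, yields an input vector of the same length, which is one plausible reading of how a single attribute value can be distributed over several input nodes. Repeating the `damage` step with different seeds and averaging the resulting accuracies would mirror the 10-run protocol described above.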

Keywords: Extended Data Expression, FLDF, EBP, Missing Data, Deep Learning.
Scope of the Article: Classification