Data Pre-Processing and Customized Onto-Graph Construction for Knowledge Extraction in Healthcare Domain of Semantic Web
Monika P¹, G T Raju²

¹Monika P, Research Scholar, R&D Centre, CSE Dept., RNS Institute of Technology, Assistant Professor, Dept. of CSE, Dayananda Sagar College of Engineering, Bengaluru, Visvesvaraya Technological University, Belagavi, Karnataka.
²G T Raju, Professor, Dept. of CSE, RNS Institute of Technology, Bengaluru, Visvesvaraya Technological University, Belagavi, Karnataka.

Manuscript received on 28 August 2019. | Revised Manuscript received on 09 September 2019. | Manuscript published on 30 September 2019. | PP: 712-719 | Volume-8 Issue-11, September 2019. | Retrieval Number: K14230981119/2019©BEIESP | DOI: 10.35940/ijitee.K1423.0981119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license.

Abstract: Today's electronic world produces an enormous amount of data every second in various formats, especially in healthcare units. To utilize the available data efficiently by representing it in machine-readable form, the Semantic Web was introduced as a step towards an automated knowledge discovery process. In this paper, comprehensive pre-processing techniques are proposed for preparing raw data in a structured format so that an onto-graph can be constructed for selected features in the healthcare domain. A Cluster based Missing Value Imputation Algorithm (CMVI) is proposed to enhance the quality of the imputed data, since imputation is the most important step during data pre-processing. Missing values were randomly induced into the Pima Indian Diabetic dataset at missing ratios of 1%, 3% and 5% for each attribute, for up to 50% of the attributes in the original diabetic dataset. The experimental observations reveal that the quality of the pre-processed data is better than that of the raw, unprocessed data in terms of imputation accuracy, measured by the coefficient of determination (R²), the index of agreement (d2) and the Root Mean Square Error (RMSE). The documented results show that the proposed techniques outperform the traditional approaches, with increased R² and d2 scores and decreased RMSE scores. Further, the importance of knowledge graphs and the various types of ontological representation are discussed briefly, since construction of an .owl file is the first step towards automation in the Semantic Web.
Keywords: Semantic web, Ontologies, Ontology agents, Onto-graph, Knowledge graph, Knowledge extraction, Data pre-processing, Handling missing values, Missing Value Imputation, Health care, Diabetology.
Scope of the Article: Semantic Web
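
The evaluation protocol summarized in the abstract (randomly inducing missing values, then scoring the imputed entries with RMSE, R² and the index of agreement) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the data is synthetic rather than the Pima Indian Diabetic dataset, column-mean imputation stands in for CMVI (which is not specified in this section), R² is taken as 1 - SS_res/SS_tot, and the index of agreement follows Willmott's common definition. The paper's exact formulas may differ.

```python
import numpy as np


def induce_missing(data, ratio, attr_fraction=0.5, seed=0):
    """Return (copy of data with NaNs, boolean mask of induced positions).

    `ratio` is the share of rows blanked in each selected column;
    `attr_fraction` is the share of columns that receive missing values.
    """
    rng = np.random.default_rng(seed)
    out = np.asarray(data, dtype=float).copy()
    n_rows, n_cols = out.shape
    n_attrs = max(1, int(n_cols * attr_fraction))
    cols = rng.choice(n_cols, size=n_attrs, replace=False)
    n_missing = max(1, int(round(n_rows * ratio)))
    mask = np.zeros(out.shape, dtype=bool)
    for c in cols:
        rows = rng.choice(n_rows, size=n_missing, replace=False)
        mask[rows, c] = True
    out[mask] = np.nan
    return out, mask


def rmse(obs, pred):
    """Root Mean Square Error between observed and imputed values."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def r2(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed definition of d2)."""
    num = np.sum((pred - obs) ** 2)
    den = np.sum((np.abs(pred - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return float(1.0 - num / den)


# Synthetic stand-in data: 200 records, 8 attributes (not the Pima dataset).
rng = np.random.default_rng(42)
original = rng.normal(loc=100.0, scale=15.0, size=(200, 8))

for ratio in (0.01, 0.03, 0.05):
    corrupted, mask = induce_missing(original, ratio=ratio)
    # Baseline only: column-mean imputation, NOT the proposed CMVI algorithm.
    col_means = np.nanmean(corrupted, axis=0)
    imputed = np.where(mask, col_means, corrupted)
    obs, pred = original[mask], imputed[mask]
    print(ratio, rmse(obs, pred), r2(obs, pred), index_of_agreement(obs, pred))
```

Scoring only the induced positions (`original[mask]` against `imputed[mask]`) keeps untouched entries from inflating the agreement scores. Higher R² and index of agreement together with lower RMSE indicate a better imputation.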