A Hybrid Technique for Health Insurance Fraud Detection on Highly Imbalanced Dataset
Shamitha S K¹, V Ilango²

¹Shamitha S K, Research Scholar, Department of Master of Computer Applications, CMR Institute of Technology, AECS Layout, ITPL Main Road, Bangalore-560037
²V Ilango, Head, Centre of Excellence for Intelligent Human-Computer Interaction (c-IHCI), CMR Institute of Technology, AECS Layout, ITPL Main Road, Bangalore-560037
Manuscript received on 22 August 2019. | Revised Manuscript received on 04 September 2019. | Manuscript published on 30 September 2019. | PP: 3498-3501 | Volume-8 Issue-11, September 2019. | Retrieval Number: K24890981119/2019©BEIESP | DOI: 10.35940/ijitee.K2489.0981119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The health insurance industry produces a massive amount of heterogeneous data, and detecting fraud in these data is a challenging task. Highly imbalanced data poses a major challenge for insurance data analysis, and classifying imbalanced data is a critical issue for fraud detection methods: fraudulent records make up less than 10% of the data. In this study, we use highly imbalanced data and propose a hybrid method that addresses the class imbalance problem by combining SMOTE, cross-validation, and Random Forest. We applied various sampling techniques to Medicare data and then built a classification model. We observed that SMOTE combined with Random Forest and cross-validation produced strong results. Because the goal is to identify as many of the relevant (fraudulent) instances as possible, the model should have high recall. SMOTE with Random Forest achieved an average recall of 86% and an overall accuracy of 90%, which we consider good relative to existing models.
Keywords: Health Insurance Fraud, SMOTE, Cross Validation, Random Forest.
Scope of the Article: Healthcare Informatics.
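The abstract names the hybrid pipeline but not its mechanics. The sketch below is a minimal, standard-library-only illustration under stated assumptions, not the authors' code. SMOTE creates synthetic minority (fraud) samples by interpolating between a minority sample and one of its k nearest minority neighbours; combined with cross-validation, it is applied only to each training fold so that every validation fold keeps the original class ratio. The function names (`smote`, `balance_training_fold`, `stratified_folds`, `recall`), the default `k=5`, and the toy data are hypothetical. The Random Forest step is left as a comment; in practice a library classifier such as scikit-learn's `RandomForestClassifier` would be fitted on the balanced fold.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_synthetic samples, each on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def balance_training_fold(X: List[Vector], y: List[int], minority_label: int = 1,
                          k: int = 5, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Oversample the minority class of a TRAINING fold up to the majority count."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    new = smote(minority, max(0, n_majority - len(minority)), k, seed)
    return X + new, y + [minority_label] * len(new)


def stratified_folds(y: List[int], n_folds: int = 5, seed: int = 0) -> List[List[int]]:
    """Split indices into n_folds folds that each keep roughly the class ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def recall(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Fraction of actual positives (frauds) that the model flags."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data (hypothetical): 95 legitimate claims, 5 fraudulent ones.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)] + \
        [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(5)]
    y = [0] * 95 + [1] * 5
    for fold in stratified_folds(y, n_folds=5):
        held_out = set(fold)
        X_tr = [X[i] for i in range(len(X)) if i not in held_out]
        y_tr = [y[i] for i in range(len(y)) if i not in held_out]
        X_bal, y_bal = balance_training_fold(X_tr, y_tr, k=3)
        # A Random Forest would be fitted on (X_bal, y_bal) here and scored
        # with recall() on the untouched held-out fold.
        print(len(y_tr), y_bal.count(1), y_bal.count(0))
```

Oversampling the whole dataset before splitting would place synthetic neighbours of validation-fold frauds in the training data and inflate recall; keeping SMOTE inside the fold loop avoids that leakage.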