Machine Learning Based Method for Prediction of Heart Disease in Big Data Environment
Sharmila Rengasamy¹, Chellammal Surianarayanan², Pethuru Raj Chellaih³

¹Sharmila Rengasamy, Department of Computer Science, Bharathidasan University Constituent Arts & Science College, Navalurkuttapattu, Tiruchirappalli, Tamil Nadu, India.
²Chellammal Surianarayanan, Department of Computer Science, Bharathidasan University Constituent Arts & Science College, Navalurkuttapattu, Tiruchirappalli, Tamil Nadu, India.
³Pethuru Raj Chellaih, Site Reliability Engineering (SRE) Division, Reliance Jio Infocomm Ltd. (RJIL), AVANA Building, Iblur Village, Sarjapur Road, Bangalore.
Manuscript received on March 15, 2020. | Revised Manuscript received on March 25, 2020. | Manuscript published on April 10, 2020. | PP: 1917-1921 | Volume-9 Issue-6, April 2020. | Retrieval Number: F3957049620/2020©BEIESP | DOI: 10.35940/ijitee.F3957.049620
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license.

Abstract: Predicting diseases is one of the challenging tasks in the healthcare domain. Conventionally, heart diseases were diagnosed by experienced medical professionals and cardiologists with the help of medical and clinical tests. Even with this conventional approach, experienced medical professionals struggled to predict the disease with sufficient accuracy. In addition, manually analysing archived disease data and extracting useful knowledge from it is time-consuming and often infeasible. The advent of machine learning techniques has enabled the prediction of various diseases in the healthcare domain: machine learning algorithms are trained on existing historical data, and the resulting prediction models are used to make predictions on new, unseen data. For the past two decades, machine learning techniques have been extensively employed for disease prediction. Although machine learning algorithms can learn from large volumes of historical data, such data has traditionally been stored in data marts and data warehouses built on conventional database technologies such as Oracle Online Analytical Processing (OLAP). These conventional technologies cannot handle very large volumes of data, unstructured data, or data that arrives at high velocity. In this context, big data tools and technologies play a major role in storing and processing large volumes of data. In this paper, an approach is proposed for predicting heart disease using the Support Vector Machine (SVM) algorithm in a Spark environment. SVM is fundamentally a binary classifier that can handle both linearly and non-linearly separable input data: for non-linear data, a kernel function implicitly maps the inputs into a higher-dimensional feature space in which the two classes can be separated by a hyperplane. Spark is a distributed big data processing platform whose distinguishing feature is keeping and processing large datasets in memory. The proposed approach is tested on a benchmark dataset from the UCI repository, and the results are discussed.
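As a rough illustration of what training an SVM as a binary classifier involves, the sketch below minimises the standard L2-regularised hinge loss with Pegasos-style stochastic sub-gradient descent, in plain Python rather than Spark. It is not the authors' implementation: the function names `train_linear_svm` and `predict`, the hyperparameters `lam`, `epochs`, and `seed`, and the six-point dataset are all hypothetical, and only a linear kernel is shown. Within Spark itself, MLlib provides a distributed linear SVM (`LinearSVC` in `pyspark.ml.classification`).

```python
import random


def _dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM.

    Approximately minimises
        lam / 2 * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i))
    where every x_i is augmented with a constant 1.0, so the last weight
    acts as a (regularised) bias. Labels y_i must be +1 or -1.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]      # append the bias feature
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                # Pegasos step size
            margin = y[i] * _dot(w, data[i])     # margin before this update
            w = [(1.0 - eta * lam) * wj for wj in w]   # step on the L2 term
            if margin < 1.0:                     # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if _dot(w, list(x) + [1.0]) >= 0.0 else -1


# Hypothetical toy data with two features (NOT the UCI heart-disease data).
X = [[2, 3], [3, 3], [3, 2], [-2, -3], [-3, -3], [-3, -2]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Encoding the two classes as +1 and -1 (for example, disease present or absent) is a convention of this sketch. Folding the bias into the weight vector keeps the update rule short, at the cost of also regularising the bias.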
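The kernel idea mentioned above can be made concrete with the degree-2 homogeneous polynomial kernel K(x, z) = (x · z)², chosen here only for illustration (the abstract does not say which kernels were used). The sketch checks that K equals an ordinary inner product after an explicit map `phi` into three dimensions, then shows hypothetical toy data that no straight line separates in the input plane but that a plane separates in `phi`-space. The names `phi`, `poly2_kernel`, `dot`, and `classify` are hypothetical, and the separating plane is written down by hand rather than learned; an SVM using this kernel would learn such a plane from data.

```python
import math


def phi(x):
    """Explicit degree-2 feature map R^2 -> R^3 for the kernel (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# 1) The kernel equals an inner product in the 3-D feature space.
x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z), dot(phi(x), phi(z)))  # both 1, up to rounding

# 2) Hypothetical toy data: class +1 near the origin, class -1 on a ring
#    around it. (0, 0) lies inside the convex hull of the -1 points, so no
#    straight line in the (x1, x2) plane separates the two classes.
inner = [(0.0, 0.0), (0.3, 0.0), (-0.3, 0.0), (0.0, 0.3), (0.0, -0.3)]
outer = [(1.5, 0.0), (-1.5, 0.0), (0.0, 1.5), (0.0, -1.5)]

# In feature space the plane -phi1 - phi3 + 1 = 0, i.e. the circle
# x1^2 + x2^2 = 1 in input space, separates them linearly.
w, b = (-1.0, 0.0, -1.0), 1.0


def classify(p):
    return 1 if dot(w, phi(p)) + b > 0 else -1


print([classify(p) for p in inner])  # [1, 1, 1, 1, 1]
print([classify(p) for p in outer])  # [-1, -1, -1, -1]
```

In practice the kernel is evaluated directly and `phi` is never built, which is what makes kernels with very high- or infinite-dimensional feature spaces usable.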
Keywords: Support Vector Machine for Heart Disease, Spark MLlib for Heart Disease Prediction, Big Data for Disease Prediction, Machine Learning Algorithms for Disease Prediction.
Scope of the Article: Machine Learning