Examination of Big Dataset using LEOS, JOSE, SVM on MapReduce
Bhagyashree Patle1, Vijayarajan V2

1Bhagyashree Patle, School of Computer Science and Engineering, VIT, Vellore (Tamil Nadu), India.

2Vijayarajan V, School of Computer Science and Engineering, VIT, Vellore (Tamil Nadu), India. 

Manuscript received on 28 November 2019 | Revised Manuscript received on 07 December 2019 | Manuscript Published on 14 December 2019 | PP: 449-455 | Volume-9 Issue-1S November 2019 | Retrieval Number: A10941191S19/2019©BEIESP | DOI: 10.35940/ijitee.A1094.1191S19

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Data analytics (DA) is the process of examining datasets in order to draw conclusions about the information they contain, increasingly with the help of specialized systems and software. The emergence of Big Data has made such analytics a necessity. The problems we consider arise in a fraud detection application. We address several major aspects: an application-independent format (XML/JSON) for the clustering process, which is based on a classification algorithm that requires no labels; the use of the resulting clusters to enhance the oversampling process; and parallel computing to speed up the system. We aim to use MapReduce functionality in our application and deploy it on Amazon AWS. Datasets gathered for studies often comprise millions of records and can carry concealed pitfalls that are hard to detect. In this paper, we work with two datasets: a medical dataset and a customer dataset. Big Data Analytics is the suggested solution today, given the growing demand for analyzing huge datasets and performing the required processing on complicated data structures. The main problems at present are how to store and analyze the large amounts of data generated by heterogeneous sources such as social media, and how to make that data quickly accessible within a modest budget. The MapReduce framework helps address these problems: by offering an integrated approach to machine learning, it speeds up processing. In this paper, we explore the LEOS algorithm, SVM, MapReduce, and the JOSE algorithm, along with their requirements, benefits, disadvantages, difficulties, and corresponding solutions.

Keywords: Big Data Analytics, MapReduce, LEOS, Dataset, AWS, RHEEM studio, Cluster, XML, JSON.
Scope of the Article: Big Data Quality Validation