Balancing Privacy Vs Efficiency in Data Analytics using Nearest Neighbour Randomization
Geetha Peethambaran1, Chandrakant Naikodi2, Suresh Lakshmi Narasimha Setty3

1Geetha Peethambaran*, Department of CSE, Cambridge Institute of Technology, Bengaluru, India.
2Chandrakant Naikodi, Department of CSE, Cambridge Institute of Technology, Bengaluru, India.
3Suresh L, Principal and Professor, Cambridge Institute of Technology, Bengaluru, India.

Manuscript received on September 16, 2019. | Revised Manuscript received on 24 September, 2019. | Manuscript published on October 10, 2019. | PP: 2289-2295 | Volume-8 Issue-12, October 2019. | Retrieval Number: L25671081219/2019©BEIESP | DOI: 10.35940/ijitee.L2567.1081219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The digital era, marked by the unrivalled growth of the Internet and its services and by day-to-day technological advances, has paved the way for a data-driven society. This digital explosion offers opportunities to extract valuable information from collected data, which organizations and research establishments use for synergistic advantage. However, the privacy of data divulged online is often overlooked in such large-scale analytics. Although privacy and security practices jointly determine the ethics of data collection and use, the personal data of individuals remains largely at risk of disclosure. Considerable research has gone into privacy-preserving analytics in light of the Big Data and IoT boom, but scalable and efficient techniques that do not compromise the usefulness of privacy-constrained data remain a challenging area of research. The proposed work uses a distance-based perturbation method to group data and then randomizes it. The efficacy of the perturbed data is evaluated on a classification task, where it gives results on par with the non-perturbed counterpart. The relative performance of the algorithm is also evaluated on the parallel computing platform Spark. Results show that the technique does not hinder the use of the data for holistic analysis, while privacy is subjectively maintained.
Keywords: Privacy Preserving, Analytics, Big Data, Perturbation, Performance, Utility
Scope of the Article: Big Data Analytics
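
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one possible reading of "distance-based grouping followed by randomization": each record is grouped with its k nearest neighbours by Euclidean distance, and every attribute value is then replaced by the value of a randomly chosen member of that group. The function name nn_randomize, the parameters k and seed, and the choice of Euclidean distance are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def nn_randomize(X, k=3, seed=0):
    """Hypothetical nearest-neighbour randomization (illustrative only).

    For every record, find its k nearest records (itself included) and
    replace each attribute with that attribute's value from a randomly
    chosen member of the neighbourhood.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    k = min(k, n)  # a neighbourhood cannot exceed the dataset size

    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    d2 = (diffs ** 2).sum(axis=2)

    # Indices of the k closest records for each record (self has distance 0).
    nbrs = np.argsort(d2, axis=1)[:, :k]

    # For each (record, attribute) cell, pick one neighbour at random.
    picks = rng.integers(0, k, size=(n, d))
    donors = nbrs[np.arange(n)[:, None], picks]

    # Copy the attribute value from the chosen donor record.
    return X[donors, np.arange(d)[None, :]]

if __name__ == "__main__":
    # Two well-separated toy clusters (made-up data).
    X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [10.0, 10.0], [10.1, 10.2], [10.2, 10.1]])
    print(nn_randomize(X, k=3, seed=1))
```

Because values are only exchanged among close records, the perturbed data stays near the original cluster structure, which is consistent with the abstract's claim that classification results remain on par with the non-perturbed data. The O(n^2) distance matrix in this sketch is for clarity only; it would not scale to the Big Data setting the paper targets.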