Data Leak Identification in Social Networks using K Means Clustering & Tabu K Means Clustering
Jayavarapu Karthik¹, V. Tamizhazhagan², S. Narayana³

¹Jayavarapu Karthik*, Research Scholar, Computer Science and Engineering, Annamalai University, Chidambaram, India.
²Dr. V. Tamizhazhagan, Assistant Professor, Information Technology, Annamalai University, Chidambaram, India.
³Dr. S. Narayana, Professor, Computer Science and Engineering, Gudlavalleru Engineering College, Gudlavalleru, India.

Manuscript received on November 15, 2019. | Revised Manuscript received on November 20, 2019. | Manuscript published on December 10, 2019. | PP: 2777–2783 | Volume-9 Issue-2, December 2019. | Retrieval Number: B6635129219/2019©BEIESP | DOI: 10.35940/ijitee.B6635.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Data leakage prevention is defined as a process or solution that identifies confidential data and tracks it as it moves into and out of the enterprise, in order to prevent unauthorized disclosure, whether intentional or unintentional. Such protection is needed because confidential data can reside on various computing devices and move through many network access points and different types of social networks, such as email. An email leak occurs when an email, deliberately or accidentally, reaches a recipient to whom it should not have been addressed. Data Leak Prevention (DLP) is the technique or product that tries to mitigate the threat of data leaks. In this work, clustering is combined with term frequency–inverse document frequency (TF-IDF) weighting to identify suitable centroids for analysing the emails exchanged among members of an organization. Each member may belong to several topic clusters, and a single topic cluster may include members of the organization who have never communicated with each other before. When a new email is composed, each recipient is classified as either a potential leak recipient or a legitimate one. This classification is based on the emails previously exchanged between the sender and the recipient and on their topic clusters. The work investigates K-Means clustering and proposes a Tabu K-Means (TABU-KM) clustering technique that optimizes K-Means in order to find optimal clustering points. Experimental results demonstrate that the proposed method achieves a higher True Positive Rate (TPR) and a lower False Positive Rate (FPR) for both known and unknown recipients.
Keywords: Data leakage prevention, Email leakage, Clustering technique, Tabu K-Means (Tabu-KM) clustering technique.
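The abstract describes the recipient-classification pipeline only at a high level. The following is a minimal Python sketch of one way it could work, assuming whitespace-tokenised email bodies, smoothed TF-IDF weights, K-Means with a deterministic farthest-first start, and a hypothetical decision rule: a recipient who has already exchanged email with the sender is legitimate only if the new email's topic cluster appears in that shared history, while an unknown recipient is legitimate only if they already belong to that topic cluster. The function names (build_idf, tfidf, kmeans, classify_recipients) and the toy mailbox are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(doc, idf, vocab):
    """Dense, L2-normalised TF-IDF vector; terms outside the vocabulary are ignored."""
    tf = Counter(doc)
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(w * w for w in vec)) or 1.0
    return [w / norm for w in vec]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def kmeans(points, k, iters=100):
    """Plain K-Means (Lloyd) with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

def classify_recipients(history, new_email, k=2):
    """history: list of (sender, [recipients], body). Returns {recipient: verdict}."""
    docs = [text.lower().split() for _, _, text in history]
    idf = build_idf(docs)
    vocab = sorted(idf)
    centroids, labels = kmeans([tfidf(d, idf, vocab) for d in docs], k)

    # Topic clusters per member, and per sender/recipient pair (hypothetical bookkeeping).
    member_topics, pair_topics = {}, {}
    for (sender, recipients, _), label in zip(history, labels):
        for person in [sender, *recipients]:
            member_topics.setdefault(person, set()).add(label)
        for r in recipients:
            pair_topics.setdefault(frozenset((sender, r)), set()).add(label)

    sender, recipients, text = new_email
    topic = nearest(tfidf(text.lower().split(), idf, vocab), centroids)
    verdicts = {}
    for r in recipients:
        pair = frozenset((sender, r))
        if pair in pair_topics:   # known recipient: judge by shared email history
            ok = topic in pair_topics[pair]
        else:                     # unknown recipient: judge by topic-cluster membership
            ok = topic in member_topics.get(r, set())
        verdicts[r] = "legitimate" if ok else "potential leak"
    return verdicts

history = [
    ("alice", ["bob"], "quarterly budget report finance numbers"),
    ("bob", ["alice"], "finance budget review quarterly numbers"),
    ("carol", ["dave"], "football match weekend team tickets"),
    ("dave", ["carol"], "team football practice weekend match"),
    ("erin", ["bob"], "budget finance forecast report"),
]
print(classify_recipients(history, ("alice", ["bob", "dave", "erin"], "finance budget report numbers")))
```

In this toy mailbox, dave is flagged because he has never exchanged email with alice and appears only in the football cluster, whereas erin, also a new contact for alice, shares the finance cluster and is treated as legitimate.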
Scope of the Article: Clustering
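The abstract does not specify how TABU-KM combines tabu search with K-Means. One common construction, sketched below in Python under that assumption, treats a cluster assignment as the search state, scores single-point reassignments by the K-Means objective (sum of squared errors, SSE), forbids moving a point straight back to its previous cluster for `tenure` iterations unless the move beats the best solution found so far (the aspiration criterion), and finishes with ordinary K-Means (Lloyd) iterations from the best assignment. The functions and parameters (tabu_kmeans, iters, tenure, seed) are hypothetical; this is a generic tabu-search-over-K-Means sketch, not the published TABU-KM.

```python
import random

def sse(points, labels, k):
    """K-Means objective: summed squared distance of every point to its cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            mean = [sum(col) / len(members) for col in zip(*members)]
            total += sum(sum((x - m) ** 2 for x, m in zip(p, mean)) for p in members)
    return total

def kmeans_refine(points, labels, k, iters=100):
    """Ordinary Lloyd iterations starting from an existing assignment."""
    labels = list(labels)
    for _ in range(iters):
        means = {}
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                means[c] = [sum(col) / len(members) for col in zip(*members)]
        new = [min(means, key=lambda c: sum((x - m) ** 2 for x, m in zip(p, means[c])))
               for p in points]
        if new == labels:
            break
        labels = new
    return labels

def tabu_kmeans(points, k, iters=200, tenure=5, seed=0):
    """Tabu search over single-point reassignments, polished with K-Means."""
    rng = random.Random(seed)
    current = kmeans_refine(points, [rng.randrange(k) for _ in points], k)
    best, best_cost = current[:], sse(points, current, k)
    tabu = {}  # (point index, cluster) -> last iteration in which that move is forbidden
    for it in range(iters):
        candidates = []
        for i in range(len(points)):
            for c in range(k):
                if c == current[i]:
                    continue
                trial = current[:]
                trial[i] = c
                cost = sse(points, trial, k)
                # Aspiration: a tabu move is still allowed if it beats the best solution so far.
                if tabu.get((i, c), -1) < it or cost < best_cost:
                    candidates.append((cost, i, c))
        if not candidates:
            break
        cost, i, c = min(candidates)         # best admissible move, even if it worsens the SSE
        tabu[(i, current[i])] = it + tenure  # forbid moving point i straight back
        current[i] = c
        if cost < best_cost:
            best, best_cost = current[:], cost
    best = kmeans_refine(points, best, k)
    return best, sse(points, best, k)

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]
labels, cost = tabu_kmeans(points, k=3)
print(labels, round(cost, 3))
```

Because this search starts from a K-Means solution and only replaces its best assignment with a lower-SSE one, its result is never worse than that starting K-Means run; the tabu list is what allows it to accept temporarily worse moves in order to leave a poor local optimum.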