Syntactic and Semantic Based Similarity Measurement for Plagiarism Detection
Sumathi S1, Geetha M P2, P Ganesh Kumar3, K Pushpalatha4, A S Shanthi5

1Sumathi S*, Assistant Professor (Senior Grade), Department of Computer Science and Engineering, Sri Ramakrishna Institute of Technology, Coimbatore, Tamil Nadu, India.
2Geetha M.P, Assistant Professor, Department of Computer Science and Engineering, Sri Ramakrishna Institute of Technology, Coimbatore, Tamil Nadu, India.
3Dr P Ganesh Kumar, Assistant Professor, Department of Information Technology, Anna University Regional Center, Coimbatore, Tamil Nadu, India.
4Dr K Pushpalatha, Assistant Professor, Department of Information Technology, Coimbatore Institute of Engineering and Technology, Coimbatore.
5Dr A S Shanthi, Associate Professor, Department of Computer Science and Engineering, Tamil Nadu College of Engineering, Coimbatore.

Manuscript received on November 15, 2019. | Revised Manuscript received on November 20, 2019. | Manuscript published on December 10, 2019. | PP: 155-159 | Volume-9 Issue-2, December 2019. | Retrieval Number: A5268119119/2019©BEIESP | DOI: 10.35940/ijitee.A5268.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: In the digital era, the vast number of documents available online facilitates plagiarism. Plagiarism is the act of copying another person's work. Paper-based documents are also stored in digital libraries for future reference. Historically, the Latin word “plagiarius” was used to denote the act of stealing someone else's work. More precisely, plagiarism is the use of another person's ideas, concepts, words, or structures without citing the source in a setting where original work is expected. The main objective of this paper is to find the contents of an original document that match contents in other documents. Matches are identified on the basis of both syntactic matching and semantic similarity. The paper employs a Sentence Hashing Algorithm for Plagiarism Detection, which works on complete sentence sequences and computes a hash sum for each sequence. When the user compares the original document against several documents, a similarity value of 1 means that the contents of the original document are 100% present in the compared document, i.e., it is fully plagiarized. A similarity value between 0.1 and 0.9 indicates partial plagiarism, and a value of 0 (0%) means that the original document is unique.
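The sketch below illustrates only the syntactic, sentence-level part of this idea. It does not model the semantic similarity component. The following choices are all assumptions, and the abstract does not specify any of them:

- Sentences are split on terminal punctuation.
- Each sentence is normalized by lowercasing it, stripping punctuation, and collapsing whitespace.
- The hypothetical helper `sentence_hash` uses SHA-256 as a stand-in for the paper's hash sum.
- Similarity is the fraction of the original document's sentence hashes that also occur in the compared document.

```python
import hashlib
import re


def normalize(sentence: str) -> str:
    # Lowercase, drop punctuation, collapse whitespace so trivial edits do not change the hash.
    sentence = re.sub(r"[^\w\s]", "", sentence.lower())
    return " ".join(sentence.split())


def split_sentences(text: str) -> list[str]:
    # Assumption: sentences end with '.', '!' or '?'; a real system would use a proper tokenizer.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Discard fragments that are empty after normalization (e.g. stray punctuation).
    return [p for p in parts if normalize(p)]


def sentence_hash(sentence: str) -> str:
    # Hypothetical stand-in for the paper's hash sum: SHA-256 of the normalized sentence.
    return hashlib.sha256(normalize(sentence).encode("utf-8")).hexdigest()


def similarity(original: str, candidate: str) -> float:
    # Fraction of the original's sentences whose hash also appears in the candidate document.
    orig_hashes = [sentence_hash(s) for s in split_sentences(original)]
    if not orig_hashes:
        return 0.0
    cand_hashes = {sentence_hash(s) for s in split_sentences(candidate)}
    matched = sum(1 for h in orig_hashes if h in cand_hashes)
    return matched / len(orig_hashes)


def classify(score: float) -> str:
    # 1 -> fully plagiarized, 0 -> unique. The abstract names 0.1 to 0.9 as partial;
    # here every value strictly between 0 and 1 is treated as partial (assumption).
    if score == 1.0:
        return "fully plagiarized"
    if score == 0.0:
        return "unique"
    return "partially plagiarized"
```

As an example, comparing "Plagiarism is copying. It is unethical." with "plagiarism is copying! Something new here." gives a similarity of 0.5, because one of the two original sentences matches after normalization. That result is classified as partial plagiarism. Hashing whole normalized sentences keeps each comparison a set lookup. The trade-off is that any paraphrase changes the hash, which is why the paper pairs syntactic matching with a semantic similarity measure.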
Keywords: Plagiarism Detection, Syntactic and Semantic based Similarity, Sentence Hashing, Text Mining
Scope of the Article: Measurement & Performance Analysis