Measuring Semantic Similarity between Words using Page-Count and Pattern Clustering Methods
Prathvi Kumari1, Ravishankar K2
1Prathvi Kumari, M.Tech, Department of Computer Science Engineering, SIT, Mangalore (Karnataka), India.
2Prof. Ravishankar K, Associate Professor, Department of Computer Science Engineering, SIT, Mangalore (Karnataka), India.
Manuscript received on 10 July 2013 | Revised Manuscript received on 18 July 2013 | Manuscript Published on 30 July 2013 | PP: 31-34 | Volume-3 Issue-2, July 2013 | Retrieval Number: B0992073213/13©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Web mining involves tasks such as document clustering and community mining that are performed on the web. Such tasks require measuring the semantic similarity between words, which makes them easier to carry out in many applications. Despite the usefulness of semantic similarity measures in these applications, accurately measuring the semantic similarity between two words remains a challenging task. This paper uses information available on the web, namely page counts and text snippets, to measure the semantic similarity between two words. Various word co-occurrence measures are defined using page counts and are then integrated with lexical patterns extracted from text snippets. To identify the numerous semantic relations that exist between two given words, pattern extraction and pattern clustering methods are used. The optimal combination of page-count-based co-occurrence measures and lexical pattern clusters is learned using a support vector machine. The resulting semantic similarity measure lies in the range [0, 1]: it is expected to be close to 1 if the two given words are highly similar, and close to 0 if they are not semantically similar.
Keywords: Natural Language Processing, Semantic Similarity, Support Vector Machine, Text Snippets, Web Mining.
Scope of the Article: Clustering
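Illustrative sketch: the abstract does not say which page-count co-occurrence measures are used, so the Python below assumes two common choices, the Jaccard and Dice coefficients, purely as an example. The function names and the page counts in the usage note are hypothetical. Both scores fall in [0, 1], matching the range the abstract gives for the final similarity measure, but they are only candidate input features and not the learned measure itself.

```python
def _clip_joint(count_p, count_q, count_pq):
    """Keep the joint count consistent with the single-word counts.

    Reported page counts are often estimates, so the joint count is
    clipped to lie between 0 and min(count_p, count_q).
    """
    return max(0, min(count_pq, count_p, count_q))


def web_jaccard(count_p, count_q, count_pq):
    """Jaccard coefficient from page counts (assumed measure).

    count_p, count_q: pages containing word P or word Q.
    count_pq: pages containing both words.
    """
    joint = _clip_joint(count_p, count_q, count_pq)
    if joint == 0:
        return 0.0
    return joint / (count_p + count_q - joint)


def web_dice(count_p, count_q, count_pq):
    """Dice coefficient from page counts (assumed measure)."""
    joint = _clip_joint(count_p, count_q, count_pq)
    if joint == 0:
        return 0.0
    return 2 * joint / (count_p + count_q)
```

With hypothetical counts of 100 pages for one word, 50 for the other, and 25 for both together, web_jaccard(100, 50, 25) gives 25 / 125 and web_dice(100, 50, 25) gives 50 / 150. Scores like these could then be combined with pattern-cluster features as inputs to the support vector machine that the abstract describes.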