Variants of Term Frequency and Inverse Document Frequency of Vector Space Model for Effective Document Ranking in Information Retrieval
Deepa Yogish1, Manjunath T N2, Ravindra S Hegadi3
1Deepa Yogish, Research Scholar, VTU-RC, Department of Information Science and Engineering, BMS Institute of Technology and Management, Bangalore (Karnataka), India.
2Manjunath T N, Professor and Head, Department of ISE, BMSIT&M, Bangalore (Karnataka), India.
3Ravindra S Hegadi, Professor, School of Computational Sciences, Solapur University, Solapur (Maharashtra), India.
Manuscript received on 01 May 2019 | Revised Manuscript received on 15 May 2019 | Manuscript published on 30 May 2019 | PP: 414-421 | Volume-8 Issue-7, May 2019 | Retrieval Number: G5249058719/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The growth of the Internet has made the amount of available information grow exponentially, so people increasingly rely on information retrieval systems such as Google, Ask and Yahoo to extract relevant, contextual information for their queries. The task of an information retrieval system is to retrieve relevant documents from the huge volume of data available on the Internet using an appropriate model. The vector space model (VSM) is a classical model for document ranking in information retrieval. VSM uses a similarity measure to match documents against a user query and ranks the documents by their scores, from highest to lowest. Variants of the vector space model rank documents according to these similarity values. The proposed model pre-processes the documents and queries with natural language processing techniques such as tokenization, stop-word removal and stemming, in order to improve retrieval accuracy and reduce the search space. Terms in the documents and the query are then weighted using term frequency and inverse document frequency (tf-idf). To find the documents relevant to a query, the cosine similarity between each document vector and the query vector is computed as the ranking function. Documents with high similarity scores are considered relevant to the query and are ranked by these scores. This paper examines different approaches to the vector space model that use variants of term frequency and inverse document frequency to compute similarity values for ranking a set of documents against a given query, and presents a comparative analysis of these variants for document ranking.
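The retrieval pipeline summarized in the abstract (preprocessing, tf-idf weighting, cosine-similarity ranking) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the STOP_WORDS list, the crude_stem suffix stripper (a hypothetical stand-in for a real stemmer such as Porter's), and the particular tf variants (raw, log, binary, augmented) and idf variants (standard ln(N/df), smoothed ln(1 + N/df)) are illustrative choices made for this example, as are all function names.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (real systems use far larger lists).
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def crude_stem(token: str) -> str:
    """Hypothetical suffix stripper standing in for a real stemmer such as Porter's."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"  # queries -> query
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters and leave words like "class" alone.
        if token.endswith(suffix) and not token.endswith("ss") and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Tokenize into lowercase alphanumeric runs, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_weight(count: int, max_count: int, variant: str) -> float:
    """Term-frequency variants (illustrative selection)."""
    if count == 0:
        return 0.0
    if variant == "raw":        # raw count
        return float(count)
    if variant == "log":        # log-scaled: 1 + ln(tf)
        return 1.0 + math.log(count)
    if variant == "binary":     # term present or not
        return 1.0
    if variant == "augmented":  # 0.5 + 0.5 * tf / max tf in the same text
        return 0.5 + 0.5 * count / max_count
    raise ValueError(f"unknown tf variant: {variant}")


def idf_weight(n_docs: int, df: int, variant: str) -> float:
    """Inverse-document-frequency variants (illustrative selection)."""
    if variant == "standard":   # ln(N / df); zero for a term in every document
        return math.log(n_docs / df)
    if variant == "smooth":     # ln(1 + N / df); always positive
        return math.log(1 + n_docs / df)
    raise ValueError(f"unknown idf variant: {variant}")


def vectorize(tokens: list[str], idf: dict[str, float], tf_variant: str) -> dict[str, float]:
    """Sparse tf-idf vector; terms unseen in the collection are ignored."""
    counts = Counter(tokens)
    max_count = max(counts.values(), default=1)
    return {t: tf_weight(c, max_count, tf_variant) * idf[t] for t, c in counts.items() if t in idf}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(docs: list[str], query: str, tf_variant: str = "raw",
         idf_variant: str = "standard") -> list[tuple[int, float]]:
    """Return (document index, cosine score) pairs, highest score first."""
    doc_tokens = [preprocess(d) for d in docs]
    df = Counter(t for toks in doc_tokens for t in set(toks))  # document frequency
    idf = {t: idf_weight(len(docs), n, idf_variant) for t, n in df.items()}
    doc_vecs = [vectorize(toks, idf, tf_variant) for toks in doc_tokens]
    query_vec = vectorize(preprocess(query), idf, tf_variant)
    scores = [(i, cosine(query_vec, dv)) for i, dv in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "Vector space model ranks documents by cosine similarity",
        "Stemming and stop word removal reduce the search space",
        "Ranking documents with tf-idf weights",
    ]
    for tf_variant in ("raw", "log", "binary", "augmented"):
        for idf_variant in ("standard", "smooth"):
            print(tf_variant, idf_variant, rank(docs, "ranking documents", tf_variant, idf_variant))
```

Swapping tf_variant or idf_variant changes only the weighting step; preprocessing and the cosine ranking stay fixed, which is what makes such variants straightforward to compare side by side on the same collection and queries.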
Keywords: Information Retrieval (IR), Inverse Document Frequency (idf), Natural Language Processing (NLP), Term Frequency (tf), Vector Space Model (VSM).
Scope of the Article: Information Retrieval.