Analysis of Parts of Speech Tagging on Text Clustering
Y. Sri Lalitha1, J Sirisha Devi2, L. Sukanya3, N.V. Ganapathi Raju4
1Dr. Y. Sri Lalitha, Department of IT, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India.
2Dr. J Sirisha Devi, Department of Computer Science and Engineering, Institute of Aeronautical Engineering, India.
3L. Sukanya, Department of IT, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India.
4Dr. N. Ganapathi Raju, Department of IT, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India.
Manuscript received on 09 June 2019 | Revised Manuscript received on 13 June 2019 | Manuscript published on 30 June 2019 | PP: 2287-2291 | Volume-8 Issue-8, June 2019 | Retrieval Number: H7128068819/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Clustering is a machine intelligence technique that aims to group a set of objects into subsets, or clusters. Clustering text documents into categories is a vital step in the indexing, retrieval, management and mining of the abundant text data on the Web. To show in research and development that a new clustering algorithm is efficient, one needs to compare it with existing algorithms, which requires standard datasets. In this paper we pre-process the datasets into a standardized format, with additional properties suitable for a wide range of clustering and related experiments. Our objective is to set up benchmark document datasets, extract parts of speech such as verbs, nouns, adverbs and adjectives from the documents of a given dataset, and analyze the impact of parts of speech on the clustering process.
Index Terms: Text Preprocessing, POS Tagging, Vocabulary, Clustering
Scope of the Article: Clustering
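
The idea in the abstract, restricting a document's vocabulary to selected parts of speech before clustering, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the documents are already tagged with Penn Treebank tags, uses hand-made toy data, and compares documents by cosine similarity over term frequencies; the names `POS_PREFIXES`, `filter_by_pos`, `cosine` and `similarity_matrix` are hypothetical.

```python
from collections import Counter
from math import sqrt

# Penn Treebank tag prefixes for each part-of-speech category (assumed tagset).
POS_PREFIXES = {"noun": "NN", "verb": "VB", "adjective": "JJ", "adverb": "RB"}


def filter_by_pos(tagged_doc, categories):
    """Keep lowercased words whose tag starts with a selected category prefix."""
    prefixes = tuple(POS_PREFIXES[c] for c in categories)
    return [word.lower() for word, tag in tagged_doc if tag.startswith(prefixes)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(tagged_docs, categories):
    """Pairwise document similarity using only the selected parts of speech."""
    vectors = [Counter(filter_by_pos(d, categories)) for d in tagged_docs]
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical hand-tagged toy documents: lists of (word, tag) pairs.
docs = [
    [("Dogs", "NNS"), ("bark", "VBP"), ("loudly", "RB")],
    [("Dogs", "NNS"), ("run", "VBP"), ("quickly", "RB")],
    [("Markets", "NNS"), ("fall", "VBP"), ("quickly", "RB")],
]

nouns_only = similarity_matrix(docs, ["noun"])
adverbs_only = similarity_matrix(docs, ["adverb"])
```

In this toy data, the noun-only vocabulary makes the first two documents identical and the third unrelated, while the adverb-only vocabulary pairs the second and third instead. A similarity matrix like this could then be passed to any clustering algorithm to compare the groupings each vocabulary produces.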