Keyword Extraction from Tweets using Graph-Based Methods
G. Dhana Lakshmi¹, M. Kanthi Rekha²
¹G. Dhana Lakshmi, Department of Computer Science and Engineering, University College of Engineering (A), JNTUK, Kakinada (Andhra Pradesh), India.
²M. Kanthi Rekha, Department of Computer Science and Engineering, University College of Engineering (A), JNTUK, Kakinada (Andhra Pradesh), India.
Manuscript received on 24 February 2020 | Revised Manuscript received on 04 March 2020 | Manuscript Published on 15 March 2020 | PP: 16-22 | Volume-9 Issue-4S2 March 2020 | Retrieval Number: D10050394S220/2020©BEIESP | DOI: 10.35940/ijitee.D1005.0394S220
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Social media comprises a variety of websites. Twitter, for example, is a microblogging service that generates a huge amount of textual content daily. Methods based on text mining, natural language processing, and information retrieval are usually applied to analyze this content. In text mining approaches, documents are represented using the well-known vector space model, which results in sparse matrices that must be handled computationally. This paper presents a technique to extract keywords from collections of Twitter messages by representing the texts as a graph and assigning relevance values to its vertices based on graph centrality measures. The proposed approach, called TKG, relies on three phases: text pre-processing, graph building, and keyword extraction. The first experiment applied TKG to a text from Time magazine and compared its performance with TFIDF and KEA, using human classifications as benchmarks. Finally, the algorithms were applied to sets of tweets of increasing size, and the computational time needed to run them was recorded and compared. The results of these experiments showed that building the graph with an all neighbors edging scheme invariably provided superior performance, and that weighting the edges by the inverse co-occurrence frequency was superior in most cases. TKG proved faster than TFIDF and KEA in all its variations, except for the weighting scheme based on the inverse co-occurrence frequency. One possible direction for future work is to apply other centrality measures. TKG is a novel and robust proposal to extract keywords from texts, particularly from short messages such as tweets.
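The three TKG phases named in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. Several details are assumptions made here for illustration: the tiny `STOPWORDS` list, the regex tokenizer, the choice of closeness centrality with degree as a tie-breaker, and the helper names `preprocess`, `build_graph`, `closeness`, and `extract_keywords`. The sketch builds the graph with an all neighbors edging scheme, meaning every pair of distinct words in the same tweet is linked. It weights each edge by the inverse co-occurrence frequency, which it treats as a distance, so words that co-occur often end up closer together.

```python
import heapq
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "and", "for", "at", "rt"}


def preprocess(tweet: str) -> list[str]:
    """Phase 1 (sketch): lowercase, drop URLs and @mentions, tokenize, remove stopwords."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    tokens = re.findall(r"[a-z0-9#]+", tweet)
    return [t for t in tokens if t not in STOPWORDS]


def build_graph(tweets: list[list[str]]) -> dict[str, dict[str, float]]:
    """Phase 2 (sketch): all neighbors edging; edge weight = 1 / co-occurrence frequency."""
    cooc: dict[tuple[str, str], int] = defaultdict(int)
    vertices: set[str] = set()
    for tokens in tweets:
        unique = sorted(set(tokens))
        vertices.update(unique)
        # Every pair of distinct words in the same tweet is connected.
        for u, v in combinations(unique, 2):
            cooc[(u, v)] += 1
    graph: dict[str, dict[str, float]] = {v: {} for v in vertices}
    for (u, v), freq in cooc.items():
        graph[u][v] = graph[v][u] = 1.0 / freq
    return graph


def closeness(graph: dict[str, dict[str, float]], source: str) -> float:
    """Closeness centrality of `source` over the vertices it can reach (Dijkstra)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def extract_keywords(raw_tweets: list[str], k: int = 3) -> list[str]:
    """Phase 3 (sketch): rank words by closeness, then degree, then alphabetically."""
    graph = build_graph([preprocess(t) for t in raw_tweets])
    scores = {v: (closeness(graph, v), len(graph[v])) for v in graph}
    ranked = sorted(graph, key=lambda v: (-scores[v][0], -scores[v][1], v))
    return ranked[:k]


# Made-up example tweets, for illustration only.
tweets = [
    "Earthquake hits the city center http://t.co/x",
    "Strong earthquake in the city, buildings damaged",
    "@news earthquake rescue teams arrive in the city",
]
print(extract_keywords(tweets, k=3))
```

In this toy input, "earthquake" and "city" co-occur in every tweet. Their shared edge is therefore the shortest one, and both words link directly to every other word, so they rank at the top. Swapping `1.0 / freq` for `1.0` or `float(freq)` would give the other edge-weighting variants that the abstract alludes to.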
Keywords: Twitter, Text Mining, Graph-based Text Representation, Centrality, Keyword Extraction.
Scope of the Article: Probabilistic Models and Methods