Opinion Mining on Social Media Transit Tweets using Text Pre-Processing and Machine Learning Techniques
Meesala Shobha Rani1, Sumathy. S2

1Meesala Shobha Rani, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
2Sumathy.S*, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India. 

Manuscript received on October 16, 2019. | Revised Manuscript received on October 25, 2019. | Manuscript published on November 10, 2019. | PP: 1015-1025 | Volume-9 Issue-1, November 2019. | Retrieval Number: A4631119119/2019©BEIESP | DOI: 10.35940/ijitee.A4631.119119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Capturing public insights about transit systems from social media has become increasingly popular. Regional transportation agencies use social media to provide information to the public and to seek input and ideas for meaningful decision making in transportation activities. This exploratory study attempts to gauge the impact of social media use in transportation planning, which in turn would help transportation administrations identify the day-to-day challenges faced by customers and suggest suitable solutions. This paper presents the effect of pre-processing techniques on the performance of transit opinion analysis. Several pre-processing methods, namely stop word removal, stemming, lemmatization, negation handling, and URL removal, are evaluated on social media transit riders' opinions using two feature representation models, TF-IDF with unigrams and TF-IDF with bigrams, and three feature selection techniques: information gain, standard deviation, and chi-square. The experimental results are evaluated using four classifiers, Support Vector Machine, Naïve Bayes, Decision Tree, and K-Nearest Neighbor, in terms of accuracy, precision, recall, and F-measure. Analysis of the transit opinion data shows that pre-processing combined with the bigram representation performs better than the other approaches, specifically with Support Vector Machine and Naïve Bayes.
Keywords: Feature Selection, Machine Learning, Opinion Mining, Text Pre-processing, Twitter, Transit Opinion Analysis, Social Media.
Scope of the Article: Machine learning
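
As a rough illustration of the pipeline the abstract describes, the following is a minimal Python sketch of three of the named pre-processing steps (URL removal, stop word removal, negation handling) followed by TF-IDF weighting over unigrams or bigrams. It is not the authors' implementation: the stop-word list, the `not_` prefix used to mark negated words, the particular TF-IDF formula, and all function names are assumptions made for this example. Stemming, lemmatization, feature selection, and the classifiers are left out.

```python
import math
import re
from collections import Counter

# Illustrative lists only (assumptions; the paper does not give its own lists).
STOP_WORDS = {"the", "a", "an", "is", "was", "to", "of", "and", "on", "at", "in"}
NEGATIONS = {"not", "no", "never"}
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def preprocess(tweet):
    """Lowercase, strip URLs, drop stop words, and mark negated words."""
    text = URL_PATTERN.sub(" ", tweet.lower())
    text = text.replace("n't", " not")  # "isn't" becomes "is not"
    tokens = re.findall(r"[a-z]+", text)
    result = []
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True  # flag the next content word as negated
            continue
        if token in STOP_WORDS:
            continue
        result.append("not_" + token if negate else token)
        negate = False
    return result


def ngrams(tokens, n):
    """Join every run of n consecutive tokens into one term."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Return one {term: weight} dict per tokenized document.

    Uses relative term frequency times log(N / document frequency),
    one common TF-IDF variant chosen here for simplicity.
    """
    term_lists = [ngrams(doc, n) for doc in docs]
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    total = len(docs)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        vectors.append({
            term: (count / len(terms)) * math.log(total / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


tweets = [
    "The bus was not late http://t.co/abc",
    "The bus was late again",
]
docs = [preprocess(t) for t in tweets]
unigram_vectors = tfidf(docs, n=1)
bigram_vectors = tfidf(docs, n=2)
```

In this toy example the unigram "bus" appears in both tweets and so receives zero weight, while the bigram "bus not_late" keeps the negation attached to its context, which is one plausible reason a bigram representation could help on short opinion text.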