Text Preprocessing Method on Twitter Sentiment Analysis using Machine Learning
Jenifer Mahilraj1, Getahun Tigistu2, Sisay Tumsa3

1Jenifer Mahilraj*, Faculty of Computing and Software Engineering, Amit, Arbaminch University, Arbaminch, Ethiopia.
2Getahun Tigistu, Faculty of Computing and Software Engineering, Amit, Arbaminch University, Arbaminch, Ethiopia.
3Sisay Tumsa, Faculty of Computing and Software Engineering, Amit, Arbaminch University, Arbaminch, Ethiopia.
Manuscript received on August 15, 2020. | Revised Manuscript received on August 28, 2020. | Manuscript published on September 10, 2020. | PP: 233-240 | Volume-9 Issue-11, September 2020 | Retrieval Number: 100.1/ijitee.K77710991120 | DOI: 10.35940/ijitee.K7771.0991120
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Twitter sentiment analysis (TSA) plays a major role in observing public opinion from the customer's side. TSA is more complex than general sentiment analysis because of the pre-processing that text on Twitter requires. A tweet may contain at most 280 characters. In this article, we discuss how text pre-processing influences the classification efficiency of sentiment in two kinds of classification problems, and we summarize the classification efficiency of four pre-processing methods. The work contributes to sentiment analysis for classifying consumer satisfaction and is useful for evaluating a collection of tweets in which views are somewhat unstructured and are positive, negative, or somewhere in between. We first pre-processed the dataset, then extracted meaningful adjectives from it to form the feature vector, selected the feature vector list, and finally applied machine learning classification algorithms, namely Naive Bayes, Random Forest, and SVM, together with WordNet-based semantic orientation, which extracts synonyms and similarity for the content features. Experiments show that the accuracy (Acc) and average F1-measure (F1-M) of the Twitter sentiment classifiers improve when acronyms are expanded and negation is replaced, but barely change when numbers or stop words are removed.
Keywords: Classification Efficiency, Data mining, Deep learning, Sentiment analysis.
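
The four pre-processing methods named in the abstract (acronym expansion, negation replacement, number removal, and stop-word removal) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lookup tables ACRONYMS, IRREGULAR_NEGATIONS, and STOP_WORDS, the tokenizer, and all function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lookup tables; the abstract does not list the ones actually used.
ACRONYMS = {"gr8": "great", "idk": "i do not know", "thx": "thanks"}
IRREGULAR_NEGATIONS = {"won't": "will not", "can't": "can not", "shan't": "shall not"}
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and"}

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the tweet and split it into word-like tokens."""
    return TOKEN_RE.findall(text.lower())


def expand_acronyms(tokens):
    """Replace each known acronym or slang token with its expansion."""
    out = []
    for tok in tokens:
        out.extend(ACRONYMS.get(tok, tok).split())
    return out


def replace_negation(tokens):
    """Rewrite contracted negations such as "isn't" as "is not"."""
    out = []
    for tok in tokens:
        if tok in IRREGULAR_NEGATIONS:
            out.extend(IRREGULAR_NEGATIONS[tok].split())
        elif tok.endswith("n't") and len(tok) > 3:
            out.extend([tok[:-3], "not"])
        else:
            out.append(tok)
    return out


def remove_numbers(tokens):
    """Drop tokens that consist only of digits."""
    return [t for t in tokens if not t.isdigit()]


def remove_stop_words(tokens):
    """Drop tokens found in the (hypothetical) stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text, steps):
    """Tokenize the text, then apply the chosen steps in order."""
    tokens = tokenize(text)
    for step in steps:
        tokens = step(tokens)
    return tokens


if __name__ == "__main__":
    tweet = "The service isn't gr8, waited 45 minutes"
    print(preprocess(tweet, [expand_acronyms, replace_negation]))
    print(preprocess(tweet, [expand_acronyms, replace_negation,
                             remove_numbers, remove_stop_words]))
```

Passing the steps as a list makes it easy to switch individual methods on or off, which is the kind of comparison the abstract describes. Note that "not" is deliberately kept out of the stop-word list here, since removing it would undo the negation replacement.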