Disquisition of Sentiment Inquiry with Hashing and Counting Vectorizer using Machine Learning Classification
Kota Venkateswara Rao¹, M. Shyamala Devi²
¹Kota Venkateswara Rao*, Research Scholar, Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai, Tamil Nadu, India.
²M. Shyamala Devi, Associate Professor, Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai, Tamil Nadu, India.
Manuscript received on October 12, 2019. | Revised manuscript received on October 21, 2019. | Manuscript published on November 10, 2019. | PP: 737-743 | Volume-9 Issue-1, November 2019. | Retrieval Number: A4220119119/2019©BEIESP | DOI: 10.35940/ijitee.A4220.119119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: With the rapid growth of technology, analyzing customer feedback and reviews has become a major challenge for companies and industries. A company's profit depends mainly on customer satisfaction, and customers' views can be analyzed only through their feedback. Review analysis can therefore be used to predict the current and future sales of a company. With this motivation, this paper performs sentiment analysis of movie reviews: the type of comment given by each customer is predicted and assigned to one of five sentiment classes. The Sentiment Analysis on Movie Reviews dataset, taken from the Kaggle dataset repository, is used for the implementation. First, the count of each target sentiment class is displayed, and the dataset is resampled to equalize these counts. Second, the sentiment feature words extracted for each target class are displayed, and the data are cleaned and weighted with the Term Frequency-Inverse Document Frequency (TF-IDF) method. Third, the resampled dataset is fitted with various classifiers: Multinomial Naive Bayes, Logistic Regression, K-Nearest Neighbors, Bernoulli Naive Bayes, Complement Naive Bayes, Nearest Centroid, Passive Aggressive, SGD, Ridge, and Perceptron. Fourth, feature extraction is performed with a Hashing Vectorizer and a Counting Vectorizer, and the vocabulary features of the dataset are displayed. Fifth, the performance of the classifiers is analyzed with the metrics accuracy, recall, F-score, and precision. The implementation is carried out in Python in the IPython console of Spyder, launched from Anaconda Navigator. Experimental results show that the Ridge classifier is effective for sentiment prediction and classification, achieving a precision of 0.89, a recall of 0.88, an F-score of 0.87, and an accuracy of 89%.
Keywords: Accuracy, Recall, F-Score, Sentiment, Precision
Scope of the Article: Classification
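The abstract relies on two feature extractors, a Counting Vectorizer and a Hashing Vectorizer, together with TF-IDF weighting. The stdlib-only Python sketch below illustrates how these differ; it is not the authors' code. The Counting and Hashing Vectorizers presumably correspond to scikit-learn's CountVectorizer and HashingVectorizer, but that is an assumption. The tokenizer, the hypothetical function names, the MD5-based hash (scikit-learn actually uses signed MurmurHash3), the value n_features=16, and the toy reviews are all illustrative choices. The TF-IDF step uses the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which matches scikit-learn's TfidfTransformer defaults.

```python
import hashlib
import math
import re


def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_vocabulary(docs):
    # Counting vectorizer, fit step: one column per distinct token, in sorted
    # order (scikit-learn's CountVectorizer also sorts its feature names).
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def count_vectorize(docs, vocab):
    # Counting vectorizer, transform step: raw term counts; unseen words are dropped.
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in tokenize(doc):
            if tok in vocab:
                row[vocab[tok]] += 1
        rows.append(row)
    return rows


def hash_vectorize(docs, n_features=16):
    # Hashing vectorizer: no vocabulary is stored; each token's column is its
    # hash modulo n_features, so different words may collide in one column.
    # MD5 gives a deterministic hash (Python's built-in hash() of a str varies
    # between runs); scikit-learn uses signed MurmurHash3 instead.
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in tokenize(doc):
            digest = hashlib.md5(tok.encode("utf-8")).digest()
            row[int.from_bytes(digest[:4], "little") % n_features] += 1
        rows.append(row)
    return rows


def tfidf(count_rows):
    # Smoothed IDF, idf = ln((1 + n) / (1 + df)) + 1, then L2 row normalization.
    n_docs, n_terms = len(count_rows), len(count_rows[0])
    df = [sum(1 for row in count_rows if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + df[j])) + 1 for j in range(n_terms)]
    weighted_rows = []
    for row in count_rows:
        weighted = [row[j] * idf[j] for j in range(n_terms)]
        norm = math.sqrt(sum(w * w for w in weighted)) or 1.0
        weighted_rows.append([w / norm for w in weighted])
    return weighted_rows


# Usage on three toy reviews (invented for illustration, not from the Kaggle data).
reviews = ["a gripping, moving film", "a dull film", "dull and boring"]
vocab = fit_vocabulary(reviews)
counts = count_vectorize(reviews, vocab)
hashed = hash_vectorize(reviews, n_features=16)
weights = tfidf(counts)
print(sorted(vocab, key=vocab.get))  # ['a', 'and', 'boring', 'dull', 'film', 'gripping', 'moving']
print(counts[1])                     # [1, 0, 0, 1, 1, 0, 0]
```

The trade-off is visible in the sketch: the counting vectorizer must store a vocabulary, which is what allows its columns to be read back as words (the "vocabulary features"), whereas the hashing vectorizer needs no fitted vocabulary and has a fixed width, at the cost of possible collisions and no mapping from column back to word.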
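The resampling and evaluation steps described in the abstract can be sketched in the same way. The abstract does not state which resampling scheme or averaging mode was used, so this stdlib-only sketch assumes random oversampling of minority classes (duplicating randomly chosen reviews until every class matches the largest one) and macro-averaged precision, recall, and F-score over the sentiment classes. The function names and toy data are hypothetical. In scikit-learn, sklearn.metrics.precision_recall_fscore_support with average="macro" computes the same averages.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    # Assumed resampling scheme: duplicate randomly chosen samples of each
    # minority class until every class has as many samples as the largest one.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def macro_scores(y_true, y_pred):
    # Accuracy plus macro-averaged precision, recall, and F-score: each metric
    # is computed per class, then averaged with equal weight per class.
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, fscores = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0  # 0 when the class is never predicted
        rec = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        fscores.append(f)
    n = len(labels)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "fscore": sum(fscores) / n,
    }


# Usage with five sentiment classes labelled 0-4 (toy data, not the paper's).
texts = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]
sentiments = [2, 2, 2, 0, 1, 3, 4]
bal_x, bal_y = random_oversample(texts, sentiments)
print(Counter(bal_y))  # every class now has 3 samples
scores = macro_scores([0, 1, 2, 2], [0, 2, 2, 2])
print(scores)  # accuracy 0.75, precision ~0.556, recall ~0.667, F-score ~0.6
```

Oversampling by duplication is usually applied to the training split only, so that copies of the same review do not appear in both the training and test sets and inflate the reported scores.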