Multimodal Decision-level Group Sentiment Prediction of Students in Classrooms
Archana Sharma1, Vibhakar Mansotra2

1Dr. Archana Sharma, Department of Computer Science, Government M.A.M College, Cluster University of Jammu, Jammu, India.
2Dr. Vibhakar Mansotra, Department of Computer Science and IT, University of Jammu, Jammu, India.

Manuscript received on September 15, 2019. | Revised Manuscript received on 24 September, 2019. | Manuscript published on October 10, 2019. | PP: 4902-4909 | Volume-8 Issue-12, October 2019. | Retrieval Number: L35491081219/2019©BEIESP | DOI: 10.35940/ijitee.L3549.1081219
Open Access | Ethics and Policies | Cite | Mendeley | Indexing and Abstracting
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (

Abstract: Sentiment analysis can be used to study an individual or a group’s emotions and attitudes towards other people and entities like products, services, or social events. With the advancements in the field of deep learning, the enormity of available information on internet, chiefly on social media, combined with powerful computing machines, it’s just a matter of time before artificial intelligence (AI) systems make their presence in every aspect of human life, making our lives more introspective. In this paper, we propose to implement a multimodal sentiment prediction system that can analyze the emotions predicted from different modal sources such as video, audio and text and integrate them to recognize the group emotions of the students in a classroom. Our experimental setup involves a digital video camera with microphones to capture the live video and audio feeds of the students during a lecture. The students are advised to provide their digital feedback on the lecture as ‘tweets’ on their twitter account addressed to the lecturer’s official twitter account. The audio and video frames are separated from the live streaming video using tools such as lame and ffmpeg. A twitter API was used to access and extract messages from twitter platform. The audio and video features are extracted using Mel-Frequency Cepstral Co-efficients (MFCC) and Haar Cascades classifier respectively. The extracted features are then passed to the Convolutional Neural Network (CNN) model trained on the FER2013 facial images database to generate the feature vector for classification of video-based emotions. A Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM), trained on speech emotion corpus database was used to train on the audio features. A lexicon-based approach with senti-word dictionary and learning based approach with custom dataset trained by Support Vector Machines (SVM) was used in the twitter-texts based approach. A decision-level fusion algorithm was applied on these three different modal schemes to integrate the classification results and deduce the overall group emotions of the students. The use-case of this proposed system will be in student emotion recognition, employee performance feedback, monitoring or surveillance-based systems. The implemented system framework was tested in a classroom environment during a live lecture and the predicted emotions demonstrated the classification accuracy of our approach.
Keywords: Multimodal, Sentiments, Deep Learning, Convolutional Neural Networks, Recurrent Neural Networks, Support Vector Machines, Classrooms
Scope of the Article: Deep Learning