End-to-End Machine Learning Pipeline for Real-Time Network Traffic Classification and Monitoring in Android Automotive
Sriram M1, A Susmithaa Raam2, B Vignesh3, Balasubramanian V4

1Sriram M, UG Student, Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, Chennai (Tamil Nadu), India. 
2Susmithaa Raam A, UG Student, Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, Chennai (Tamil Nadu), India.
3Vignesh B, UG Student, Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, Chennai (Tamil Nadu), India. 
4Dr. Balasubramanian V*, Associate Professor, Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, Chennai (Tamil Nadu), India.
Manuscript received on 23 May 2022. | Revised Manuscript received on 30 May 2022. | Manuscript published on 30 June 2022. | PP: 32-38 | Volume-11 Issue-7, June 2022. | Retrieval Number: 100.1/ijitee.G99820611722 | DOI: 10.35940/ijitee.G9982.0611722
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The aim of this work is to build a network traffic monitoring application capable of categorizing network data traffic by application usage into seven types: Browsing, Chat, Email, File Transfer, Streaming, VoIP and P2P. The traffic stream is fed into the CICFlowmeter, and the resulting flow-wise data is analyzed. Live traffic flows are fed to various ML models and algorithms: K-Means Clustering, Agglomerative Clustering, Mean-shift, Random Forest Classifier, Adaptive Boosting, Gradient Boosting, Linear Discriminant Analysis, Naive Bayes classifier, Classification and Regression Trees, and the Support Vector Machine model. A k-fold cross-validation test identified the Random Forest Classifier as the best of these models. We selected 23 features for model training based on their importance. Model evaluation is done using the confusion matrix. Class imbalance is handled through a comparative study of under-sampling and oversampling of the dataset; oversampling using SMOTE produces better results. The time-based features most important to classification are recorded for further study. The model was fast enough to classify flows in real time and display the analytics in a dashboard. The Flask framework is used to build a live dashboard that displays the classified network traffic along with several important features. We demonstrate that network traffic classification can be done using time-based features, an approach that does not violate data protection laws. The Random Forest algorithm trained on the oversampled dataset achieved an overall accuracy of 0.92.
Keywords: Machine Learning, Android Automotive, CICFlowmeter, Network Flow Classifier
Scope of the Article: Machine Learning
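
The SMOTE oversampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, pure-Python version of the SMOTE idea (a synthetic sample is interpolated between a minority-class flow and one of its nearest same-class neighbours). The function name smote_oversample, the two toy features, and all feature values are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def smote_oversample(features, labels, k=5, seed=0):
    """Simplified SMOTE-style oversampling (illustrative sketch only).

    Every class with fewer samples than the largest class receives synthetic
    points, each placed on the line segment between a randomly chosen sample
    of that class and one of its k nearest same-class neighbours.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x = [list(row) for row in features]
    out_y = list(labels)
    for cls, count in counts.items():
        members = [row for row, lab in zip(features, labels) if lab == cls]
        if count == target or len(members) < 2:
            continue  # already the largest class, or too few points to interpolate
        for _ in range(target - count):
            base = rng.choice(members)
            # k nearest neighbours of `base` within the same class, excluding itself
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            out_x.append([b + gap * (o - b) for b, o in zip(base, other)])
            out_y.append(cls)
    return out_x, out_y


# Toy flows with two made-up time-based features per flow (not data from the paper).
flows = [
    [0.10, 0.02], [0.12, 0.03], [0.11, 0.02],
    [0.13, 0.04], [0.09, 0.03], [0.14, 0.02],  # majority class
    [0.90, 0.40], [0.95, 0.45],                # minority class
]
classes = ["Browsing"] * 6 + ["VoIP"] * 2

balanced_x, balanced_y = smote_oversample(flows, classes, k=3)
print(Counter(balanced_y))
```

In practice a library implementation of SMOTE would normally be applied to the training folds only, so that synthetic samples do not leak into the data used to compute the confusion matrix.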