Performance Examination and Feature Selection on Sybil User Data using Recursive Feature Elimination
Dheeraj Sonkhla1, Manu Sood2

1Dheeraj Sonkhla, Department of Computer Science, Himachal Pradesh University, Shimla, India.

2Manu Sood, Department of Computer Science, Himachal Pradesh University, Shimla, India.

Manuscript received on 21 September 2019 | Revised Manuscript received on 30 September 2019 | Manuscript Published on 01 October 2019 | PP: 48-56 | Volume-8 Issue-9S4 July 2019 | Retrieval Number: I11080789S419/19©BEIESP | DOI: 10.35940/ijitee.I1108.0789S419

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Machine Learning (ML) models, once trained and tested on suitable datasets, can predict outcomes with high accuracy. Online Social Networks (OSNs) are one domain where ML can be used effectively to protect the authenticity and security of legitimate users. As the usage of OSNs grows, spam and malicious activity have become abundant, and Sybil nodes pose one such safety and security hazard. Detecting Sybil accounts is not easy, since they mimic the behavior of genuine human accounts to a great extent. In this paper, we examine Sybil accounts on the OSN Twitter, where machine learning models are trained on existing datasets to detect these malicious users before they can harm the normal communication of genuine users. Because the datasets are large, feature selection is carried out as a pre-processing step before classification, as it helps improve model performance. Support Vector Machine–Recursive Feature Elimination (SVM-RFE) and Logistic Regression–Recursive Feature Elimination (LR-RFE) are used in this study to select the significant features. The classification models are then trained on the selected features using the Random Forest (RF) and K-Nearest Neighbor (KNN) algorithms. We also analyze the biasing effect of fake accounts on the human-account datasets during feature selection and classification. The results show that the RF algorithm outperforms KNN on the feature sets selected through both SVM-RFE and LR-RFE.
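To make the select-then-classify workflow described in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation, which the abstract does not detail. It assumes synthetic data (six numeric columns, of which only the first two carry signal), a hand-rolled logistic regression as the ranking model for LR-RFE, and a simple KNN classifier; the function names `fit_logistic`, `rfe`, and `knn_predict` are hypothetical. SVM-RFE and the Random Forest classifier are omitted for brevity.

```python
import math
import random


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the smallest |weight|, repeat."""
    active = list(range(len(X[0])))  # original column indices still in play
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        w, _ = fit_logistic(X_active, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return active


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


# Synthetic stand-in data (an assumption, not the Twitter datasets of the paper):
# columns 0 and 1 are shifted by class, columns 2..5 are pure noise.
rng = random.Random(0)
N_FEATURES = 6
X, y = [], []
for i in range(200):
    label = i % 2
    shift = 1.5 if label else -1.5
    row = [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)]
    row += [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES - 2)]
    X.append(row)
    y.append(label)

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# Step 1: feature selection on the training split only.
selected = rfe(X_train, y_train, n_keep=2)

# Step 2: classification restricted to the selected columns.
train_sel = [[row[j] for j in selected] for row in X_train]
test_sel = [[row[j] for j in selected] for row in X_test]
correct = sum(knn_predict(train_sel, y_train, x) == t
              for x, t in zip(test_sel, y_test))
accuracy = correct / len(y_test)

print("selected columns:", selected)
print("KNN accuracy on held-out rows:", accuracy)
```

On this toy data the two class-shifted columns should be the ones that survive elimination. Selecting features on the training split only, as done here, keeps the held-out rows from influencing which columns are kept; whether the paper follows the same protocol is not stated in the abstract.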

Keywords: Feature Selection, K-Nearest Neighbor Classifier, Logistic Regression-Recursive Feature Elimination, Machine Learning.
Scope of the Article: Machine learning