Using Reduced Set of Features to Detect Spam in Twitter Data with Decision Tree and KNN Classifier Algorithms
K Subba Reddy1, E. Srinivasa Reddy2

1K Subba Reddy, Research Scholar, Anucet, ANU, Guntur, AP, India
2E. Srinivasa Reddy, Dean, Anucet, ANU, Guntur, AP, India
Manuscript received on 30 June 2019 | Revised Manuscript received on 05 July 2019 | Manuscript published on 30 July 2019 | PP: 06-12 | Volume-8 Issue-9, July 2019 | Retrieval Number: F3616048619/19©BEIESP | DOI: 10.35940/ijitee.F3616.078919

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: In social media, users share their ideas and opinions with their neighbours and friends. Spammers send spam to genuine users in order to mislead them, and this spam is a serious problem on social media sites. Researchers have developed various methodologies to detect spam messages in social media, and these models are typically built on a large number of features. The original dataset, however, generally contains many irrelevant and redundant features, and such a large number of features reduces spam detection accuracy. To improve spam detection accuracy in social media networks, the meaningless attributes must be removed from the high-dimensional social media dataset. To reduce the dimensionality of the dataset, we use a dimensionality reduction approach called principal component analysis (PCA). After the dimensionality has been reduced, the samples are classified using the Decision Tree Induction classifier and the K Nearest Neighbour (KNN) algorithm. In the proposed work, these algorithms determine whether each data sample is spam or ham. We use a Twitter dataset to test the proposed approach. Experimental results show that the KNN classifier outperforms the Decision Tree classifier.
Keywords: Social media, Dimensionality, PCA, Decision Tree, KNN algorithm.
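
The pipeline summarized in the abstract (PCA for dimensionality reduction, followed by Decision Tree and KNN classification) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, substitutes a synthetic dataset from make_classification for the Twitter data, and picks illustrative values for the number of principal components, the number of neighbours, and the train/test split. The function name evaluate_reduced_feature_classifiers is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def evaluate_reduced_feature_classifiers(X, y, n_components=5, n_neighbors=5, seed=0):
    """Fit PCA + Decision Tree and PCA + KNN, and return test accuracy for each.

    n_components, n_neighbors, and the 70/30 split are illustrative
    assumptions, not values taken from the paper.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    classifiers = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "knn": KNeighborsClassifier(n_neighbors=n_neighbors),
    }
    scores = {}
    for name, clf in classifiers.items():
        # Standardize, project onto the leading principal components,
        # then classify in the reduced feature space.
        model = make_pipeline(StandardScaler(), PCA(n_components=n_components), clf)
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


# Synthetic stand-in for a spam/ham dataset with many redundant features
# (label 1 = spam, label 0 = ham is an assumption made for this sketch).
X, y = make_classification(
    n_samples=600,
    n_features=30,
    n_informative=6,
    n_redundant=12,
    random_state=0,
)
scores = evaluate_reduced_feature_classifiers(X, y)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

Wrapping the scaler, PCA, and classifier in one pipeline ensures that the principal components are estimated from the training split only, so the held-out accuracy is not influenced by the test samples. Accuracies obtained on synthetic data say nothing about the results reported for the Twitter dataset.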

Scope of the Article: Software Engineering Decision Support