Using Classification Techniques to SMS Spam Filter
Halah Hadi Mansoor¹, Shaimaa Hameed Shaker²
¹ Halah Hadi Mansoor, Informatics Institute for Postgraduate Studies, Iraqi Commission for Computers and Informatics, Baghdad, Iraq.
² Dr. Shaimaa Hameed Shaker, Asst. Prof., Computer Science, University of Technology, Baghdad, Iraq.
Manuscript received on September 17, 2019. | Revised Manuscript received on 23 September, 2019. | Manuscript published on October 10, 2019. | PP: 1734-1739 | Volume-8 Issue-12, October 2019. | Retrieval Number: L32061081219/2019©BEIESP | DOI: 10.35940/ijitee.L3206.1081219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: SMS is a mobile phone service that allows users to exchange textual content. Spamming can be defined as sending unwanted content to a group of people for various purposes, such as fraud. SMS spam is one form of spamming in which spammers deliver unwanted messages to many clients. It has therefore become necessary to develop an SMS spam detection system that keeps up with the current development of message services. The aim of this work is to develop a spam filter for the Arabic and English languages that uses two filters to detect spam SMS efficiently. A content-based method was used to build the spam filter for both languages. The method consists of four steps: reading the English and Arabic datasets, preprocessing, feature extraction, and classification. The first step after reading the datasets is the preprocessing phase, which is important for obtaining more accurate results. The next step is extracting features from the body of each message: eight features are extracted from English messages and six from Arabic messages. The features for each language are then split into two sets, a training set and a testing set. The training set is used to train the algorithms, while the test set is used to evaluate the performance of the proposed spam filter for each language. The proposed system uses two classifiers: Naive Bayes as the first classifier and a neural network as the second. Incoming messages are passed through the Naive Bayes classifier. A message classified as ham is passed to the second classifier to check whether it is in fact spam; a message classified as spam is not passed to the second classifier.
The results of the proposed system were acceptable: 97% accuracy was obtained for English using eight features with 80% of the dataset used for training, and 95% accuracy was obtained for Arabic using six features with 70% of the dataset used for training.
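The two-stage decision rule and the train/test split described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `split_dataset`, `cascade_predict`, `toy_first_stage`, and `toy_second_stage` are hypothetical, the two toy classifiers are simple threshold rules standing in for a trained Naive Bayes model and a trained neural network, and the two example features are assumptions, since the abstract does not list the features used.

```python
import random
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]  # returns "spam" or "ham"


def split_dataset(rows: List, train_fraction: float, seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the rows and split it into (training, testing) sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def cascade_predict(features: Features, first: Classifier, second: Classifier) -> str:
    """A spam verdict from stage one is final; a ham verdict is re-checked by stage two."""
    if first(features) == "spam":
        return "spam"
    return second(features)


# Hypothetical stand-ins for the trained models (assumptions for illustration only).
def toy_first_stage(features: Features) -> str:
    # Assumed feature 0: number of URLs in the message.
    return "spam" if features[0] >= 2 else "ham"


def toy_second_stage(features: Features) -> str:
    # Assumed feature 1: fraction of upper-case characters.
    return "spam" if features[1] >= 0.5 else "ham"


if __name__ == "__main__":
    train, test = split_dataset(list(range(10)), train_fraction=0.8)
    print(len(train), len(test))
    print(cascade_predict([0, 0.9], toy_first_stage, toy_second_stage))
```

With this arrangement the second classifier can only turn a ham verdict into spam, never the reverse, so the cascade trades a possible increase in false positives for fewer missed spam messages.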
Keywords: SMS, Spam Filter, Naïve Bayes, Neural Network, Back Propagation, Spam Detection.
Scope of the Article: Classification