A Data Mining Based Malware Detection Model using Distinct API Call Sequences
Om Prakash Samantray1, Satya Narayan Tripathy2, Susanta Kumar Das3

1Om Prakash Samantray, PG, Department of Computer Science, Berhampur University, Berhampur, India.
2Satya Narayan Tripathy, PG, Department of Computer Science, Berhampur University, Berhampur, India.
3Susanta Kumar Das, PG, Department of Computer Science, Berhampur University, Berhampur, India.
Manuscript received on 05 May 2019 | Revised Manuscript received on 12 May 2019 | Manuscript published on 30 May 2019 | PP: 896-902 | Volume-8 Issue-7, May 2019 | Retrieval Number: F3533048619/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Malware has been a serious threat for the past decade, and the threat grows every year with the extensive use of the internet. Extensive research is under way to protect important information from being stolen or damaged by malicious software. Despite the many malware detection strategies available, detecting zero-day malware remains a challenge for researchers. Here, we present a model that uses distinct API call sequences as features and applies data mining classification algorithms to detect malware. The distinct API call sequences are extracted from Portable Executable (PE) files and supplied as input to different data mining or machine learning techniques. We selected six robust data mining classifiers, namely Decision Tree (DT), Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN), J48 and Random Forest (RF), to carry out the experiment. A comparison of their performance is also presented.
Keywords: API Call Sequence, Data Mining, Malware Analysis, Malware Detection.
Scope of the Article: Data Mining
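
The pipeline outlined in the abstract (extract distinct API call sequences, turn them into features, classify) can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions: a "distinct API call sequence" is interpreted as a distinct run of n consecutive calls (an n-gram), the traces and labels are invented toy data, the function names are hypothetical, and a 1-nearest-neighbor rule with Hamming distance stands in for the classifiers named in the abstract.

```python
def distinct_sequences(trace, n=2):
    """Return the set of distinct runs of n consecutive API calls in a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def build_vocabulary(traces, n=2):
    """Collect every distinct sequence seen in the training traces, sorted."""
    return sorted(set().union(*(distinct_sequences(t, n) for t in traces)))


def vectorize(trace, vocab, n=2):
    """Binary feature vector: 1 if the trace contains the sequence, else 0."""
    present = distinct_sequences(trace, n)
    return [1 if seq in present else 0 for seq in vocab]


def predict_1nn(train_vectors, train_labels, vector):
    """Label of the training vector closest in Hamming distance (assumed classifier)."""
    distances = [sum(a != b for a, b in zip(v, vector)) for v in train_vectors]
    return train_labels[distances.index(min(distances))]


# Toy traces, invented for illustration only (not data from the paper).
train_traces = [
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["CreateFileW", "ReadFile", "ReadFile", "CloseHandle"],
    ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
    ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread",
     "CloseHandle"],
]
train_labels = ["benign", "benign", "malware", "malware"]

vocab = build_vocabulary(train_traces)
train_vectors = [vectorize(t, vocab) for t in train_traces]

# A previously unseen trace; sequences outside the vocabulary are ignored.
unknown = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
           "CreateRemoteThread", "CreateFileW"]
prediction = predict_1nn(train_vectors, train_labels, vectorize(unknown, vocab))
print(prediction)
```

Using a set per trace means a sequence repeated many times in one file counts once, which is one plausible reading of "distinct"; a frequency-based encoding would be an equally reasonable alternative.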