Tuning Random Forest Parameters using Simulated Annealing for Intrusion Detection
Ambikavathi C1, S. K. Srivatsa2

1Ambikavathi*, Research Scholar, Faculty of S&H Computer Applications, Sathyabama University, Chennai, India.
2S. K. Srivatsa, Professor (Rtd.), MIT, Anna University, Chennai, India.
Manuscript received on June 21, 2020. | Revised Manuscript received on June 30, 2020. | Manuscript published on July 10, 2020. | PP: 353-358 | Volume-9 Issue-9, July 2020 | Retrieval Number: 100.1/ijitee.H6799069820 | DOI: 10.35940/ijitee.H6799.079920
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Intrusion detection systems (IDS) are used successfully in static, distributed, and dynamic network environments. An IDS generally needs a classification method to decide whether an event is normal or abnormal. This classification task relies on a set of features and a massive number of samples. However, not all features contribute equally to prediction during classification. Hence feature selection (FS) has to be performed before classification to select the best features. Random forest (RF) plays the dual role of FS and classification. Experiments were conducted to show that RF performs best among machine learning (ML) algorithms such as the SVM classifier and the C5.0 decision tree algorithm. However, the default parameter values of RF are not well suited to distributed environments such as the cloud. They lead to poor accuracy and low efficiency in intrusion detection, since an enormous number of events has to be analyzed. The parameters of RF therefore have to be optimized by an efficient method. The important parameters of RF are the number of trees, the maximum depth of a tree, the sample size, the number of features considered when splitting a node (Mtry), the node size, and the maximum number of leaf nodes. Among these parameters, the hyperparameters to tune are selected based on three decision factors: randomness, split rule, and tree complexity. Parameter tuning must avoid both over-fitting and under-fitting. Simulated Annealing (SA) is therefore used to tune these RF hyperparameters, which improves the detection accuracy and efficiency of the IDS. SA is chosen for parameter optimization to avoid those issues, since it can escape local optima rather than getting stuck in them. The proposed system significantly boosts the results of the IDS. The efficiency of the proposed SA-RF is validated using the CICIDS2017 dataset.
Keywords: Cloud computing, Intrusion Detection System, Random Forest, Simulated Annealing.
Scope of the Article: Cloud computing
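
The tuning loop described in the abstract can be illustrated with a minimal sketch of simulated annealing over a discrete hyperparameter grid. This is not the authors' implementation: the value ranges in SPACE, the cooling schedule, and the function names are assumptions made for illustration, and toy_score is a synthetic stand-in for the cross-validated detection accuracy that a real random forest would report on a dataset such as CICIDS2017.

```python
import math
import random

# Hypothetical search space. Parameter names follow the abstract;
# the candidate values are assumptions chosen for illustration.
SPACE = {
    "n_trees": [50, 100, 200, 400, 800],
    "max_depth": [4, 8, 16, 32],
    "mtry": [2, 4, 8, 16],
    "node_size": [1, 5, 10, 20],
}

# Synthetic optimum used only by toy_score below (not a real result).
TARGET = {"n_trees": 200, "max_depth": 16, "mtry": 4, "node_size": 5}


def toy_score(config):
    """Stand-in for cross-validated accuracy: higher is better."""
    penalty = sum(
        abs(SPACE[k].index(config[k]) - SPACE[k].index(TARGET[k]))
        for k in SPACE
    )
    return 1.0 - 0.05 * penalty


def neighbor(config, rng):
    """Move one randomly chosen parameter one step up or down its value list."""
    new = dict(config)
    name = rng.choice(sorted(SPACE))
    values = SPACE[name]
    i = values.index(new[name])
    j = min(max(i + rng.choice([-1, 1]), 0), len(values) - 1)
    new[name] = values[j]
    return new


def simulated_annealing(score, start, t0=1.0, cooling=0.95, steps=200, seed=0):
    """Maximize score(config) with a geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_score = dict(start), score(start)
    best, best_score = dict(current), current_score
    t = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        cand_score = score(cand)
        delta = cand_score - current_score
        # Improvements are always accepted; a worse candidate is accepted
        # with probability exp(delta / t), which shrinks as t cools. This
        # occasional downhill move is what lets the search leave a local optimum.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = dict(current), current_score
        t *= cooling
    return best, best_score


start = {name: values[0] for name, values in SPACE.items()}
best, best_score = simulated_annealing(toy_score, start)
print(best, best_score)
```

To tune an actual forest, toy_score would be replaced by a function that trains a random forest with the given configuration and returns its validation accuracy; the annealing loop itself would not need to change.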