Impact of Dataset Size and Performance Analysis of IDS using Random Forest Algorithm in ‘R’ Language
Amit Kumar Mishra1, Rakesh Chandra Bhadula2, Neha Garg3, Deepak Kholiya4, V. N. Kala5

1Amit Kumar Mishra, Department of Computer Science & Engineering, Graphic Era Hill University, Dehradun, India.

2Rakesh Chandra Bhadula, Department of Mathematics, Graphic Era Hill University, Dehradun, India.

3Neha Garg, Department of Computer Science & Engineering, Graphic Era, Deemed University, Dehradun, India.

4Deepak Kholiya, Department of Agriculture, Graphic Era Hill University, Dehradun, India.

5V. N. Kala, Department of Applied Science, GBPC, Pauri Garhwal, India.

Manuscript received on 01 June 2019 | Revised Manuscript received on 07 June 2019 | Manuscript Published on 04 July 2020 | PP: 11-14 | Volume-8 Issue-4S3 March 2019 | Retrieval Number: D10030384S319/2019©BEIESP | DOI: 10.35940/ijitee.D1003.0384S319

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (

Abstract: With the advancement of new technologies, Big Data has grown tremendously in both scale and popularity. This growth poses challenges not only in volume but also in the speed at which data is generated. New data arrives extremely fast, so it becomes essential to be able to handle such voluminous data. Machine learning enables computers to build models from input data in order to automate decision-making processes. Machine learning algorithms such as Random Forest use datasets to train computers to respond as human beings would. Selecting an appropriate dataset (size, parameters) plays an important role in obtaining efficient and effective results. In this paper, an analytical approach is applied to an Intrusion Detection System (IDS): the Random Forest algorithm is used to analyze training time as the size of the dataset is increased and to observe the impact of these changes in size on various evaluation metrics. Finally, a performance analysis is carried out, and it is observed that the performance of the IDS is better and more accurate.

Keywords: Intrusion Detection System, Data set, Evaluation Metrics, Machine Learning, Random Forest.
Scope of the Article: Machine Learning
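
Illustrative sketch (not taken from the paper): the abstract describes measuring training time and evaluation metrics while the training set grows. The Python snippet below shows only the shape of such a measurement loop. The paper itself works in R with Random Forest; here the function names, the "attack"/"normal" labels, and the majority-class trainer are all hypothetical stand-ins chosen so the example runs with the standard library alone. A real experiment would pass an actual learner in place of `majority_trainer`.

```python
import time
from collections import Counter


def binary_metrics(y_true, y_pred, positive="attack"):
    """Accuracy, precision, recall and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def majority_trainer(train_labels):
    """Hypothetical stand-in learner: always predicts the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda n: [most_common] * n


def sweep(train_labels, test_labels, sizes, trainer=majority_trainer):
    """Train on the first n labels for each n in `sizes`; record time and metrics."""
    rows = []
    for n in sizes:
        start = time.perf_counter()
        model = trainer(train_labels[:n])
        elapsed = time.perf_counter() - start
        predictions = model(len(test_labels))
        rows.append((n, elapsed, binary_metrics(test_labels, predictions)))
    return rows


if __name__ == "__main__":
    train = ["normal", "attack", "attack", "attack"]
    test = ["attack", "normal"]
    for n, seconds, metrics in sweep(train, test, sizes=[1, 4]):
        print(n, f"{seconds:.6f}s", metrics)
```

Each row pairs a training-set size with the elapsed training time and the resulting metrics, which is the kind of table a size-versus-performance analysis would be built from.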