Comprehensive Analysis of Web Page Classifier for Focused Crawler
Gourav Kumar Shrivastava1, Praveen Kaushik2, Rajesh Kumar Pateriya3

1Gourav Kumar Shrivastava, Department of CSE, Maulana Azad National Institute of Technology, Bhopal, India.
2Praveen Kaushik, Department of CSE, Maulana Azad National Institute of Technology, Bhopal, India.
3Rajesh Kumar Pateriya, Department of CSE, Maulana Azad National Institute of Technology, Bhopal, India.

Manuscript received on 30 June 2019 | Revised Manuscript received on 05 July 2019 | Manuscript published on 30 July 2019 | PP: 57-65 | Volume-8 Issue-9, July 2019 | Retrieval Number: I7477078919/19©BEIESP | DOI: 10.35940/ijitee.I7477.078919
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: A focused crawler collects domain-specific web pages from the Internet. However, the performance of a focused web crawler depends on the multidimensional nature of web pages. This paper presents a comprehensive analysis of recent web page classifiers for focused crawlers and explores the impact of web-based features used in combination with a web page classifier. It evaluates the performance of classification techniques such as Support Vector Machine (SVM), Naive Bayes, Linear Regression, and Random Forest on web page classification. It also examines the impact of web features, i.e., anchor text, page content, and links, on web page classification. Finally, the paper yields interesting results about the combined effect of web features and classification techniques on classifying web pages as relevant or irrelevant.
Keywords: Focused Crawler, Feature Extraction Technique, Anchor text, Page Content, Link Priority, Naive Bayes, Linear Regression, Random Forest, SVM

Scope of the Article: Predictive Analysis