Automation of Manual Seed URLs Cull Approach for Web Crawlers
Suvarna Sharma1, Amit Bhagat2
1Suvarna Sharma, Department of Mathematics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal (M.P), India.
2Amit Bhagat, Department of Mathematics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal (M.P), India.
Manuscript received on 05 February 2019 | Revised Manuscript received on 13 February 2019 | Manuscript published on 28 February 2019 | PP: 57-63 | Volume-8 Issue-4, February 2019 | Retrieval Number: D2626028419/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Web mining is an emerging research topic that is growing rapidly with the volume of data on the web. It plays an essential role in everyday life by helping deliver information faster through new trends and technologies. Hyperlink structure analysis and web crawling open scope for more advanced research. If a system covers the most relevant web pages in a search engine environment, it can improve the search engine’s results. Such a set of URLs may be useful for extracting more relevant information or improving existing information, and for managing the crawling infrastructure to offer quicker responses. Today, web crawling is an emerging issue for search engines, involving search quality and access to pages on various servers to extract features. In the current scenario, the user may be interested only in the best results under some specific constraints. A constraint may restrict the domain of the search or the importance of the relevant pages. Here, we consider pages that are important or useful to a particular user in a search environment. We propose a framework, BUDG (Base URL’s Set for Directed Graph), which works with the hyperlink structure of URLs, generates a minimum set of ‘K’ URLs, and then discovers the covered graph for the directed graph. Experimental results show that the proposed framework works properly across different domains.
Keywords: Information Retrieval, Seed URLs, Web Crawler, Web Graph Analysis, Web Mining.
Scope of the Article: Web Mining
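The abstract describes selecting a small set of ‘K’ seed URLs and the portion of a directed hyperlink graph they cover, but it does not say how the set is chosen. The following is a minimal sketch of one plausible reading, assuming a greedy heuristic that repeatedly picks the URL whose reachable pages add the most uncovered nodes. It is not the authors' BUDG algorithm; the function names `reachable` and `greedy_seed_set` and the sample `web` graph are hypothetical.

```python
from collections import deque


def reachable(graph, start):
    """Return every URL reachable from `start` by following hyperlinks (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in graph.get(url, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def greedy_seed_set(graph, k):
    """Pick at most k seed URLs greedily by newly covered pages.

    ASSUMPTION: greedy coverage is an illustrative heuristic only; the
    source does not specify how its set of K URLs is generated.
    Returns (seeds, covered), where `covered` is the set of covered URLs.
    """
    # Collect every URL, including pages that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    reach = {url: reachable(graph, url) for url in nodes}
    seeds, covered = [], set()
    while len(seeds) < k and covered != nodes:
        candidates = sorted(nodes - set(seeds))  # sorted for deterministic ties
        best = max(candidates, key=lambda url: len(reach[url] - covered))
        seeds.append(best)
        covered |= reach[best]
    return seeds, covered


# Hypothetical hyperlink graph: each URL maps to the URLs it links to.
web = {
    "a.example/": ["a.example/1", "a.example/2"],
    "a.example/1": ["a.example/2"],
    "b.example/": ["b.example/1"],
    "b.example/1": ["b.example/"],
    "c.example/": [],
}

seeds, covered = greedy_seed_set(web, 2)
print(seeds)         # the chosen seed URLs
print(len(covered))  # number of pages the seeds cover
```

In this toy graph, two seeds cover five of the six pages; the isolated page `c.example/` is reached only if a third seed is allowed. Precomputing reachability for every node is simple but costly on large graphs, so a real crawler would likely need a cheaper approximation.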