***** KAIST Discete Math Semianr *****
DATE: March 16th, **Monday**
TIME: 2PM-3PM
PLACE: E6-1, ROOM 1409
SPEAKER: Kunsoo Park (박근수), Seoul National University
TITLE: An effective web crawling ordering from graph search techniques
A web crawler is a fundamental software which iteratively downloads web pages by following links of web pages starting from a small set of initial URLs. Previously several web crawling orderings have been studied. In this talk we consider various graph search techniques including lexicographic breadth-first search, lexicographic depth-first search and maximum cardinality search as well as well-known breadth-first search and depth-first search, and then find the best web crawling ordering among them. Furthermore, we treat internal links and external links in different manners for effective crawling. For maximum cardinality search and lexicographic breadth-first search, we present linear time algorithms based on the partition refinement method. Experimental results show that maximum cardinality search is the best graph search technique for web crawlers.