Abstract:
Web crawlers are a core component of modern search engines and an active topic in information retrieval research. In this paper, the author reviews the history and current state of research on intelligent web crawlers from two perspectives: first, the application of traditional artificial intelligence methods, such as neural networks, genetic algorithms, and ant colony optimization, to web crawling, together with the focused crawlers built on these methods; second, agent technology for coordinating web crawlers in multi-network crawling systems. On this basis, a basic approach to web crawling based on a semantic concept context graph is proposed.