Java Web Crawler Using Jsoup

In this tutorial, I want to share what I learned about writing a web crawler with the jsoup library. To start, I wrote a simple program that fetches a web page, collects the links in its HTML, and then requests each absolute link in turn.
Here is my code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Launcher {

	// Start page whose links will be followed
	public static String url = "http://en.wikipedia.org/";

	public static void main(String[] args) throws Exception {
		// Download and parse the start page
		Document doc = Jsoup.connect(url).get();
		// Select every anchor element on the page
		Elements links = doc.select("a");
		int count = 0;
		for (Element link : links) {
			String href = link.attr("href");
			// Skip the first anchor and keep only absolute http(s) links
			if (count > 0 && href.startsWith("http")) {
				System.out.println(href);
				try {
					// Fetch the linked page with GET (POST is meant for submitting forms)
					Document tempDoc = Jsoup.connect(href).get();
					System.out.println(tempDoc.toString());
				} catch (Exception e) {
					// Report the failure and continue with the next link
					System.out.println(href + " : Error (" + e.getMessage() + ")");
				}
			}
			count++;
		}
	}
}
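
One limitation of the code above is that the `http` check drops relative links such as `/wiki/Java`, and the same page can be fetched more than once if it is linked several times. Below is a minimal sketch of one way to handle that, using only the standard `java.net.URI` class. The `LinkNormalizer` class and its `normalize` method are my own hypothetical names, not part of jsoup, and the sample hrefs are made up for illustration. It assumes the base URL ends with a path (for example a trailing `/`).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of jsoup): turns raw href values into a
// de-duplicated list of absolute http(s) URLs.
class LinkNormalizer {

    static List<String> normalize(String baseUrl, List<String> hrefs) {
        URI base = URI.create(baseUrl);
        // LinkedHashSet removes duplicates and keeps first-seen order
        Set<String> seen = new LinkedHashSet<>();
        for (String href : hrefs) {
            URI resolved;
            try {
                // Resolve relative and protocol-relative hrefs against the base URL
                resolved = base.resolve(href.trim());
            } catch (IllegalArgumentException e) {
                continue; // malformed href, skip it
            }
            String scheme = resolved.getScheme();
            boolean isHttp = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            if (!isHttp) {
                continue; // skip mailto:, javascript: and similar links
            }
            // Drop the #fragment so in-page anchors do not count as new pages
            String text = resolved.toString();
            int hash = text.indexOf('#');
            if (hash >= 0) {
                text = text.substring(0, hash);
            }
            seen.add(text);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Sample hrefs, invented for this example
        List<String> hrefs = Arrays.asList(
                "/wiki/Java",
                "/wiki/Java#History",
                "https://www.mediawiki.org/",
                "mailto:someone@example.org");
        for (String link : normalize("http://en.wikipedia.org/", hrefs)) {
            System.out.println(link);
        }
    }
}
```

In the crawler, the values of `link.attr("href")` could be collected into a list and passed through `normalize` before any page is requested. jsoup can also resolve relative links by itself with `link.absUrl("href")`; the sketch only shows the idea with standard classes and adds the duplicate check.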

CMIIW :)
