In this tutorial, I want to share what I learned about building a web crawler with the jsoup library. To begin, I wrote a simple program that opens a website and collects every link found in its HTML. For each link that points to an absolute URL, it then fetches that page and prints its content.
Here is my code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Launcher {

    public static String url = "http://en.wikipedia.org/";

    public static void main(String[] args) throws Exception {
        // Download and parse the start page
        Document doc = Jsoup.connect(url).get();

        // Select every anchor element on the page
        Elements links = doc.select("a");

        int count = 0;
        for (Element link : links) {
            // Skip the first link on the page
            if (count > 0) {
                String href = link.attr("href");
                // Follow absolute http/https links only; relative links are ignored here
                if (href.startsWith("http")) {
                    System.out.println(href);
                    try {
                        // Fetch the linked page with GET (a POST is not appropriate for reading a page)
                        Document tempDoc = Jsoup.connect(href).get();
                        System.out.println(tempDoc.toString());
                    } catch (Exception e) {
                        // The page could not be fetched or parsed; report it and continue
                        System.out.println(href + " : Error");
                    }
                }
            }
            count++;
        }
    }
}
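One limitation of the code above is that it only follows hrefs that already carry "http", so relative links such as /wiki/Java are skipped and the same page may be fetched more than once. Below is a minimal sketch of how the raw href values could be cleaned up first, using only java.net.URI from the standard library. The LinkFilter class and the absoluteHttpLinks method are hypothetical names I made up for illustration; they are not part of jsoup. As far as I know, jsoup can also resolve links by itself through link.absUrl("href") or link.attr("abs:href"), so treat this as one possible approach rather than the only one.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of jsoup): turns raw href values into a
// de-duplicated list of absolute http/https URLs.
final class LinkFilter {

    static List<String> absoluteHttpLinks(String baseUrl, List<String> hrefs) {
        URI base = URI.create(baseUrl);
        Set<String> seen = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String href : hrefs) {
            if (href == null || href.trim().isEmpty()) {
                continue; // nothing to resolve
            }
            try {
                // Resolve relative links such as "/wiki/Java" against the page URL
                URI resolved = base.resolve(href.trim());
                String scheme = resolved.getScheme();
                if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
                    continue; // skip mailto:, javascript:, and similar schemes
                }
                // Drop the fragment so "page" and "page#section" count as one URL
                String absolute = resolved.toString();
                int hash = absolute.indexOf('#');
                if (hash >= 0) {
                    absolute = absolute.substring(0, hash);
                }
                seen.add(absolute);
            } catch (IllegalArgumentException e) {
                // Malformed href (for example one containing a space): skip it
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Sample hrefs are made up for this demonstration
        List<String> hrefs = Arrays.asList(
                "/wiki/Java", "https://www.mediawiki.org/", "mailto:info@example.org");
        for (String link : absoluteHttpLinks("http://en.wikipedia.org/", hrefs)) {
            System.out.println(link);
        }
    }
}
```

In the crawler, the values of link.attr("href") could be collected into a list and passed through absoluteHttpLinks before any page is fetched, so each URL is requested at most once.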
Please correct me if I'm wrong.