This tutorial explains how to use jsoup as an HTML parser.

1. jsoup

1.1. What is jsoup?

jsoup is a Java library for working with real-world HTML. It provides a convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
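
As a rough illustration of these three access styles, the following sketch parses a small in-memory HTML string instead of a live page. The class name JsoupStylesSketch, the sample markup, and the helper method names are assumptions made for this illustration. Only the jsoup calls themselves (Jsoup.parse, getElementById, children, select, addClass, attr) come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupStylesSketch {

    // Hypothetical sample markup, used only for this illustration
    static final String HTML =
        "<html><head><title>Sample</title></head><body>"
        + "<div id=\"content\"><p class=\"intro\">Hello <b>jsoup</b></p>"
        + "<p>Second paragraph</p></div></body></html>";

    // DOM-style access: look up an element by id and count its child elements
    static int countChildrenOfContent(Document doc) {
        Element content = doc.getElementById("content");
        return content.children().size();
    }

    // CSS-selector access: combined text of all paragraphs with class "intro"
    static String introText(Document doc) {
        return doc.select("p.intro").text();
    }

    // jQuery-like chained manipulation: tag every paragraph in one expression
    static Elements markParagraphs(Document doc) {
        return doc.select("p").addClass("seen").attr("lang", "en");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        System.out.println("Children of #content: " + countChildrenOfContent(doc));
        System.out.println("Intro text: " + introText(doc));

        // print the modified markup of each paragraph
        for (Element paragraph : markParagraphs(doc)) {
            System.out.println(paragraph.outerHtml());
        }
    }
}
```

Because the input is a plain string, this sketch runs without network access, which can be convenient when trying out selectors.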

1.2. Using jsoup

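The latest version of jsoup is listed on Maven Central at https://search.maven.org/artifact/org.jsoup/jsoup.

To use jsoup in a Maven build, add the following dependency to your pom.xml file. Replace the version number with the latest release if a newer one is available.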

<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.13.1</version>
</dependency>

To use jsoup in your Gradle build, add the following dependency to your build.gradle file.

implementation 'org.jsoup:jsoup:1.13.1'

1.3. Example

The following code demonstrates how to fetch a web page, print its title, and extract its links.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseLinksExample {

    public static void main(String[] args) {
        try {
            // fetch and parse the page
            Document doc = Jsoup.connect("https://www.vogella.com/").get();

            // get the title of the page
            String title = doc.title();
            System.out.println("Title: " + title);

            // get all links that have an href attribute
            Elements links = doc.select("a[href]");
            for (Element link : links) {
                // print the value of the href attribute and the link text
                System.out.println("\nLink : " + link.attr("href"));
                System.out.println("Text : " + link.text());
            }
        } catch (IOException e) {
            // network and HTTP errors end up here
            e.printStackTrace();
        }
    }

}
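
The example above prints each href value exactly as it is written in the page, so relative links such as /tutorials/ stay relative. A minimal sketch of resolving them to absolute URLs with absUrl follows. The class name AbsoluteLinksSketch, the helper method absoluteLinks, and the sample markup are assumptions for this illustration. The sketch parses a string with an explicit base URI so that it can run offline; a document loaded via Jsoup.connect already carries its base URI.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsoluteLinksSketch {

    // Collects the absolute form of every link found in the given markup.
    // baseUri is the address that relative links are resolved against.
    static List<String> absoluteLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            // absUrl resolves the attribute value against the base URI
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // hypothetical sample markup with one relative and one absolute link
        String html = "<a href=\"/tutorials/\">Tutorials</a>"
            + "<a href=\"https://example.com/page.html\">External</a>";

        for (String url : absoluteLinks(html, "https://www.vogella.com/")) {
            System.out.println(url);
        }
    }
}
```

With the sample markup, the relative link is resolved to https://www.vogella.com/tutorials/, while the link that is already absolute is returned unchanged.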

2. jsoup Resources