This tutorial explains how to use jsoup as an HTML parser.
1. jsoup
1.1. What is jsoup?
jsoup is a Java library for working with real-world HTML. It provides a convenient API for extracting and manipulating data, combining DOM traversal, CSS selectors, and jQuery-like methods.
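As a minimal sketch of these three styles, the following snippet parses an HTML string, reads data through a CSS selector, and modifies an element found by a DOM-style lookup with a jQuery-like method chain. The class name JsoupApiSketch and the sample markup are made up for this illustration; only the jsoup calls themselves (Jsoup.parse, selectFirst, getElementById, attr, addClass) come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupApiSketch {

    // Hypothetical sample markup, used only for this illustration
    static final String HTML = "<html><head><title>Sample</title></head>"
            + "<body><div id=\"main\"><p class=\"intro\">Hello <b>jsoup</b></p></div></body></html>";

    // CSS selector: returns the combined text of the first matching element
    static String introText() {
        Document doc = Jsoup.parse(HTML);
        Element intro = doc.selectFirst("p.intro");
        return intro.text();
    }

    // DOM-style lookup followed by chained, jQuery-like manipulation
    static String markedUpDiv() {
        Document doc = Jsoup.parse(HTML);
        Element main = doc.getElementById("main");
        main.attr("data-checked", "true").addClass("highlight");
        return main.attr("data-checked") + " " + main.className();
    }

    public static void main(String[] args) {
        System.out.println(introText());
        System.out.println(markedUpDiv());
    }
}
```

Because the HTML is parsed from a string, this sketch runs without network access, which makes it a convenient way to try out selectors.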
1.2. Using jsoup
The latest version of jsoup can be found via https://search.maven.org/artifact/org.jsoup/jsoup.
To use jsoup in a Maven build, add the following dependency to your pom.xml file.
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>
To use jsoup in your Gradle build, add the following dependency to your build.gradle file.
implementation 'org.jsoup:jsoup:1.13.1'
1.3. Example
The following code demonstrates how to read a webpage and how to extract its links.
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseLinksExample {

    public static void main(String[] args) {
        try {
            // fetch and parse the page
            Document doc = Jsoup.connect("https://www.vogella.com/").get();

            // get title of the page
            String title = doc.title();
            System.out.println("Title: " + title);

            // get all links
            Elements links = doc.select("a[href]");
            for (Element link : links) {
                // get the value from href attribute
                System.out.println("\nLink : " + link.attr("href"));
                System.out.println("Text : " + link.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
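Many pages use relative links, so the raw href value is not always a complete URL. The following is a minimal sketch of how absUrl resolves such values against the document's base URI. The class name AbsoluteLinksSketch, the HTML snippet, and the base URI https://www.example.com/docs/ are assumptions chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsoluteLinksSketch {

    // Collects the absolute URL of every link in the given HTML
    static List<String> absoluteLinks(String html, String baseUri) {
        // The second argument is the base URI used to resolve relative links
        Document doc = Jsoup.parse(html, baseUri);
        List<String> result = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            result.add(link.absUrl("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical markup with a root-relative, a relative, and an absolute link
        String html = "<a href=\"/about.html\">About</a>"
                + "<a href=\"guide.html\">Guide</a>"
                + "<a href=\"https://jsoup.org/\">jsoup</a>";
        for (String url : absoluteLinks(html, "https://www.example.com/docs/")) {
            System.out.println(url);
        }
    }
}
```

When a document is loaded with Jsoup.connect(...).get(), jsoup sets the base URI from the fetched location, so link.absUrl("href") can be used in the example above without passing a base URI explicitly.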