jsoup - Does Jsoup use the base tag of an HTML document to automatically resolve relative paths?

Question

From my understanding, the jsoup parser can work with relative links, as long as a base URI is specified when instantiating the parser.

Now let's assume that the document defines a base tag with a URI that differs from the URI of the page. This URI cannot be known a priori.

What is the behaviour of the parser? Does it automatically detect the tag and apply it to the whole document? Or do we have to parse the document once to detect the base tag, and then re-parse it with the detected value as the base URI?

Thanks in advance Koj

score 1 · Accepted Answer

Jsoup will detect the <base> tag and apply it throughout the document. When resolving relative links, a base-tag URI takes precedence over the URI supplied to the parser. You don't need to parse the document twice. In the example below, the first link appears before the <base> tag and is still resolved against it:

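import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// No base URI is passed to the parser; the <base> tag supplies it.
Document doc = Jsoup.parse(
    "<a href='/one/'>One</a>" +
    "<base href='http://example.com/' />" +
    "<a href='/two/'>Two</a>");
Elements els = doc.select("a");
// The "abs:" prefix asks jsoup for the attribute value as an absolute URL.
for (Element e : els) {
    System.out.println(e.attr("abs:href"));
}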

gives:

http://example.com/one/
http://example.com/two/
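
To make the precedence rule concrete without depending on jsoup itself, here is a minimal plain-Java sketch of the resolution described above, using only java.net.URI. It is not jsoup's internal code: the class name, the helpers effectiveBase and absolute, and the page URI http://page.example/dir/index.html are all made up for illustration.

```java
import java.net.URI;

public class BaseHrefSketch {

    // Hypothetical helper: choose the URI that relative links resolve against.
    // A <base href> value, when present, wins over the parser-supplied URI.
    // It is resolved against that URI first, in case the href is itself relative.
    static URI effectiveBase(String parserBaseUri, String baseTagHref) {
        URI pageUri = URI.create(parserBaseUri);
        if (baseTagHref == null || baseTagHref.isEmpty()) {
            return pageUri; // no <base> tag: fall back to the parser-supplied URI
        }
        return pageUri.resolve(baseTagHref);
    }

    // Hypothetical helper: turn a link's href into an absolute URL string.
    static String absolute(URI base, String href) {
        return base.resolve(href).toString();
    }

    public static void main(String[] args) {
        // Assumed page URI, i.e. what would be passed to the parser.
        String parserBaseUri = "http://page.example/dir/index.html";

        URI withTag = effectiveBase(parserBaseUri, "http://example.com/");
        URI withoutTag = effectiveBase(parserBaseUri, null);

        System.out.println(absolute(withTag, "/one/"));
        System.out.println(absolute(withoutTag, "/one/"));
    }
}
```

With the base tag present, the first line printed is http://example.com/one/, matching the jsoup output above. Without it, the same link resolves against the page URI and comes out as http://page.example/one/.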
