From my understanding, the jsoup parser can resolve relative links, as long as a base URI is specified when instantiating the parser.

Now let's assume that the document defines a base tag with a URI that differs from the URI of the page. This URI cannot be known a priori.

How does the parser behave in that case? Does it detect the tag automatically and apply it to the whole document? Or do we have to parse the document once to detect the base tag, and then re-parse it with the detected value as the base URI?

Thanks in advance,
Koj

1 Answer

Jsoup will detect the <base> tag and apply it throughout the document. When resolving relative links, a base-tag URI takes precedence over the URI supplied to the parser. You don't need to parse the document twice. As an example:

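import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// No base URI is passed to the parser; the <base> tag appears after the first link.
Document doc = Jsoup.parse(
    "<a href='/one/'>One</a>" +
    "<base href='http://example.com/' />" +
    "<a href='/two/'>Two</a>");
Elements els = doc.select("a");
for (Element e : els) {
    // The "abs:" prefix asks jsoup for the absolute form of the attribute value.
    System.out.println(e.attr("abs:href"));
}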

gives:

http://example.com/one/
http://example.com/two/
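
The example above passes no base URI to the parser, so it does not show the precedence rule by itself. Here is a minimal sketch of that case, assuming the page was fetched from the made-up address http://fetched.example.org/page/ (the class name BaseTagPrecedence and the helper resolveFirstLink are also hypothetical, introduced only for illustration):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BaseTagPrecedence {

    // Hypothetical helper: parse html with the given parser base URI and
    // return the absolute URL of the first link in the document.
    static String resolveFirstLink(String html, String parserBaseUri) {
        Document doc = Jsoup.parse(html, parserBaseUri);
        Element link = doc.select("a").first();
        return link.absUrl("href"); // same result as attr("abs:href")
    }

    public static void main(String[] args) {
        // Assumed address the page was fetched from (not from the original post).
        String fetchedFrom = "http://fetched.example.org/page/";

        String withBase =
            "<html><head><base href='http://example.com/'></head>" +
            "<body><a href='/two/'>Two</a></body></html>";
        String withoutBase =
            "<html><head></head>" +
            "<body><a href='/two/'>Two</a></body></html>";

        // The <base> tag wins over the URI given to the parser.
        System.out.println(resolveFirstLink(withBase, fetchedFrom));
        // Without a <base> tag, the parser's base URI is used.
        System.out.println(resolveFirstLink(withoutBase, fetchedFrom));
    }
}
```

With the base tag present, the link should resolve against http://example.com/; without it, against the URI handed to the parser.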
Answered 2012-12-11T16:48:03.490