I'm trying to build a web crawler in Java, and I'm wondering whether there is a way to get a relative path from an absolute URL, given the base URL. I want to replace every absolute URL in the HTML that points to the same domain with a relative one.
Because the HTTP URLs contain unsafe characters, I was not able to use Java's URI class as described in "How to construct a relative path in Java from two absolute paths (or URLs)?".
I'm using jsoup to parse the HTML, and it seems to be able to resolve a relative path to an absolute one, but not the other way round.
For example, take the page at the following URL:
"http://www.example.com/mysite/base.html"
The page source of base.html may contain:
<a href="http://www.example.com/myanothersite/new.html">Another site of mine</a>
I am trying to cache base.html and edit it so that it contains this instead:
<a href="../myanothersite/new.html">Another site of mine</a>
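
To make the desired transformation concrete, here is a rough sketch of the kind of helper I have in mind. It is only an illustration under assumptions, not a tested solution: the class name `RelativeLinks` and the method `relativize` are hypothetical, it parses with `java.net.URL` (which, as far as I can tell, is more lenient about unsafe characters than `java.net.URI`), and it compares explicit ports literally, so `http://host` and `http://host:80` would be treated as different sites.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class; the names are illustrative, not from any library.
class RelativeLinks {

    // Returns target rewritten relative to base when both share protocol,
    // host and port; otherwise returns target unchanged.
    static String relativize(String base, String target) {
        URL b;
        URL t;
        try {
            b = new URL(base);
            t = new URL(target);
        } catch (MalformedURLException e) {
            return target; // leave anything unparsable untouched
        }
        if (!b.getProtocol().equals(t.getProtocol())
                || !b.getHost().equalsIgnoreCase(t.getHost())
                || b.getPort() != t.getPort()) {
            return target; // different site: keep the absolute URL
        }

        // Treat an empty path as the root directory.
        String basePath = b.getPath().isEmpty() ? "/" : b.getPath();
        String targetPath = t.getPath().isEmpty() ? "/" : t.getPath();

        // Limit -1 keeps trailing empty segments, so the last element is
        // always the file name (or "" when the path ends in a slash).
        String[] from = basePath.split("/", -1);
        String[] to = targetPath.split("/", -1);
        int fromDirs = from.length - 1;
        int toDirs = to.length - 1;

        // Count the leading directory segments the two paths share.
        int common = 0;
        while (common < fromDirs && common < toDirs
                && from[common].equals(to[common])) {
            common++;
        }

        StringBuilder sb = new StringBuilder();
        for (int i = common; i < fromDirs; i++) {
            sb.append("../"); // climb out of the base page's directories
        }
        for (int i = common; i < toDirs; i++) {
            sb.append(to[i]).append('/'); // descend into the target's
        }
        sb.append(to[toDirs]); // target file name, possibly empty
        if (sb.length() == 0) {
            sb.append("./"); // target is the base page's own directory
        }
        if (t.getQuery() != null) {
            sb.append('?').append(t.getQuery());
        }
        if (t.getRef() != null) {
            sb.append('#').append(t.getRef());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "../myanothersite/new.html"
        System.out.println(relativize(
                "http://www.example.com/mysite/base.html",
                "http://www.example.com/myanothersite/new.html"));
    }
}
```

With jsoup, I imagine this could be applied by looping over `doc.select("a[href]")` and calling `link.attr("href", RelativeLinks.relativize(baseUrl, link.absUrl("href")))` for each element, where `baseUrl` is the URL of the cached page.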