There are a lot of posts looking for a way to parse a URL and get the hostname. The usual solution is to create an anchor element, set its href to the URL, and read the .hostname property. It's a great solution, but I'm having trouble going a bit beyond this technique.
I have a function that successfully extracts the base host from a hostname. To show what I mean by base host (I'm not sure of the correct nomenclature), here is the function along with some example inputs and outputs.
function parseURL(url) {
    // Let the browser parse the URL via an anchor element.
    var parser = document.createElement('a');
    parser.href = url;
    var hostname = parser.hostname;
    // Drop the last "." and everything after it, so that lastIndexOf(".")
    // in the next step finds the second-to-last "." of the full hostname.
    var withoutTld = hostname.substring(0, hostname.lastIndexOf("."));
    // Keep everything after the second-to-last ".".
    return hostname.substring(withoutTld.lastIndexOf(".") + 1);
}
parseURL("http://code.google.com/p/jsuri/")
//google.com - I don't think jsuri handles hosts any more effectively
parseURL("http://www.nytimes.com/pages/nyregion/index.html")
//nytimes.com
parseURL("http://fivethirtyeight.blogs.nytimes.com/2013/01/12/in-cooperstown-a-crowded-waiting-room/")
//nytimes.com
parseURL("http://www.guardian.co.uk/uk/2013/jan/13/fears-lulworth-cove-development-heritage")
//co.uk
The last example is the exception I fear, and the reason I'm looking for a more robust solution: the result I want there is guardian.co.uk, not co.uk. The .hostname method is a great first step; I'm just looking for a better way of hacking off the sub-hosts that sometimes precede the base-level host.
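To make the desired behavior concrete, here is a minimal sketch of the direction I have in mind. It is not a real solution: MULTI_PART_SUFFIXES and baseHost are hypothetical names, the suffix list is a tiny hand-picked sample (a complete one would presumably have to come from something like the Public Suffix List), and I've used the standard URL constructor in place of the anchor element only so the snippet also runs outside a browser.

```javascript
// Hypothetical, hand-picked sample of two-label suffixes. This is an
// assumption for illustration only and is nowhere near complete.
var MULTI_PART_SUFFIXES = ["co.uk", "org.uk", "com.au", "co.jp"];

function baseHost(url) {
    // The URL constructor stands in for the anchor-element trick here.
    var labels = new URL(url).hostname.split(".");
    // Hosts with one or two labels are returned unchanged.
    if (labels.length <= 2) {
        return labels.join(".");
    }
    var lastTwo = labels.slice(-2).join(".");
    // Keep three labels when the last two form a listed suffix, otherwise two.
    var keep = MULTI_PART_SUFFIXES.indexOf(lastTwo) !== -1 ? 3 : 2;
    return labels.slice(-keep).join(".");
}
```

With this, the Guardian URL would come out as guardian.co.uk while the other examples keep their current results, but any suffix missing from the list would still be cut down to its last two labels.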
Any help appreciated (if only correcting my terminology).