When I try to get HTML from a URL in Groovy, I only get the static HTML. None of the dynamic content is loaded (obviously). Is there some way I can get the dynamically loaded content? I thought about extracting all the script URLs from the static content, then extracting the AJAX calls from those scripts and following them, but my code would get messy really fast.
If you think this is not possible, then read on.
My motivation is to build a bookmarklet for an image indexer, not unlike Pinterest's bookmarklet. But I guess they faced the same issue of not being able to extract images loaded via AJAX, and released a Chrome extension. Can I somehow post the HTML that a user is currently seeing to my website? The same-origin policy will not let me make an AJAX call from the page the user is seeing to my own domain. Nor can I pass the HTML as a URL parameter, due to URL size limitations. Then I thought I would extract the image srcs and just pass those as a URL parameter, but if the number of images is large, I will hit the URL size limit again. Is there an alternative way of doing this?
EDIT: If you think this is also not possible, then read further on.
I thought I would extract all the image URLs one by one and send each in its own request, together with a random ID parameter that identifies the set of images. When I send the last image, I can add a parameter to mark it as the last one, so my app knows it should not expect more. Something like this:
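var images = document.getElementsByTagName("img");
// Integer ID shared by every image in this set
var imageSetId = Math.floor(Math.random() * 9999);
var generatedSrc = "";
// Indexed loop: for...in can also visit non-index properties such as "length"
for (var i = 0; i < images.length; i++) {
    // Encode the src so its own "?" and "&" do not break the query string
    generatedSrc = "http://mydomain.myapp.com/extractor?src=" + encodeURIComponent(images[i].src) + "&setId=" + imageSetId;
    if (i == images.length - 1) {
        generatedSrc += "&last=true";
    }
    window.open(generatedSrc);
}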
Each time the window opens, I can save the image's URL, and based on the set ID, I can recreate the set. Once this is done I would close the window, unless I receive the last parameter, in which case I can keep that window open and show the user the extracted images.
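To make the server-side bookkeeping concrete, here is a rough sketch of what I have in mind, written in JavaScript only for illustration. The names `sets` and `recordImage` are made up, the store is just an in-memory map, and it assumes the requests of one set arrive in the order they were sent, which separate windows may not guarantee.

```javascript
// Hypothetical in-memory store: setId -> image URLs received so far.
const sets = new Map();

// Records the parameters of one request (src, setId, optional last).
// Returns the full list of URLs for the set once last=true arrives,
// otherwise returns null to signal that more images are expected.
function recordImage(params) {
  const { src, setId, last } = params;
  if (!sets.has(setId)) {
    sets.set(setId, []);
  }
  sets.get(setId).push(src);
  if (last === "true") {
    const complete = sets.get(setId);
    sets.delete(setId); // the set is finished, so stop tracking it
    return complete;
  }
  return null;
}
```

A request that returns `null` would close its window right away; the one that returns the list would stay open and render the images.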
But now the problem turns into a UI problem: I do not want the user to see windows opening and closing! Is there any way of avoiding this?