I'm navigating through a site with HtmlUnit. It has a table with a list of documents for download. I want to click all the links and gather all the documents (don't worry, the information is public and scraping is not forbidden).
The site is written with JSF, so the links to the documents are actually <a href="#"> elements with an onclick handler that submits the form (after first setting a hidden field to the appropriate value).
My code is (in Scala, but that doesn't matter):
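// Locate the document link in the current table row
val link = row.getFirstByXPath[HtmlElement](descriptor.documentLinkPath.get)
// Workaround: swap a bare "#" href for a no-op javascript: URL (the click used to throw an exception)
if (link.getAttribute("href").endsWith("#")) link.setAttribute("href", "javascript:void(0)")
// Clicking runs the onclick handler, which submits the JSF form
val documentPage: Page = link.click()
// Read the response body (expected to be the PDF) into a byte array
val bytes = IOUtils.toByteArray(documentPage.getWebResponse().getContentAsStream())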
There's a problem, however. The first document is downloaded properly, but for the second one onwards I get the HTML page back rather than the PDF document. (Commenting out the # -> javascript:void(0) replacement has no effect; I put it there because the click used to blow up with some exception.)
JavaScript is enabled, and since the first document downloads correctly, things are generally working. It just doesn't work for the subsequent documents. Any ideas how to resolve this?