I am trying to write my own Nutch plugin for crawling web pages. I need to detect whether a page contains a particular tag, e.g. <foo>. The official documentation notes that this is possible using Document.getElementsByTagName("foo"), but this is not working for me. Do you have any idea what could be wrong?
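To make the first question concrete, here is a minimal sketch of the kind of check meant, using only the JDK's org.w3c.dom API rather than any Nutch class. The names TagFinder, findElements, and parse, and the sample markup, are hypothetical and exist only for illustration. The sketch walks the tree from an arbitrary Node instead of calling getElementsByTagName, on the assumption that the plugin may receive a DocumentFragment (which does not offer that method) and that the HTML parser may report element names in upper case.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TagFinder {

    // Collects every element below (and including) root whose name
    // matches tagName, ignoring case. Works for Document,
    // DocumentFragment, or Element, since all of them are Nodes.
    public static List<Element> findElements(Node root, String tagName) {
        List<Element> found = new ArrayList<>();
        collect(root, tagName, found);
        return found;
    }

    private static void collect(Node node, String tagName, List<Element> found) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && node.getNodeName().equalsIgnoreCase(tagName)) {
            found.add((Element) node);
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, tagName, found);
        }
    }

    // Hypothetical helper for this demo only: parses well-formed markup
    // with the JDK's XML parser. A crawler would get its DOM from the
    // HTML parser instead.
    public static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><FOO>marker</FOO><p>text</p></body></html>");
        List<Element> markers = findElements(doc, "foo");
        if (!markers.isEmpty()) {
            // Once the marker tag is found, other tags can be read from the same tree.
            System.out.println("marker found; <p> count = " + findElements(doc, "p").size());
        }
    }
}
```

The case-insensitive comparison is a deliberate choice here: a lookup for "foo" would otherwise miss an element reported as "FOO".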
My second question: once I have identified the tag above, I would like to extract some other tags from the same page. Is there any way to store the complete source code of the page being crawled at that moment?
Thanks, Jan.