I am trying to implement a defacement detector for websites. To do this, I need to write a tool in Java that measures the similarity between two HTML files. I intend to strip out URLs and JavaScript and handle them separately.
I am looking for a tool, library, or algorithm that I could use to calculate a similarity metric (ideally a percentage) so that I can detect significant changes to a website.
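To make the requirement concrete, here is a rough sketch of the kind of metric I have in mind. It is only an illustration, not something I am committed to: it assumes that Jaccard similarity over word shingles of the visible text is an acceptable measure, and the class name `HtmlSimilarity`, the method names, and the shingle size `k` are all made up for this example. The regex-based tag stripping is a crude stand-in for a real HTML parser.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: Jaccard similarity over word shingles of the visible text.
public class HtmlSimilarity {

    // Drops <script> and <style> blocks, then all remaining tags.
    // Assumption: regexes are good enough for a demo; malformed HTML needs a real parser.
    static String visibleText(String html) {
        return html
                .replaceAll("(?is)<script.*?</script>", " ")
                .replaceAll("(?is)<style.*?</style>", " ")
                .replaceAll("(?s)<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim()
                .toLowerCase(Locale.ROOT);
    }

    // Builds the set of k-word shingles (overlapping runs of k consecutive words).
    static Set<String> shingles(String text, int k) {
        Set<String> result = new HashSet<>();
        if (text.isEmpty()) {
            return result;
        }
        String[] words = text.split(" ");
        if (words.length < k) {
            // Text shorter than one shingle: treat the whole text as a single shingle.
            result.add(text);
            return result;
        }
        for (int i = 0; i + k <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + k)));
        }
        return result;
    }

    // Returns |A ∩ B| / |A ∪ B| as a percentage; two empty documents count as identical.
    static double similarityPercent(String htmlA, String htmlB, int k) {
        Set<String> a = shingles(visibleText(htmlA), k);
        Set<String> b = shingles(visibleText(htmlB), k);
        if (a.isEmpty() && b.isEmpty()) {
            return 100.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return 100.0 * intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String before = "<html><body><p>Welcome to our site</p></body></html>";
        String after = "<html><body><p>Hacked by someone else</p></body></html>";
        System.out.println(similarityPercent(before, after, 2));
    }
}
```

A detector built on this would flag a page when the percentage drops below some threshold, which I would have to tune per site. I do not know whether this approach is robust enough in practice, which is why I am asking for better alternatives.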
Thank you for your help.