I am trying to set up a bash script that downloads a web page once a day, diffs the two most recent downloads, and sends an alert if they differ by more than 15%. I'm not sure how to select the two most recent files.
The script starts simple enough, just doing a wget of a page and inserting the date into the filename:
wget --output-document="index$(date +%Y-%m-%d).html" https://www.example.com
Assuming a couple of those pages have been collected, I want to run a diff of the two most recent ones (and this is where I'm lost), something like:
sdiff -B -b -s index1.html index2.html | wc -l
Any suggestions on how to set this up so it can pull the last two files and run the diff?
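For illustration, here is a minimal sketch of one possible approach, not a definitive solution. It assumes the files keep the index<YYYY-MM-DD>.html naming from the wget line, so that the shell's sorted glob order is also chronological order. The function names (latest_two, percent_changed, check_latest) are hypothetical, the percentage is taken relative to the line count of the newer file (an assumption, since "15% different" could be measured other ways), and the alert is just an echo standing in for whatever notification is actually wanted.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the percentage definition are assumptions.

# latest_two DIR: print the two newest snapshot paths, older one first.
# Relies on the index<YYYY-MM-DD>.html naming, which sorts chronologically.
latest_two() {
    local dir=${1:-.}
    local files=("$dir"/index????-??-??.html)
    local n=${#files[@]}
    # Zero or one match (an unmatched glob stays as one literal word): give up.
    (( n >= 2 )) || return 1
    printf '%s\n' "${files[n-2]}" "${files[n-1]}"
}

# percent_changed OLD NEW: differing lines as a percentage of NEW's line count.
percent_changed() {
    local old=$1 new=$2 changed total
    # -s prints only differing lines; -B and -b ignore blank-line and
    # whitespace-only changes, as in the original sdiff command.
    changed=$(sdiff -B -b -s "$old" "$new" | wc -l)
    total=$(wc -l < "$new")
    (( total > 0 )) || total=1   # avoid dividing by zero on an empty file
    echo $(( changed * 100 / total ))
}

# check_latest DIR THRESHOLD: print an alert line if the change exceeds THRESHOLD.
check_latest() {
    local dir=${1:-.} threshold=${2:-15} pair pct old new
    pair=$(latest_two "$dir") || { echo "need at least two snapshots" >&2; return 2; }
    { read -r old; read -r new; } <<< "$pair"
    pct=$(percent_changed "$old" "$new")
    if (( pct > threshold )); then
        echo "ALERT: $new differs from $old by about ${pct}%"
    fi
}

# Example call, after the daily wget has run:
# check_latest . 15
```

Sorting by filename rather than by modification time (ls -t) means the result does not change if an old file gets touched or copied, but it only works as long as the date stays in the name in year-month-day order.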