bash - Running a diff from the two most recent versions of a page

Question

I am trying to set up a bash script to download a web page once a day, then run a diff of the last two pages and send an alert if the pages are more than 15% different. I'm not really sure how to approach the selection of the two most recent pages.

The script starts simple enough, just doing a wget of a page and inserting the date into the filename:

wget --output-document=index`date +%Y-%m-%d`.html https://www.example.com

Assuming a couple of those pages have been collected, we run a diff of the two most recent pages. (And this is where I'm lost)

sdiff -B -b -s index1.html index2.html | wc -l

Any suggestions on how to set this up so it can pull the last two files and run the diff?

score 0 · Accepted Answer

I would keep the date as part of the filename when you run wget.

For the file comparison, I would go with the following solution.

# Note: -d "1 day ago" is a GNU date option
YdayFile="index$(date +%Y%m%d -d "1 day ago").html"
TodaysFile="index$(date +%Y%m%d).html"
wget --output-document="${TodaysFile}" https://www.example.com
sdiff -B -b -s "${TodaysFile}" "${YdayFile}" | wc -l

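You can replace "1 day ago" with however many days back you want to go. It is also a good idea to check that both files exist before running the diff.

See this link for more on date manipulation: http://ss64.com/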
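As a rough sketch of the file-existence check and the 15% threshold from the question, something like the following could work. The function names (`latest_two`, `percent_changed`, `check_snapshots`) are made up for illustration. It assumes every snapshot is named `index<date>.html` with one consistent zero-padded date format, so that sorting by name is the same as sorting by date. It also assumes "percent different" means differing lines divided by the line count of the newer file, which is only one possible definition.

```shell
#!/usr/bin/env bash

# Print the two newest snapshots in a directory, older one first.
# Glob results are sorted by name, so the last two entries are the newest
# as long as the date in the filename sorts lexically (e.g. YYYYMMDD).
latest_two() {
    local dir=$1
    local files=("$dir"/index*.html)
    local n=${#files[@]}
    # Fewer than two matches (or an unmatched glob) means nothing to compare.
    [ "$n" -ge 2 ] || return 1
    printf '%s\n' "${files[n-2]}" "${files[n-1]}"
}

# Print the share of differing lines as an integer percentage.
# Assumption: changed lines reported by sdiff / total lines in the newer file.
percent_changed() {
    local old=$1 new=$2
    local changed total
    changed=$(sdiff -B -b -s "$old" "$new" | wc -l)
    total=$(wc -l < "$new")
    [ "$total" -gt 0 ] || total=1   # avoid dividing by zero on an empty page
    echo $(( changed * 100 / total ))
}

# Compare the two newest snapshots and report if they differ by more
# than the threshold (default 15 percent).
check_snapshots() {
    local dir=$1 threshold=${2:-15}
    local pair old new pct
    pair=$(latest_two "$dir") || {
        echo "Need at least two snapshots in $dir" >&2
        return 2
    }
    { read -r old; read -r new; } <<< "$pair"
    pct=$(percent_changed "$old" "$new")
    if [ "$pct" -gt "$threshold" ]; then
        echo "ALERT: $new differs from $old by ${pct}%"
    else
        echo "OK: $new differs from $old by ${pct}%"
    fi
}
```

After the daily wget, calling `check_snapshots . 15` in the download directory would print an ALERT line when the threshold is exceeded; the echo could be swapped for whatever alerting you use. Because this picks files by name rather than by "1 day ago", it still works if a day's download was skipped.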
