1

I googled and couldn't find any code that would compare a web page to a previous version of itself.

In this case the page I'm trying to watch is link text. There are services that can watch a page, but I'd like to set this up on my own server.

I've set this up as a wiki so anyone can add to the code. Here's my idea:

  1. Check whether a previous version of the file exists. If not, download the page.
  2. If a previous version exists, download the page again, diff the two to find differences, and email the new content along with the dates of the new and old versions.

This script would be called nightly via cron, or on demand via the browser (the latter is not a priority).
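
For the nightly run, a crontab entry along these lines might do (added with `crontab -e`). The directory, script name, and log file are placeholders, not something from this thread. The `cd` matters if the script keeps its saved copy under a relative path, because cron does not start jobs in the script's own directory.

```
# minute hour day-of-month month day-of-week  command
0 3 * * * cd /home/user/pagewatch && ./watchpage.sh >> watchpage.log 2>&1
```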

Sounds simple, maybe I'm just not looking in the right place.

4

2 Answers

3

Maybe a simple sh script like this, using wget, diff, and test?

#!/bin/sh

WWWURI="http://foo.bar/testfile.html"
LOCALCOPY="testfile.html"
TMPFILE="tmpfile"
WEBFILE="changed.html"

MAILADDRESS="$(whoami)"
SUBJECT_NEWFILE="$LOCALCOPY is new"
BODY_NEWFILE="first version of $LOCALCOPY loaded"
SUBJECT_CHANGEDFILE="$LOCALCOPY updated"
SUBJECT_NOTCHANGED="$LOCALCOPY not updated"
BODY_CHANGEDFILE="new version of $LOCALCOPY"

# test for old file
if [ -e "$LOCALCOPY" ]
then
    mv "$LOCALCOPY" "$LOCALCOPY.bak"
    wget "$WWWURI" -O"$LOCALCOPY" -o/dev/null
    # old version first, so lines marked ">" are the new content
    diff "$LOCALCOPY.bak" "$LOCALCOPY" > "$TMPFILE"

# test for update
    if [ -s "$TMPFILE" ]
    then
        echo "$SUBJECT_CHANGEDFILE"
        ( echo "$BODY_CHANGEDFILE" ; cat "$TMPFILE" ) | tee "$WEBFILE" | mail -s "$SUBJECT_CHANGEDFILE" "$MAILADDRESS"
    else
        echo "$SUBJECT_NOTCHANGED"
    fi
else
    wget "$WWWURI" -O"$LOCALCOPY" -o/dev/null
    echo "$BODY_NEWFILE"
    echo "$BODY_NEWFILE" | tee "$WEBFILE" | mail -s "$SUBJECT_NEWFILE" "$MAILADDRESS"
fi
[ -e "$TMPFILE" ] && rm "$TMPFILE"

Update: piped the output through tee, fixed some small spelling errors, and added removal of $TMPFILE.
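
The question also asks for the dates of the old and new versions. One possible way to get them, as a sketch only: a unified diff (`diff -u`) starts with `---` and `+++` header lines that carry each file's modification time as recorded on the local copies. `change_report` is a hypothetical helper name, not part of the script above.

```shell
#!/bin/sh
# change_report OLD NEW
# Prints a unified diff of OLD and NEW and returns 0 if they differ.
# Returns 1 if the files are identical or diff could not read them.
change_report() {
    # diff exits 0 when the files are identical: nothing to report
    report=$(diff -u "$1" "$2") && return 1
    # non-zero exit with no output means diff itself failed (e.g. missing file)
    [ -n "$report" ] || return 1
    printf '%s\n' "$report"
}

# Possible use inside the script (variable names taken from it):
#   if report=$(change_report "$LOCALCOPY.bak" "$LOCALCOPY"); then
#       printf '%s\n' "$report" | mail -s "$SUBJECT_CHANGEDFILE" "$MAILADDRESS"
#   fi
```

Putting the old file first means lines prefixed with `+` are the new content.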

answered 2009-09-29T20:55:34.700
0

You can check this SO posting for a few ideas, and for information about the challenge of detecting "true" changes to a web page (as opposed to fluctuating advertisement blocks and other "noise").
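
To give a rough idea of what filtering such noise could look like, here is a crude sketch. It assumes the noise lives in `<script>` blocks and in whitespace differences, which will not hold for every page; `normalize_page` and `page_changed` are hypothetical names, and sed is no substitute for a real HTML parser.

```shell
#!/bin/sh
# normalize_page FILE
# Prints FILE with <script> blocks removed and whitespace collapsed.
normalize_page() {
    # 1. drop script elements that open and close on the same line
    #    (greedy: also drops anything between two scripts on one line)
    # 2. drop multi-line script blocks
    # 3. collapse runs of whitespace, trim line ends, drop empty lines
    sed -e 's/<script[^>]*>.*<\/script>//g' \
        -e '/<script/,/<\/script>/d' \
        -e 's/[[:space:]][[:space:]]*/ /g' \
        -e 's/^ //' -e 's/ $//' \
        -e '/^$/d' "$1"
}

# page_changed OLD NEW
# Returns 0 if the two pages still differ after normalization.
page_changed() {
    [ "$(normalize_page "$1")" != "$(normalize_page "$2")" ]
}
```

A script could then call `page_changed old.html new.html` before deciding whether to send mail at all.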

answered 2009-09-29T19:44:23.860