I'm trying to compare the md5sum results for some files on a remote server with the md5sums of my local files. Where both the hash and the filename match, the file should be removed from the local server.

The part that gets the md5sums from both sides is done, and I have something like this:

remote_list="<hash values> <filename>.gz"
local_list="<hash values> <filename>.gz"

But now I need to compare the contents of the two lists. I was thinking of using two nested for loops, but I wonder whether that is a good (and efficient) approach.

So far I did this:


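# Remote listing: keep only the md5 and URL columns, then strip the URL down to the file name
s3=$(s3cmd ls --list-md5 s3://company-backup/company/"$datacenter"/"$hostname"/"$path"/)
s3_list=$(echo "$s3" | tr -s ' ' | cut -d ' ' -f 4,5 | sed 's= .*/= =')
echo "$s3_list"

# Local listing: md5sum prints "<hash>  <full path>"
locally=$(md5sum /"$path"/*.gz)
echo "$locally"

# Strip the directory part so each line reads "<hash> <filename>"
locally_list=$(echo "$locally" | sed 's= .*/= =')
echo "$locally_list"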

Which gives me this output:

d41d8cd98f00b204e9800998ecf8427e #md5 from remote folder
41eae9b40d23de2f02bf07635870f6d0 app.20121117040001.gz #remote file
541b1bf78682f48867cc99dbb53c4c3a app.20121118040001.gz #remote file
31d90af7969f5003b27f68e27e7f2cb1 app.gz #remote file
31d90af7969f5003b27f68e27e7f2cb1  /backup/server245/app.gz #local file

So, following that idea, app.gz exists in both places, so I can delete it from my local machine. Any ideas or suggestions?


If you only consider it a match when both the md5sum and the filename are identical, then it's simple:

sort remote_list local_list | uniq -d > duplicate_list

(Important: this assumes there are no duplicates within either file list. If you computed the md5sums correctly, there certainly shouldn't be any.)
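
The one-liner above reads remote_list and local_list as files, while the question keeps the lists in shell variables. Below is a minimal sketch of the same sort | uniq -d idea applied to variables, followed by a dry-run delete loop. The function name common_entries and the variable local_dir are hypothetical names introduced here for illustration, and the sample data is taken from the output shown in the question. Each list is passed through sort -u first, which is an addition of this sketch so that a repeated line inside a single list is not mistaken for a match.

```shell
# Print the "<md5> <filename>" lines that appear in both lists.
# $1 and $2 are multi-line strings, one "<md5> <filename>" pair per line.
# (common_entries is a hypothetical helper name, not from the original post.)
common_entries() {
    {
        printf '%s\n' "$1" | sort -u   # de-duplicate within the first list
        printf '%s\n' "$2" | sort -u   # de-duplicate within the second list
    } | sort | uniq -d                 # keep only lines present in both
}

# Sample data copied from the output in the question.
remote_list='41eae9b40d23de2f02bf07635870f6d0 app.20121117040001.gz
541b1bf78682f48867cc99dbb53c4c3a app.20121118040001.gz
31d90af7969f5003b27f68e27e7f2cb1 app.gz'
local_list='31d90af7969f5003b27f68e27e7f2cb1 app.gz'

# Assumed local directory, matching the path in the sample output.
local_dir=/backup/server245

# Dry run: print the rm commands instead of executing them.
# Remove the "echo" once the printed list looks right.
common_entries "$remote_list" "$local_list" |
while read -r hash name; do
    [ -n "$name" ] || continue         # skip blank or hash-only lines
    echo rm -- "$local_dir/$name"
done
```

With the sample data, the loop prints a single command, rm -- /backup/server245/app.gz. Because sort and uniq do the matching, there is no need for two nested for loops over the lists.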

