I'll first try to situate the problem a bit. We have a project that is built into a large tree of files. The build is several hundred MB and contains lots of (smallish) files, only a small fraction of which change between builds. We want to preserve a bit of history of these builds, and to do this efficiently we want to hardlink files that don't change between builds. For this we use rsync (as the more powerful brother of cp), from a local source to a local target, with the --link-dest option for doing the hardlinking magic.
This works fine for incremental builds: most files are not touched and rsync does the hardlink trick correctly. With full recompile builds (which we have to do for reasons that are not relevant here), things don't seem to work as expected. Because of the recompile, all files get a fresh timestamp, but content-wise, most files are still the same as in the previous build. But even though we use rsync with the --checksum option (so rsync "syncs"/hardlinks based on content, not filesize+timestamp), nothing gets hardlinked anymore.
Illustration
I tried to isolate/illustrate the problem with this simple (bash) script:
echo "--- Start clean"
rm -fr src build*
echo "--- Set up src"
mkdir src
echo hello world > src/helloworld.txt
echo "--- First copy with src as hardlink reference"
rsync -a --checksum --link-dest=$(pwd)/src src/ build1/
echo "--- Second copy with first copy as hardlink reference"
rsync -a --checksum --link-dest=$(pwd)/build1 src/ build2/
echo "--- Result (as expected)"
ls -ali src/helloworld.txt build*/helloworld.txt
echo "--- Sleep to have reasonable timestamp differences"
sleep 2
echo "--- 'Remake' src, but with same content"
rm -fr src/helloworld.txt
echo hello world > src/helloworld.txt
echo "--- Third copy with second copy as hardlink reference"
rsync -a --checksum --link-dest=$(pwd)/build2 src/ build3/
# Using --modify-window=10 instead of --checksum gives results as expected
# rsync -a --modify-window=10 --link-dest=$(pwd)/build2 src/ build3/
echo "--- Final result (not as expected)"
ls -ali src/helloworld.txt build*/helloworld.txt
The first result is as expected: all three copies are hardlinked (same inode, link count 3):
30157018 -rw-r--r-- 3 stefaan staff 12 May 10 01:28 build1/helloworld.txt
30157018 -rw-r--r-- 3 stefaan staff 12 May 10 01:28 build2/helloworld.txt
30157018 -rw-r--r-- 3 stefaan staff 12 May 10 01:28 src/helloworld.txt
The final result is not as expected/desired:
30157018 -rw-r--r-- 2 stefaan staff 12 May 10 01:28 build1/helloworld.txt
30157018 -rw-r--r-- 2 stefaan staff 12 May 10 01:28 build2/helloworld.txt
30157026 -rw-r--r-- 1 stefaan staff 12 May 10 01:28 build3/helloworld.txt
30157024 -rw-r--r-- 1 stefaan staff 12 May 10 01:28 src/helloworld.txt
The third copy, build3/helloworld.txt, is not hardlinked to the one from build2, even though the content is the same, so the checksum check should see this.
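Since the ls output only shows timestamps to the minute, it can help to check explicitly how two of these files relate. Below is a minimal sketch of such a check. The function name compare_files is made up for this post and is not part of rsync; it relies only on cmp and the shell's -ef (same device and inode) and -nt (newer modification time) file tests.

```shell
# Hypothetical helper (not part of rsync): report how two files relate.
compare_files() {
    if [ "$1" -ef "$2" ]; then
        # Same device and inode: the two names are hardlinks to one file.
        echo "hardlinked"
    elif cmp -s "$1" "$2"; then
        # Byte-for-byte identical content, but separate inodes.
        if [ "$1" -nt "$2" ] || [ "$2" -nt "$1" ]; then
            echo "same content, different mtime"
        else
            echo "same content, same mtime"
        fi
    else
        echo "different content"
    fi
}

# Usage with the files from the script above:
#   compare_files build2/helloworld.txt build3/helloworld.txt
```

If the differing modification time is what keeps rsync from hardlinking here (which the --modify-window experiment in the script seems to suggest, but I have not confirmed), I would expect this to report "same content, different mtime" for build2 versus build3.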
Question
Does anybody have an idea what is wrong here? Is my expectation wrong? Or is rsync ignoring the --checksum option when syncing from local to local, for example because it knows that looking at inode numbers is smarter than spending time on checksums?