
I'll first try to situate the problem a bit. We have a project that is built into a large tree of files. The build is several hundred MB and contains lots of (smallish) files, only a small fraction of which change between builds. We want to preserve a bit of history of these builds, and to do this efficiently we want to hardlink files that don't change between builds. For this we use rsync (as the more powerful brother of cp), from a local source to a local target, with the --link-dest option doing the hardlinking magic.

This works fine for incremental builds: most files are not touched and rsync does the hardlink trick correctly. With full recompile builds (which we have to do for reasons that are not relevant here), things don't seem to work as expected. Because of the recompile, all files get a fresh timestamp, but content-wise, most files are still the same as the previous build. But even though we use rsync with the --checksum option (so rsync "syncs"/hardlinks based on content, not filesize+timestamp), nothing gets hardlinked anymore.

Illustration

I tried to isolate/illustrate the problem with this simple (bash) script:

echo "--- Start clean"
rm -fr src build*

echo "--- Set up src"
mkdir src
echo hello world > src/helloworld.txt

echo "--- First copy with src as hardlink reference"
rsync -a --checksum --link-dest=$(pwd)/src src/ build1/

echo "--- Second copy with first copy as hardlink reference"
rsync -a --checksum --link-dest=$(pwd)/build1 src/ build2/

echo "--- Result (as expected)"
ls -ali src/helloworld.txt build*/helloworld.txt

echo "--- Sleep to have reasonable timestamp differences"
sleep 2

echo "--- 'Remake' src, but with same content"
rm -fr src/helloworld.txt
echo hello world > src/helloworld.txt

echo "Third copy with second copy as hardlink reference"
rsync -a --checksum --link-dest=$(pwd)/build2 src/ build3
# Using --modify-window=10 gives results as expected
# rsync -a --modify-window=10 --link-dest=$(pwd)/build2 src/ build3

echo "Final result, not as expected"
ls -ali src/helloworld.txt build*/helloworld.txt

The first result is as expected: all three copies are hardlinked (same inode)

30157018 -rw-r--r--  3 stefaan  staff  12 May 10 01:28 build1/helloworld.txt
30157018 -rw-r--r--  3 stefaan  staff  12 May 10 01:28 build2/helloworld.txt
30157018 -rw-r--r--  3 stefaan  staff  12 May 10 01:28 src/helloworld.txt

The final result is not as expected/desired:

30157018 -rw-r--r--  2 stefaan  staff  12 May 10 01:28 build1/helloworld.txt
30157018 -rw-r--r--  2 stefaan  staff  12 May 10 01:28 build2/helloworld.txt
30157026 -rw-r--r--  1 stefaan  staff  12 May 10 01:28 build3/helloworld.txt
30157024 -rw-r--r--  1 stefaan  staff  12 May 10 01:28 src/helloworld.txt

The third copy build3/helloworld.txt is not hardlinked to the one from build2, even though the content is the same, so the checksum check should see this.

Question

Does anybody have an idea what is wrong here? Is my expectation wrong? Or is rsync ignoring the --checksum option when syncing from local to local, for example because it knows that looking at inode numbers is smarter than spending time on checksums?


1 Answer


The problem is that using the "-a" flag forces preservation of modification times (the implied "-t").

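If you use "-rlpgo" instead (or follow "-a" with "--no-times"), preserving modification times is no longer taken into account, so the inodes will be shared. You still have to specify "--size-only" or "--checksum" (the latter is obviously safer) so that it does not compare based on file times.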

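The docs don't clearly distinguish which flags are used to trigger updates and which are used to control the preservation of attributes.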

Answered 2012-05-28T03:56:07.560