git - Will git store diffs of binary files that change in content, but never change size?

Question

I am interested in storing an EEPROM HEX file of fixed size in git. The files will NEVER change size, but they will change content frequently.

If I add an EEPROM file to git and commit it, then I change a few bytes in the file, will git store this change efficiently over dozens or hundreds of commits?

In my research on this issue, I've run across some thorough discussions on the topic, but most of them seem to deal with files like PDFs and MP3s which nobody expects to stay the same or be comparable in a diff. I wonder if EEPROM HEX files would be treated differently since the file size stays the same?

EDITED (again)

Some initial observations... (Kudos to Krumelur for the "just try it" encouragement!)

The file that I am testing is a 7MB Intel HEX file. Based on the output from git, it appears to treat this file as a text file:

$ git commit -m "Changed a single byte."
[master bc2958b] Changed a single byte.
1 file changed, 1 insertion(+), 1 deletion(-)

The diff output matches as well:

$ git show bc2958b
commit bc2958b[...]
Author: ThoughtProcess <blah@blah.com>
Date:   Wed Jul 31 11:53:41 2013 -0500

    Changed a single byte.

diff --git a/test.hex b/test.hex
index fbdeed4..04d19b6 100644
--- a/test.hex
+++ b/test.hex
@@ -58,7 +58,7 @@
 :20470000000000000000000000000000000000000000000000000000E001EDD0D9310D00E4
 :20472000400200000080000000000000000000000000000000000000E002EDD0CF310D000B
 :20474000400200000080000000000000000000000000000000000000E0036D0063040D00D3
-:2047600040020000008000000000000000000000000000000000000000A0FF2F06801B0FF9
+:2047600040020000008000000000000000000000000000000000000000A0FF2G06801B0FF9
 :2047800000E01D007A00820F3CFB000000000000000000000000000000A0FF8F06801B1FEC
 :2047A00000E01D006A00821F3CFB000000000000000000000000000000A0FF6F06801B8F7C
 :2047C00000E01D005A00821F3CFB000000000000000000000000000000A0FF8F06801BDFFC

After 7 commits, the repository size is now 21MB. Here's the strange thing: the repository seems to grow by a roughly constant amount (about 2MB) with each commit. Is that simply how git is designed to work? Or is it not storing the incremental differences as text like I'd expect?

score 5 · Accepted Answer

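git is actually storing a new full copy of the file somewhere in .git/objects, so your repository is indeed growing linearly. You can run git gc to make git pack your repository. With your data, git should be able to pack really efficiently, and your repository should get much smaller. (git will also run git gc automatically once in a while.)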
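To see the loose-versus-packed difference on your own data, a minimal sketch along the following lines may help. It assumes bash, GNU sed and a git binary on the PATH. The scratch repository, the generated test.hex stand-in (HEX-looking text, not a valid Intel HEX image) and the demo identity are all made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: watch loose objects pile up, then let `git gc` pack them.
set -euo pipefail

repo=$(mktemp -d)            # throwaway repository (hypothetical location)
cd "$repo"
git init --quiet
# Identity used only for these throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Generate a fixed-size text file of HEX-looking lines (stand-in data).
for i in $(seq 1 20000); do
  printf ':%08X%056X\n' "$i" "$((i * 7919))"
done > test.hex
git add test.hex
git commit --quiet -m "Add test.hex"

# Five commits, each changing 8 characters on one line; the size never changes.
for n in 1 2 3 4 5; do
  sed -i "${n}000s/^:......../:DEADBEE${n}/" test.hex
  git commit --quiet -am "Change line ${n}000"
done

echo "before gc:"
git count-objects -v         # "count"/"size" describe loose objects (size in KiB)
git gc --quiet
echo "after gc:"
git count-objects -v         # "in-pack"/"size-pack" describe packed objects
```

Comparing the "size" line before git gc with the "size-pack" line afterwards shows how much of the linear growth was only loose, full-copy storage.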

score 1 · Accepted Answer

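If you are really storing Intel HEX format files, there's no need to worry: they are text files. They just happen to represent binary data.

From the Wikipedia entry:

The format is a text file, with each line containing hexadecimal values encoding a sequence of data and their starting offset or absolute address.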

Edited to note: the change you made in your test isn't valid. G isn't a hex digit, and apart from that, you didn't update the checksum.
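For reference, the checksum at the end of an Intel HEX record is the two's complement of the low byte of the sum of all preceding bytes in that record (byte count, address, record type and data). A minimal bash sketch, using the record from the question; ihex_checksum is a hypothetical helper name, not an existing tool:

```shell
#!/usr/bin/env bash
# Sketch: recompute the checksum of one Intel HEX record.
# ihex_checksum is a hypothetical helper, not part of git or any HEX toolchain.
ihex_checksum() {
  local body=$1 sum=0
  while [ -n "$body" ]; do
    sum=$(( (sum + 0x${body:0:2}) & 0xFF ))   # add the next byte, keep the low 8 bits
    body=${body:2}
  done
  printf '%02X\n' $(( (0x100 - sum) & 0xFF )) # two's complement of the sum
}

# Record from the question, without the leading ':' and the trailing checksum byte.
record=2047600040020000008000000000000000000000000000000000000000A0FF2F06801B0F
ihex_checksum "$record"    # prints F9, matching the original line

# A valid one-byte edit (2F -> 2E) needs a new checksum.
edited=${record/FF2F/FF2E}
ihex_checksum "$edited"    # prints FA
```

So a well-formed version of the test edit would change both the data byte and the last two characters of the line.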

score 1 · Accepted Answer

We can test whether git stores two very similar binary files efficiently. Tested on git version 2.9.2.windows.1 (extra output removed for clarity):

$ git init
$ du -bs .git
15243   .git
$ head -c 10MB < /dev/urandom > random.bin
$ git add random.bin
$ git commit -m "Add random.bin"
$ du -bs .git
10018971        .git
$ git gc
$ du -bs .git
10020319        .git

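Git stores the 10 MB binary with about 20 KB of overhead (note that the original file still takes up another 10 MB in the working directory). Now, if we modify a few bytes of the file using a text editor (or, if you prefer, write a byte at a given address; see "Hex Edit / modify binary files from the command line"):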

$ vim random.bin  # modify a few bytes
$ git add random.bin
$ git commit -m "Modify random.bin a little"
$ du -bs .git
20023953        .git
$ git gc
$ du -bs .git
10021228        .git
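The vim step above can also be scripted. The following is a rough sketch, assuming bash and a dd that supports conv=notrunc; patch_byte is a hypothetical helper name, and the 1000-byte scratch file merely stands in for random.bin.

```shell
#!/usr/bin/env bash
# Sketch: overwrite a single byte in place so the file size never changes.
set -euo pipefail

# patch_byte FILE OFFSET HEXBYTE (hypothetical helper)
patch_byte() {
  local file=$1 offset=$2 hex=$3
  # conv=notrunc keeps the rest of the file; seek= skips to the target offset.
  printf "\\x$hex" | dd of="$file" bs=1 seek="$offset" conv=notrunc 2>/dev/null
}

work=$(mktemp -d)                          # scratch directory for the demo
head -c 1000 /dev/urandom > "$work/random.bin"

patch_byte "$work/random.bin" 16 41        # write byte 0x41 at offset 16
od -An -tx1 -j16 -N1 "$work/random.bin"    # show the byte now stored at offset 16
wc -c < "$work/random.bin"                 # the size is still 1000 bytes
```

After patching, the usual git add and git commit apply exactly as in the transcript.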

Before git gc, both versions were stored in full. Afterwards, git packed the two files very efficiently. Git packfiles are described in more detail at https://codewords.recurse.com/issues/three/unpacking-git-packfiles and https://git-scm.com/docs/pack-format.

$ git verify-pack -v .git/objects/pack/pack-4bc29bb6848c64b94ba6074939c851b83240dd60.pack
4ea81b3f5d4f0ef5ddbc8e9adaac73b60c0899c4 commit 201 151 12
9e2bafb8cd3a4f0fc6d0773611a92ac1b14303b0 commit 141 111 163
f2aa8f26c4dcad0f73a03c958b2eb1c0fc6cb8fd blob   10000008 10003073 274
0b650d78653ec22c19453264384ed644fc956f42 tree   38 49 10003347
bd143b12cdec07b9aa68875052c01ae6d041f28f tree   38 49 10003396
fd1a966f4b0acc4c77ab85cb81841ebb0ee290ea blob   470 309 10003445 1 f2aa8f26c4dcad0f73a03c958b2eb1c0fc6cb8fd
non delta: 5 objects
chain length = 1: 1 object
.git/objects/pack/pack-4bc29bb6848c64b94ba6074939c851b83240dd60.pack: ok

The last blob is deltified, and it references the SHA-1 of the original binary.
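In git verify-pack -v output, a deltified entry carries two extra fields after the offset: the delta depth and the SHA-1 of its base object. On a larger pack, a small filter can pick those entries out. This is only a sketch, assuming bash and awk; list_deltas is a hypothetical helper name, and the sample input is the two blob lines from the output above.

```shell
#!/usr/bin/env bash
# Sketch: list deltified objects from `git verify-pack -v` output.
set -euo pipefail

# list_deltas (hypothetical helper): reads verify-pack output on stdin.
# Deltified lines have 7 fields (SHA-1 type size packed-size offset depth base),
# non-delta lines have 5; the hex check skips the summary lines.
list_deltas() {
  awk 'NF == 7 && $1 ~ /^[0-9a-f]+$/ { print $1, "->", $7, "(depth " $6 ")" }'
}

# Sample input: the two blob lines from the pack shown above.
list_deltas <<'EOF'
f2aa8f26c4dcad0f73a03c958b2eb1c0fc6cb8fd blob   10000008 10003073 274
fd1a966f4b0acc4c77ab85cb81841ebb0ee290ea blob   470 309 10003445 1 f2aa8f26c4dcad0f73a03c958b2eb1c0fc6cb8fd
EOF

# In a real repository (after git gc), something like:
#   git verify-pack -v .git/objects/pack/pack-*.idx | list_deltas
```

Here the 470-byte delta stands in for a second 10 MB copy, which is why the packed repository barely grew.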

A similar test was done in this answer.
