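I would like to know the exact algorithm (or something close to it) behind "git merge". Answers to at least these sub-questions would help:
- How does git detect the context of a particular non-conflicting change?
- How does git find out that there is a conflict in these exact lines?
- Which things does git auto-merge?
- How does git perform when there is no common base for the branches being merged?
- How does git perform when there are multiple common bases for the branches being merged?
- What happens when I merge multiple branches at once?
- What is the difference between merge strategies?
But a description of the whole algorithm would be much better.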
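You might be best off looking for a description of a 3-way merge algorithm. A high-level description would go something like this:
- Find a suitable merge base B: a version of the file that is an ancestor of both of the new versions (X and Y), and usually the most recent such base (although there are cases where it will have to go back further, which is one of the features of git's default recursive merge).
- Perform diffs of X with B and Y with B.
- Walk through the change blocks identified in the two diffs. If both sides introduce the same change in the same spot, accept either one; if one introduces a change and the other leaves that region alone, introduce the change in the final result; if both introduce changes in a spot, but they don't match, mark a conflict to be resolved manually.
The full algorithm deals with this in a lot more detail, and even has some documentation (https://github.com/git/git/blob/master/Documentation/technical/trivial-merge.txt for one, along with the git help XXX pages, where XXX is one of merge-base, merge-file, merge, merge-one-file and possibly a few others). If that's not deep enough, there's always source code...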
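The three steps above can be sketched in a few lines of Python. This is only an illustration of a textbook 3-way merge built on the standard difflib module, not git's actual xdiff code; the function names unchanged_map and merge3 and the marker labels X and Y are made up for the example.

```python
from difflib import SequenceMatcher


def unchanged_map(base, side):
    """Map each base line index that survives unchanged to its index in `side`."""
    mapping = {}
    matcher = SequenceMatcher(None, base, side, autojunk=False)
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            mapping[a + k] = b + k
    return mapping


def merge3(base, x, y):
    """Merge line lists x and y against base; return (merged_lines, had_conflict)."""
    in_x, in_y = unchanged_map(base, x), unchanged_map(base, y)
    # Base lines kept by both sides act as synchronisation points.
    sync = [i for i in range(len(base)) if i in in_x and i in in_y]
    merged, had_conflict = [], False
    b0 = x0 = y0 = 0
    for s in sync + [None]:  # None marks the end of all three files
        if s is None:
            b1, x1, y1 = len(base), len(x), len(y)
        else:
            b1, x1, y1 = s, in_x[s], in_y[s]
        b_part, x_part, y_part = base[b0:b1], x[x0:x1], y[y0:y1]
        if x_part == y_part:        # same change on both sides (or no change)
            merged += x_part
        elif x_part == b_part:      # only Y changed this region
            merged += y_part
        elif y_part == b_part:      # only X changed this region
            merged += x_part
        else:                       # both changed it, differently: conflict
            had_conflict = True
            merged += ["<<<<<<< X", *x_part, "=======", *y_part, ">>>>>>> Y"]
        if s is not None:
            merged.append(base[s])  # the shared, unchanged line itself
            b0, x0, y0 = s + 1, x1 + 1, y1 + 1
    return merged, had_conflict


print(merge3(["a", "b", "c"], ["a", "B", "c"], ["a", "b", "c", "d"]))
```

Here X edits the middle line and Y appends a line, so both changes are taken and no conflict is reported. With an empty base there are no synchronisation points at all, so this naive version would mark the whole of both files as one conflict.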
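How does git perform when there are multiple common bases for the branches being merged?
This article was very helpful: http://codicesoftware.blogspot.com/2011/09/merge-recursive-strategy.html (there is also a part 2).
Recursive uses diff3 recursively to generate a virtual branch, which is then used as the ancestor.
E.g.: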
(A)----(B)----(C)-----(F)
        |      |       |
        |      |   +---+
        |      |   |
        |      +-------+
        |          |   |
        |      +---+   |
        |      |       |
        +-----(D)-----(E)
Then:
git checkout E
git merge F
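There are 2 best common ancestors (common ancestors that are not ancestors of any other common ancestor), C and D. Git merges them into a new virtual branch V, and then uses V as the base.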
(A)----(B)----(C)--------(F)
        |      |          |
        |      |      +---+
        |      |      |
        |      +----------+
        |      |      |   |
        |      +--(V) |   |
        |          |  |   |
        |      +---+  |   |
        |      |      |   |
        |      +------+   |
        |      |          |
        +-----(D)--------(E)
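I suppose that if there were more best common ancestors, Git would just keep going, merging V with the next one.
The article says that if there is a merge conflict while generating the virtual branch, Git just leaves the conflict markers where they are and continues.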
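To make "best common ancestors" and the folding into V concrete, here is a small Python sketch on the graph from the diagram. It only manipulates commit names, it does not merge any file content, and the helper names (ancestors, merge_bases, virtual_base) and the V(...) label format are invented for the illustration; it assumes extra bases are folded in one at a time, as guessed above.

```python
# Parent lists for the criss-cross history drawn above.
HISTORY = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["B"],
    "E": ["D", "C"], "F": ["C", "D"],
}


def ancestors(parents, commit):
    """Return the commit together with everything reachable through parents."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return sorted(
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    )


def virtual_base(parents, a, b):
    """Fold all best common ancestors of a and b into one (virtual) base name."""
    bases = merge_bases(parents, a, b)
    if not bases:
        return None  # unrelated histories: there is no base at all
    graph = dict(parents)  # work on a copy so virtual commits stay local
    base = bases[0]
    for other in bases[1:]:
        # Merging two bases needs a base of its own, found the same way.
        inner = virtual_base(graph, base, other)
        name = f"V({base}+{other} over {inner})"
        graph[name] = [base, other]
        base = name
    return base


print(merge_bases(HISTORY, "E", "F"))
print(virtual_base(HISTORY, "E", "F"))
```

For E and F this reports C and D as the two best common ancestors, and the virtual base is the merge of C and D computed over their own base B.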
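What happens when I merge multiple branches at once?
As @Nevik Rehnel explained, it depends on the strategy; this is well explained in the MERGE STRATEGIES section of man git-merge.
Only octopus and ours / theirs support merging multiple branches at once; recursive, for example, does not.
octopus refuses to merge if there would be conflicts, and ours is a trivial merge, so there can be no conflicts.
These commands generate a new commit that has more than 2 parents.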
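I did a merge -s octopus on Git 1.8.5 without conflicts to see how it goes.
Initial state:
   +--B
   |
A--+--C
   |
   +--D
Action:
git checkout B
git merge -s octopus C D
New state:
   +--B--+
   |     |
A--+--C--+--E
   |     |
   +--D--+
As expected, E has 3 parents.
TODO: how exactly octopus operates on the modifications to a single file. Recursive two-by-two 3-way merges?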
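How does git perform when there is no common base for the branches being merged?
@Torek mentions that since 2.9, the merge fails without --allow-unrelated-histories.
I tried it out empirically on Git 1.8.5: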
git init
printf 'a\nc\n' > a
git add .
git commit -m a
git checkout --orphan b
printf 'a\nb\nc\n' > a
git add .
git commit -m b
git merge master
a contains:
a
<<<<<<< ours
b
=======
>>>>>>> theirs
c
Then:
git checkout --conflict=diff3 -- .
a contains:
<<<<<<< ours
a
b
c
||||||| base
=======
a
c
>>>>>>> theirs
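Interpretation: the base is empty, so git cannot resolve any modification within the file. With a\nc\n as the base, the conflict above would have been resolved as a single line addition.

I'm interested too. I don't know the answer, but...
A complex system that works is invariably found to have evolved from a simple system that worked
I think git's merging is highly sophisticated and will be very difficult to understand, but one way to approach it is from its precursors, and to focus on the heart of your concern. That is, given two files that don't have a common ancestor, how does git merge work out how to merge them, and where the conflicts are?
Let's try to find some precursors. From git help merge-file: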
git merge-file is designed to be a minimal clone of RCS merge; that is,
it implements all of RCS merge's functionality which is needed by
git(1).
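From wikipedia: http://en.wikipedia.org/wiki/Git_%28software%29 -> http://en.wikipedia.org/wiki/Three-way_merge#Three-way_merge -> http://en.wikipedia.org/wiki/Diff3 -> http://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf
That last link is a pdf of a paper describing the diff3 algorithm in detail. It's only 12 pages long, and the algorithm itself is only a couple of pages, but it is a full-on mathematical treatment. This might seem a bit too formal, but if you want to understand git's merge, you'll need to understand the simpler version first. I haven't checked yet, but with a name like diff3, you'll probably also need to understand diff (which uses a longest common subsequence algorithm). However, there may be a more intuitive explanation of diff3 out there, if you have a google...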
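Now, I just did an experiment comparing diff3 and git merge-file. They take the same three input files version1 oldversion version2 and mark conflicts the same way, with <<<<<<< version1, =======, >>>>>>> version2 (diff3 also has ||||||| oldversion), showing their common heritage.
I used an empty file for oldversion, and near-identical files for version1 and version2, with just one extra line added to version2.
Result: git merge-file identified the single changed line as the conflict, but diff3 treated the whole of both files as a conflict. Thus, sophisticated as diff3 is, git's merge is even more sophisticated, even for this simplest of cases.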
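One way to picture how a conflict over two whole files can shrink to a single line is to strip the lines that both sides share at the start and at the end of the conflicting region. I have not checked that git's merge code works exactly this way; the Python below is only a guess that happens to reproduce the shape of the result, and narrow_conflict and render are hypothetical names.

```python
def narrow_conflict(ours, theirs):
    """Split a conflict into (shared head, ours-only, theirs-only, shared tail)."""
    # Count identical leading lines.
    i = 0
    while i < len(ours) and i < len(theirs) and ours[i] == theirs[i]:
        i += 1
    # Count identical trailing lines, without overlapping the head.
    j = 0
    while (j < len(ours) - i and j < len(theirs) - i
           and ours[-1 - j] == theirs[-1 - j]):
        j += 1
    head = ours[:i]
    tail = ours[len(ours) - j:]
    return head, ours[i:len(ours) - j], theirs[i:len(theirs) - j], tail


def render(ours, theirs):
    """Return merged lines, with markers only around the part that differs."""
    head, o, t, tail = narrow_conflict(ours, theirs)
    if not o and not t:
        return head + tail  # both sides are identical
    return head + ["<<<<<<< ours", *o, "=======", *t, ">>>>>>> theirs"] + tail


print("\n".join(render(["a", "b", "c"], ["a", "c"])))
```

With the lines a, b, c on one side and a, c on the other, only b ends up between the markers, while a and c stay outside them.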
Here are the actual results (I used the text of @twalberg's answer). Note the options needed (see the respective manpages).
$ git merge-file -p fun1.txt fun0.txt fun2.txt
You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:
Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B. Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.
<<<<<<< fun1.txt
=======
THIS IS A BIT DIFFERENT
>>>>>>> fun2.txt
The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...
$ diff3 -m fun1.txt fun0.txt fun2.txt
<<<<<<< fun1.txt
You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:
Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B. Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.
The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...
||||||| fun0.txt
=======
You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:
Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B. Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.
THIS IS A BIT DIFFERENT
The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...
>>>>>>> fun2.txt
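If you are truly interested in this, it's a bit of a rabbit hole. To me, it seems as deep as regular expressions, the longest common subsequence algorithm of diff, context-free grammars, or relational algebra. If you want to get to the bottom of it, I think you can, but it will take some determined study.
How does git detect the context of a particular non-conflicting change?
How does git find out that there is a conflict in these exact lines?
If the same line has changed on both sides of the merge, it's a conflict; if it hasn't, the change from one side (if there is one) is accepted.
Which things does git auto-merge?
Changes that do not conflict (see above).
How does git perform when there are multiple common bases for the branches being merged?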
By default git merge-base reports only one best common ancestor; when there are several, git merge-base --all lists them all.
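What happens when I merge multiple branches at once?
That depends on the merge strategy (only the octopus and ours / theirs strategies support merging more than two branches).
What is the difference between merge strategies?
This is explained in the git merge manpage.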
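Here is the original implementation:
http://git.kaarsemaker.net/git/blob/857f26d2f41e16170e48076758d974820af685ff/git-merge-recursive.py
Basically, you create a list of common ancestors for the two commits and then recursively merge them, either fast-forwarding them or creating virtual commits that are used as the base for a three-way merge of the files.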