docker - How file lookup works in a Docker container

Question

According to the Docker docs, every Dockerfile instruction creates a layer, and all the layers are kept when you create a new image based on an old one. So when I create my own image, it might involve hundreds of layers, because it recursively inherits the layers of its base images.

As I understand it, file lookup in a container works this way:

A process wants to access file a; the lookup starts from the container layer (the thin read/write layer).
UnionFS checks whether this layer has a record for the file (either it has the file, or the file is marked as deleted). If so, it returns the file or reports "not found" respectively, ending the lookup. If not, it passes the task to the layer below.
The lookup ends at the bottom layer.
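
The model described in these steps can be written down as a small Python sketch. This is only a toy version of the asker's mental model, not how any Docker storage driver is implemented; the layer contents, the `WHITEOUT` marker, and the `lookup` function are made-up names for illustration.

```python
# Toy model of the top-down lookup described above.
# Each layer is a dict mapping a path to its content; all data is hypothetical.
WHITEOUT = object()  # marker meaning "this path was deleted in this layer"


def lookup(layers, path):
    """Probe layers from top (index 0) to bottom.

    Returns (content or None, number of layers probed).
    """
    for probes, layer in enumerate(layers, start=1):
        if path in layer:
            entry = layer[path]
            # A whiteout ends the lookup with "not found".
            return (None if entry is WHITEOUT else entry), probes
    return None, len(layers)


layers = [
    {},                              # thin container (read/write) layer
    {"/tmp/old.log": WHITEOUT},      # a layer that deleted a file
    {"/tmp/old.log": "stale data"},  # the layer that originally added it
    {"/bin/sh": "shell binary"},     # bottom layer
]

print(lookup(layers, "/bin/sh"))       # found only after probing every layer
print(lookup(layers, "/tmp/old.log"))  # stopped early by the whiteout
```

In this model the cost of reaching `/bin/sh` grows with the number of layers, which is exactly the concern raised in the question.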

If that is how it works, consider a file that resides in the bottom layer and is unchanged by the other layers, /bin/sh perhaps: accessing it would mean going through all the layers down to the bottom. Even though each layer might be very lightweight, such a lookup would still take around 100x as long as a regular one, which should be noticeable. But in my experience Docker is pretty fast, almost the same as a native OS. Where am I wrong?

score 2 · Accepted Answer

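To add to the previous answers, which are correct: implementers of copy-on-write (CoW) and union filesystems want near-native performance, so of course they have tuned their implementations and "APIs" for the best possible lookup and filesystem performance.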

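That said, it is good to know that Docker does not run on top of just one "type" of union/CoW filesystem. There is a small set of available options, and the default depends on the Linux distribution it is installed on.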

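AUFS and overlay(fs) are the most common, but Docker also supports devicemapper (contributed and supported by Red Hat on Fedora/RHEL/CentOS), btrfs, and zfs. I have a blog post comparing and contrasting the various options that may be of interest.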
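
To see which of these drivers a given installation actually uses, you can ask the Docker CLI. The sketch below wraps `docker info --format '{{.Driver}}'` in Python; the helper name `docker_storage_driver` is my own, and it assumes the `docker` binary is on `PATH` and the daemon is reachable (it returns `None` otherwise).

```python
import shutil
import subprocess


def docker_storage_driver():
    """Return the storage driver reported by `docker info`, or None if unavailable."""
    if shutil.which("docker") is None:
        return None  # Docker CLI is not installed on this machine
    try:
        result = subprocess.run(
            ["docker", "info", "--format", "{{.Driver}}"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    driver = result.stdout.strip()
    # A non-zero exit code usually means the daemon could not be reached.
    return driver if result.returncode == 0 and driver else None


print(docker_storage_driver())
```

The value printed depends on the host, so no particular output is shown here.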

score 2 · Accepted Answer

This is all thanks to UnionFS and union mounts!

Straight from Wikipedia:

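It allows files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system.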

And from an interesting article:

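In the kernel, filesystems are stacked in the order they are mounted, with the first mounted filesystem at the bottom of the mount stack and the latest mount at the top. Only the files and directories at the top of the mount stack are visible. With union mounts, directory entries from the lower filesystem are merged with the directory entries of the upper filesystem, forming a logical combination of all mounted filesystems. Files with the same name in a lower filesystem are masked, because the upper filesystem takes priority.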
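
The merging and masking behaviour in that quote can be illustrated with a few lines of Python. This is only a conceptual sketch with made-up data; `merged_listing`, `lower`, and `upper` are hypothetical names, and a real union mount does this inside the kernel rather than with dictionaries.

```python
def merged_listing(mount_stack, directory):
    """Merge the entries of `directory` across a stack of filesystems.

    `mount_stack` is ordered bottom to top; each filesystem maps a
    directory path to a dict of {file name: content}.
    """
    merged = {}
    for fs in mount_stack:
        # Later (upper) filesystems overwrite same-named lower entries.
        merged.update(fs.get(directory, {}))
    return merged


lower = {"/etc": {"hosts": "lower hosts", "passwd": "lower passwd"}}
upper = {"/etc": {"hosts": "upper hosts", "resolv.conf": "upper resolv"}}

view = merged_listing([lower, upper], "/etc")
print(sorted(view))   # names from both filesystems appear in one listing
print(view["hosts"])  # the upper copy masks the lower one
```

Here `passwd` is visible even though only the lower filesystem has it, while the lower `hosts` is hidden by the upper one.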

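So it does not "go through the layers" in the traditional sense (e.g. one at a time); rather, it knows, at any given time, which file resides on which disk.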
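
One way to picture "knowing where a file resides" is a lookup that remembers its result. The sketch below is an assumption made purely for illustration, not a description of how AUFS, overlayfs, or any other driver is implemented; `CachedUnion` and its fields are hypothetical names.

```python
class CachedUnion:
    """Toy union view that remembers which layer holds each path."""

    def __init__(self, layers):
        self.layers = layers  # top layer first; each layer maps path -> content
        self.cache = {}       # path -> index of the layer that holds it
        self.probes = 0       # number of per-layer checks performed so far

    def locate(self, path):
        """Return the index of the layer holding `path`, or None."""
        if path in self.cache:
            return self.cache[path]  # answered without touching any layer
        for index, layer in enumerate(self.layers):
            self.probes += 1
            if path in layer:
                self.cache[path] = index
                return index
        return None


union = CachedUnion([{}, {}, {}, {"/bin/sh": "shell binary"}])
union.locate("/bin/sh")
first = union.probes   # layers checked by the first lookup
union.locate("/bin/sh")
second = union.probes  # unchanged: the repeat lookup was served from the cache
print(first, second)
```

Under this assumption, only the first access to a path pays a cost that depends on the number of layers.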

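Doing this at the filesystem layer also means that no software has to worry about where a file resides: it knows to request /bin/sh, and the filesystem knows where to get the file from.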

More information can be found in this webinar.

So, to answer your question:

Where am I wrong?

You assume it has to look through one layer at a time, and it does not have to do that. (UnionFS is awesome!)
