memory - Paging and TLB

Question

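I have a question about the following task.

Consider an IA-32 system where the MMU supports a two-level page table. The second level contains 1024 page table entries mapping to 4 KB page frames. Each page table entry (at both levels) is 4 bytes in size. The system only supports a 4 KB page size.
We want to read 8 MB sequentially from virtual memory, starting at byte 0. We read one word (4 bytes) at a time.
We have an 8-entry data TLB. How many memory accesses are needed to read the 8 MB of memory specified above?

Would it make a difference if the TLB had 4 entries instead of 8?

So, we read sequentially. That means 8 MB / 4 B = 2M memory accesses. We have a two-level page table, so without a TLB that is 2M + 2 * 2M = 6M memory accesses.

But I don't know how to calculate the memory accesses when the TLB is included.

Could anyone explain this to me? That would be very helpful.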

score 1 · Accepted Answer

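Since the access pattern is a streaming access, each TLB entry is used for one access to every four-byte word of its page and is never needed again once the stream has moved on to the next page. The first access to a page misses and fills the entry; the entry is then reused 1023 times, so 1023 look-ups (2046 memory accesses) are avoided per page. (Since the uses of different translations never overlap and the reuse is perfectly localized, a single-entry data TLB would perform just as well as even a 2048-entry TLB.)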
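The arithmetic behind this can be checked with a short script. This is only a sketch: it assumes a TLB miss costs exactly two memory accesses (one page directory read, one page table read), that a TLB hit costs none, and that nothing else caches page table data. The constant names are illustrative.

```python
# Closed-form count for streaming 8 MiB one 4-byte word at a time.
# Assumption: a TLB miss costs two memory accesses (page directory entry,
# then page table entry); a TLB hit costs no extra memory access.
PAGE_SIZE = 4 * 1024             # 4 KiB pages
WORD_SIZE = 4                    # bytes read per access
STREAM_SIZE = 8 * 1024 * 1024    # 8 MiB read sequentially
WALK_ACCESSES = 2                # two-level page table

data_accesses = STREAM_SIZE // WORD_SIZE        # one per word
pages_touched = STREAM_SIZE // PAGE_SIZE        # one TLB miss per page
walk_accesses = pages_touched * WALK_ACCESSES   # page table reads on misses

without_tlb = data_accesses * (1 + WALK_ACCESSES)
with_tlb = data_accesses + walk_accesses

print("without TLB:", without_tlb)
print("with TLB:   ", with_tlb)
```

Under these assumptions the totals are 6,291,456 accesses without a TLB and 2,101,248 with one (2,097,152 data reads plus 4,096 page table reads). The number of TLB entries does not appear anywhere in the formula, which is why 4 entries and 8 entries give the same count.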

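Consider the following description of what happens with a two-entry direct-mapped data TLB (recognizing that the least significant 12 bits of the virtual address, the offset within the page, are ignored by the TLB, and that one bit of the virtual address, the lowest bit of the page number, is used to index into the TLB):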

load 0x0100_0000; // TLB entry 0 tag != 0x0800 (page # 0x0_1000) [miss]
                  // 2 memory accesses to fill TLB entry 0
load 0x0100_0004; // TLB entry 0 tag == 0x0800 [hit]
load 0x0100_0008; // TLB entry 0 tag == 0x0800 [hit]
...               // 1020 TLB hits in TLB entry 0
load 0x0100_0ffc; // TLB entry 0 tag == 0x0800 [hit]; last word in page
load 0x0100_1000; // TLB entry 1 tag != 0x0800 (page # 0x0_1001) [miss]
                  // 2 memory accesses to fill TLB entry 1
load 0x0100_1004; // TLB entry 1 tag == 0x0800 [hit]
load 0x0100_1008; // TLB entry 1 tag == 0x0800 [hit]
...               // 1020 TLB hits in TLB entry 1
load 0x0100_1ffc; // TLB entry 1 tag == 0x0800 [hit]; last word in page
load 0x0100_2000; // TLB entry 0 tag (0x0800) != 0x0801 (page # 0x0_1002) [miss]
                  // 2 memory accesses to fill TLB entry 0
load 0x0100_2004; // TLB entry 0 tag == 0x0801 [hit]
load 0x0100_2008; // TLB entry 0 tag == 0x0801 [hit]
...               // 1020 TLB hits in TLB entry 0
load 0x0100_2ffc; // TLB entry 0 tag == 0x0801 [hit]; last word in page
load 0x0100_3000; // TLB entry 1 tag (0x0800) != 0x0801 (page # 0x0_1003) [miss]
                  // 2 memory accesses to fill TLB entry 1
load 0x0100_3004; // TLB entry 1 tag == 0x0801 [hit]
load 0x0100_3008; // TLB entry 1 tag == 0x0801 [hit]
...               // 1020 TLB hits in TLB entry 1
load 0x0100_3ffc; // TLB entry 1 tag == 0x0801 [hit]; last word in page
...               // repeat the above 510 times
                  // then the last 4 pages of the 8 MiB stream
load 0x017f_c000; // TLB entry 0 tag (0x0bfd) != 0x0bfe (page # 0x0_17fc) [miss]
                  // 2 memory accesses to fill TLB entry 0
load 0x017f_c004; // TLB entry 0 tag == 0x0bfe [hit]
load 0x017f_c008; // TLB entry 0 tag == 0x0bfe [hit]
...               // 1020 TLB hits in TLB entry 0
load 0x017f_cffc; // TLB entry 0 tag == 0x0bfe [hit]; last word in page
load 0x017f_d000; // TLB entry 1 tag (0x0bfd) != 0x0bfe (page # 0x0_17fd) [miss]
                  // 2 memory accesses to fill TLB entry 1
load 0x017f_d004; // TLB entry 1 tag == 0x0bfe [hit]
load 0x017f_d008; // TLB entry 1 tag == 0x0bfe [hit]
...               // 1020 TLB hits in TLB entry 1
load 0x017f_dffc; // TLB entry 1 tag == 0x0bfe [hit]; last word in page
load 0x017f_e000; // TLB entry 0 tag (0x0bfe) != 0x0bff (page # 0x0_17fe) [miss]
                  // 2 memory accesses to fill TLB entry 0
load 0x017f_e004; // TLB entry 0 tag == 0x0bff [hit]
load 0x017f_e008; // TLB entry 0 tag == 0x0bff [hit]
...               // 1020 TLB hits in TLB entry 0
load 0x017f_effc; // TLB entry 0 tag == 0x0bff [hit]; last word in page
load 0x017f_f000; // TLB entry 1 tag (0x0bfe) != 0x0bff (page # 0x0_17ff) [miss]
                  // 2 memory accesses to fill TLB entry 1
load 0x017f_f004; // TLB entry 1 tag == 0x0bff [hit]
load 0x017f_f008; // TLB entry 1 tag == 0x0bff [hit]
...               // 1020 TLB hits in TLB entry 1
load 0x017f_fffc; // TLB entry 1 tag == 0x0bff [hit]; last word in page

Each page is referenced 1024 times in sequence (once per four-byte element) and is then never referenced again.
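The trace can also be replayed mechanically. The following is a minimal sketch of a direct-mapped TLB model, not a description of any real IA-32 TLB: the function name `count_tlb_misses` and its parameters are made up for illustration, and the index/tag split simply follows the trace (low bits of the virtual page number select the entry, the remaining bits form the tag).

```python
def count_tlb_misses(start, length, num_entries, page_size=4096, word_size=4):
    """Count misses in a direct-mapped TLB for a sequential word-by-word read.

    num_entries is assumed to be a power of two; the low bits of the virtual
    page number select the entry and the remaining bits form the tag.
    """
    tags = [None] * num_entries      # None marks an invalid entry
    misses = 0
    for addr in range(start, start + length, word_size):
        vpn = addr // page_size      # drop the 12-bit page offset
        index = vpn % num_entries    # low VPN bits pick the entry
        tag = vpn // num_entries     # remaining VPN bits are the tag
        if tags[index] != tag:
            misses += 1              # a page walk would fill this entry
            tags[index] = tag
    return misses


start, length = 0x0100_0000, 8 * 1024 * 1024
for entries in (2, 4, 8):
    misses = count_tlb_misses(start, length, entries)
    # Each miss costs two page table accesses in the two-level scheme.
    print(entries, misses, length // 4 + 2 * misses)
```

For every entry count the model reports 2048 misses (one per page) and 2,101,248 total memory accesses, matching the observation that the TLB size is irrelevant for a pure streaming read.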

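(Now consider a design with four TLB entries and two entries that cache page directory entries [each of which holds a pointer to a page of page table entries]. Each cached PDE would be reused for 1023 page look-ups, reducing each of them to one memory access. [If the 8 MiB streaming access were repeated as an inner loop and were 4 MiB aligned, a two-entry PDE cache would be fully warmed up after the first iteration, and all subsequent page table look-ups would require only one memory reference.])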
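A rough model of that PDE-cache variant is sketched below. It assumes one TLB miss per page on every pass (the stream is far larger than the TLB), a small LRU cache of page directory entries, and a hypothetical helper name `walk_accesses`; real paging-structure caches may be organized differently.

```python
from collections import OrderedDict

PAGE_SIZE = 4096
PAGES_PER_PDE = 1024   # one page directory entry covers 1024 pages (4 MiB)


def walk_accesses(start, length, pde_cache_entries, passes=1):
    """Return the page-walk memory accesses of each pass over the stream.

    Assumes every page causes exactly one TLB miss per pass and that page
    directory entries are kept in a small LRU cache.
    """
    pde_cache = OrderedDict()
    per_pass = []
    first_vpn = start // PAGE_SIZE
    last_vpn = (start + length - 1) // PAGE_SIZE
    for _ in range(passes):
        accesses = 0
        for vpn in range(first_vpn, last_vpn + 1):
            pde_index = vpn // PAGES_PER_PDE
            if pde_index in pde_cache:
                pde_cache.move_to_end(pde_index)   # refresh LRU position
                accesses += 1                      # PTE read only
            else:
                accesses += 2                      # PDE read + PTE read
                pde_cache[pde_index] = True
                if len(pde_cache) > pde_cache_entries:
                    pde_cache.popitem(last=False)  # evict least recently used
        per_pass.append(accesses)
    return per_pass


print(walk_accesses(0x0100_0000, 8 * 1024 * 1024, pde_cache_entries=2, passes=2))
```

With the 4 MiB-aligned start address used in the trace, the first pass needs 2050 page-walk accesses (two PDE reads plus 2048 PTE reads) and the second pass needs 2048, one per page.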
