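I'm struggling to get my head around this (OK, tbh I'm cramming the night before an exam :) but I can't figure it out (and I can't find a good high-level overview online):
'A page table entry can map to more than one TLB entry. For example, if every page table entry maps to two TLB entries, this is known as a 2-way set associative TLB'
My question is, why would we want to map it more than once? Surely we want the TLB to represent the maximum number of possible entries, so wouldn't duplication waste space? What am I missing?
Many thanks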
It doesn't mean you would load the same entry into two places in the table -- it means a particular entry can be loaded into either of two places in the table. The alternative, where an entry can only be mapped to one place in the table, is a direct-mapped TLB.
The primary disadvantage of a direct-mapped TLB arises if you're copying from one part of memory to another, and (by whatever direct-mapping scheme the CPU uses) the translations for both have to be mapped to the same spot in the TLB. In this case, you end up re-loading the TLB entry on every access, so the TLB is doing little or no good at all. By having a two-way set associative TLB, you can guarantee that any two entries can be in the TLB at the same time, so (for example) a block move from point A to point B can't ruin your day -- but if you read from two areas, combine them, and write the results to a third, it could (if all three used translations that map to the same set of TLB entries).
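To make the thrashing concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration rather than how any particular CPU works: the class name `SetAssociativeTLB`, the "page number modulo number of sets" indexing, LRU replacement within a set, and the sizes (8 entries total in both configurations).

```python
from collections import OrderedDict


class SetAssociativeTLB:
    """Toy TLB model: num_sets sets, each holding up to `ways` entries (LRU)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.misses = 0

    def access(self, vpn):
        """Look up a virtual page number; return True on a hit."""
        entries = self.sets[vpn % self.num_sets]  # assumed indexing scheme
        if vpn in entries:
            entries.move_to_end(vpn)  # mark as most recently used
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used entry
        entries[vpn] = True
        return False


def count_misses(num_sets, ways, pages, rounds):
    """Touch each page in `pages` in turn, `rounds` times; return total misses."""
    tlb = SetAssociativeTLB(num_sets, ways)
    for _ in range(rounds):
        for vpn in pages:
            tlb.access(vpn)
    return tlb.misses


# Copy from page 0 to page 8: both land on the same index in either layout.
direct_copy = count_misses(num_sets=8, ways=1, pages=[0, 8], rounds=100)
two_way_copy = count_misses(num_sets=4, ways=2, pages=[0, 8], rounds=100)
# Read pages 0 and 8, write page 16: three translations compete for two ways.
two_way_three = count_misses(num_sets=4, ways=2, pages=[0, 8, 16], rounds=100)

print(direct_copy, two_way_copy, two_way_three)  # prints: 200 2 300
```

With one way, the copy misses on every single access. With two ways, it misses only on the first touch of each page. The three-stream case thrashes again, because three translations are competing for the two slots of one set.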
The shortcoming of having a multiway TLB (like any other multiway cache) is that you can't directly compute which position might hold a particular entry at a given time -- you basically have to search across the ways to find the right entry. For two-way, that's rarely a problem -- but four ways is typically about the useful limit; 8-way set associative TLBs or caches aren't common at all, partly because searching across 8 possible locations for the data starts to become excessive.
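The "search across the ways" step can be sketched as follows. This is a software illustration only: the function names, the tag/index split, and the frame numbers are hypothetical, and real hardware typically compares the tags of all ways in a set at once rather than looping, so the cost shows up as extra comparators rather than extra steps.

```python
def split_vpn(vpn, num_sets):
    """Split a virtual page number into (set index, tag); assumed scheme."""
    return vpn % num_sets, vpn // num_sets


def lookup(tlb_sets, vpn):
    """Return (frame, tag_comparisons); frame is None on a miss."""
    index, tag = split_vpn(vpn, len(tlb_sets))
    comparisons = 0
    # The index selects one set directly, but within the set every way
    # has to be checked, because the entry could sit in any of them.
    for entry_tag, frame in tlb_sets[index]:
        comparisons += 1
        if entry_tag == tag:
            return frame, comparisons
    return None, comparisons


# A 4-set, 2-way TLB with set 0 holding translations for pages 0 and 8.
tlb_sets = [[] for _ in range(4)]
tlb_sets[0] = [(0, 0x10), (2, 0x2A)]  # (tag, frame) pairs, made-up frames

hit = lookup(tlb_sets, 8)    # tag 2 is found in the second way
miss = lookup(tlb_sets, 16)  # tag 4 is absent; both ways were checked
```

The number of tag comparisons per lookup grows with the number of ways, which is the cost the paragraph above describes.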
Over time, though, the number of ways it makes sense to use in a cache or TLB tends to rise. The differential in speed between memory and processors continues to grow, and the greater that differential, the more cycles the CPU can spend on a lookup and still produce a result within a single memory clock cycle (or a specified number of memory clock cycles, even if that's more than one).