
I've just started learning about direct-mapped and set-associative caches, and I have some very basic questions.

Suppose addresses are 32 bits long, and I have a 32 KB cache with a 64-byte block size, giving 512 frames. How much data is actually stored inside a "block"? If an instruction loads a value from a memory location, and that value is a 16-bit integer, does one of the 64-byte blocks now store only the 16-bit (2-byte) integer value? What about the other 62 bytes in the block? If I then have another load instruction that also loads a 16-bit integer, does this value go into a different block in a different frame depending on the load address (and if the address maps to the same frame as the previous instruction, is the previous value evicted, with the block again storing only 2 of its 64 bytes)? Is that correct?

Forgive me if this seems like a very silly question; I just want to understand the concepts properly.


2 Answers


I typed up this email for someone to explain caches, but I think you might find it useful as well.

You have 32-bit addresses that can refer to bytes in RAM. You want to be able to cache the data that you access, so you can use it again later.

Let's say you want a 1-MiB (2^20-byte) cache.

What do you do?

You have 2 restrictions you need to meet:

  1. Caching should be as uniform as possible across all addresses. i.e. you don't want to bias toward any particular kind of address.
    • How do you do this? Use remainder! With mod, you can evenly distribute any integer over whatever range you want.
  2. You want to help minimize bookkeeping costs. That means e.g. if you're caching in blocks of 1 byte, you don't want to store 4 bytes of data just to keep track of where 1 byte belongs to.
    • How do you do that? You store blocks that are bigger than just 1 byte.

Let's say you choose 16-byte (2^4-byte) blocks. That means you can cache 2^20 / 2^4 = 2^16 = 65,536 blocks of data.
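As a quick sanity check of that arithmetic (a throwaway Python snippet, just to make the powers of two concrete):

```python
cache_size = 2**20   # 1 MiB cache
block_size = 2**4    # 16-byte blocks
num_blocks = cache_size // block_size
print(num_blocks)    # → 65536, i.e. 2**16 blocks
```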

You now have a few options:

  • You can design the cache so that data from any memory block could be stored in any of the cache blocks. This would be called a fully-associative cache.
    • The benefit is that it's the "fairest" kind of cache: all blocks are treated completely equally.
    • The tradeoff is speed: To find where to put the memory block, you have to search every cache block for a free space. This is really slow.
  • You can design the cache so that data from any memory block could only be stored in a single cache block. This would be called a direct-mapped cache.
    • The benefit is that it's the fastest kind of cache: you do only 1 check to see if the item is in the cache or not.
    • The tradeoff is that, now, if you happen to have a bad memory access pattern, you can have 2 blocks kicking each other out successively, with unused blocks still remaining in the cache.
  • You can do a mixture of both: map a single memory block into a small set of cache blocks. This is what real processors do -- they have N-way set associative caches.

Direct-mapped cache:

Now you have 65,536 blocks of data, each block being of 16 bytes.
You store it as 65,536 "rows" inside your cache, with each "row" consisting of the data itself, along with the metadata (regarding where the block belongs, whether it's valid, whether it's been written to, etc.).

Question: How does each block in memory get mapped to each block in the cache?

Answer: Well, you're using a direct-mapped cache, using mod. That means addresses 0 to 15 get mapped to block 0 in the cache; addresses 16 to 31 get mapped to block 1, etc... and it wraps around as you reach the 1-MiB mark.

So, given memory address M, how do you find the row number N? Easy: N = (M % 2^20) / 2^4.
But that only tells you where to store the data, not how to retrieve it. Once you've stored it, and try to access it again, you have to know which 1-MiB portion of memory was stored here, right?

So that's one piece of metadata: the tag bits. If it's in row N, all you need to know is what the quotient was, during the mod operation. Which, for a 32-bit address, is 12 bits big (since the remainder is 20 bits).

So your tag becomes 12 bits long -- specifically, the topmost 12 bits of any memory address.
And you already knew that the lowermost 4 bits are used for the offset within a block (since memory is byte-addressed, and a block is 16 bytes).
That leaves 16 bits for the "index" bits of a memory address, which can be used to find which row the address belongs to. (It's just a division + remainder operation, but in binary.)
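The tag/index/offset split described above can be sketched with a few bit operations (a minimal Python illustration of the 12/16/4 split for this hypothetical 1-MiB, 16-byte-block cache; the example address is arbitrary):

```python
def decompose(addr):
    """Split a 32-bit address into (tag, index, offset) for a
    direct-mapped 1-MiB cache with 16-byte blocks."""
    offset = addr & 0xF            # low 4 bits: byte within the 16-byte block
    index = (addr >> 4) & 0xFFFF   # next 16 bits: which of the 65,536 rows
    tag = addr >> 20               # top 12 bits: which 1-MiB region of memory
    return tag, index, offset

tag, index, offset = decompose(0x12345678)
print(hex(tag), hex(index), hex(offset))  # → 0x123 0x4567 0x8
```

Note that shifting right by 4 and masking with 0xFFFF is exactly the "division + remainder operation, but in binary" mentioned above.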

You also need other bits: e.g. you need to know whether a block is in fact valid or not, because when the CPU is turned on, the cache contains garbage data. So you add 1 bit of metadata: the Valid bit.

There are other bits you'll learn about, used for optimization, synchronization, etc... but these are the basic ones. :)

Answered 2011-11-12T23:13:58.963

I'm assuming you know the basics of tag, index, and offset, but here's a short explanation as I learned it in my computer architecture class. Data is replaced in 64-byte blocks, so every time a new block is brought into the cache it replaces all 64 bytes, regardless of whether you only need one byte. That's why, when addressing the cache, there is an offset that specifies which byte you want from the block. Take your example: if only a 16-bit integer is being loaded, the cache will look up the block by its index, check the tag to make sure it's the right data, and then fetch the bytes at the given offset. Now if you load another 16-bit value, say with the same index but a different tag, it will replace the entire 64-byte block with the new block and get the value from the specified offset (assuming direct mapped).
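That lookup sequence (index the row, compare the tag, fill the whole 64-byte block on a miss, then extract the 2 bytes at the offset) can be sketched roughly like this. This is a hypothetical toy model of the 32 KB / 64-byte-block / 512-frame cache from the question, not any real CPU's implementation; `load16` and `memory` are made-up names:

```python
BLOCK = 64          # bytes per block -> 6 offset bits
ROWS = 512          # frames -> 9 index bits

# Each row: a valid bit, a tag, and the full 64-byte block of data.
rows = [{"valid": False, "tag": None, "data": bytearray(BLOCK)}
        for _ in range(ROWS)]

def load16(addr, memory):
    """Load a 16-bit little-endian integer through the toy cache."""
    offset = addr % BLOCK
    index = (addr // BLOCK) % ROWS
    tag = addr // (BLOCK * ROWS)
    row = rows[index]
    if not (row["valid"] and row["tag"] == tag):
        # Miss: fetch the whole 64-byte block, evicting whatever was there.
        base = addr - offset
        row["data"][:] = memory[base:base + BLOCK]
        row["valid"], row["tag"] = True, tag
    # Hit (or freshly filled row): extract the 2 bytes at the offset.
    return int.from_bytes(row["data"][offset:offset + 2], "little")

memory = bytes(range(256)) * 512   # 128 KiB of repeating pattern data
print(hex(load16(0x40, memory)))   # → 0x4140 (bytes 0x40, 0x41, little-endian)
```

Loading a second 16-bit value whose address has the same index but a different tag (e.g. `0x40 + BLOCK * ROWS`) overwrites all 64 bytes of that row, exactly as described above.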

I hope this helps! If you need more info, or if this is still fuzzy, let me know; I know a couple of sites that do a good job of teaching this.

Answered 2011-11-12T23:06:22.397