caching - Memcache 标签模拟

Question

Memcached是一个很好的可扩展缓存层，但它有一个大问题（对我来说）它无法管理标签。并且标签对于组失效非常有用。

我做了一些研究，我知道一些解决方案：

Memcache 标签分叉http://code.google.com/p/memcached-tag/
模拟标签的代码实现（参考。使用标准 php 库使多个 memcache 键无效的最佳方法？）

我最喜欢的解决方案之一是命名空间，这个解决方案在memcached wiki上进行了解释。

但是我不明白为什么我们在键缓存上集成命名空间？

根据我对命名空间技巧的理解：要生成密钥，我们必须获取命名空间的值（在缓存上）。如果namespace->value缓存条目被驱逐，我们将无法再计算获取缓存的好键......所以这个命名空间的缓存实际上是无效的（我说虚拟是因为缓存仍然存在但我们无法再计算要访问的键）。

那么为什么我们不能简单地实现类似的东西：

tag1->[key1, key2, key5]
tag2->[key1, key3, key6]
key1->["value" => value1, "tags" => [tag1, tag2]]
key2->["value" => value2, "tags" => [tag1]]
key3->["value" => value3, "tags" => [tag3]]
etc...

有了这个实现，我又遇到了一个问题，即如果tag1->[key1, key2, key5]被驱逐，我们就不能再使 tag1 键失效。但随着

function load($cacheId) {
   $cache = $memcache->get($cacheId);
   if (is_array($cache)) {
      $evicted = false;
      // Check is no tags have been evicted
      foreach ($cache["tags"] as $tagId) {
         if (!$memcache->get($tagId) {
            $evicted = true;
            break;
         }
      }
      // If no tags have been evicted we can return cache
      if (!$evicted) {
         return $cache
      } else {
         // Not mandatory
         $memcache->delete($cacheId);
      }
      // Else return false
      return false;
   }
}

是伪代码

如果所有这些标签都可用，我们一定会返回缓存。

我们可以说的第一件事是“每次您需要获取缓存时，我们都必须检查（/获取）X标签，然后检查数组”。但是对于命名空间，我们还必须检查（/get）命名空间来检索命名空间值，主要差异是在数组下迭代......但我认为键不会有很多标签（我无法想象超过 10 个标签/键对于我的应用程序），所以在大小为 10 的数组下迭代它的速度非常快..

所以我的问题是：有人已经考虑过这个实现了吗？限制是什么？我是不是忘记了什么？ETC

或者我可能误解了命名空间的概念......

PS：我不是在寻找像 memcached-tag 或 redis 这样的另一个缓存层

score 1 · Accepted Answer

I think you are forgetting something with this implementation, but it's trivial to fix.

Consider the problem of multiple keys sharing some tags:

key1 -> tag1 tag2
key2 -> tag1 tag2
tag1 -> key1 key2
tag2 -> key1 key2

Say you load key1. You double check both tag1 and tag2 exist. This is fine and the key loads.

Then tag1 is somehow evicted from the cache.

Your code then invalidates tag1. This should delete key1 and key2 but because tag1 has been evicted, this does not happen.

Then you add a new item key3. It also refers to tag1:

key3 -> tag1

When saving this key, tag1 is (re)created:

tag1 -> key3

Later, when loading key1 from cache again your check in the pseudo code to ensure tag1 exists succeeds. and the (stale) data from key1 is allowed to be loaded.

Obviously a way around this is to check the values of the tag1 data to ensure the key you are loading is listed in that array and only consider your key valid if this is true.

Of course this could have performance issues depending on your use case. If a given key has 10 tags, but each of those tags is used by 10k keys, then you are having to do search through an array of 10k items to find your key and repeat that 10 times each time you load something.

At some point, this may become inefficient.

An alternative implementation (and one which I use), is more appropriate when you have a very high read to write ratio.

If reads are very much the common case, then you could implement your tag capability in a more permanent database backend (I'll assume you have a db of sorts anyway so it only needs a couple extra tables here).

When you write an item in the cache, you store the key and the tag in a simple table (key and tag columns, one row for each tag on a key). Writing a key is simple: "delete from cache_tags where id=:key; foreach (tags as tag) insert into cache_tags values(:key, :tag); (NB use extended insert syntax in real impl).

When invalidating a tag, simply iterate over all keys that have that tag: (select key from cache_tags where tag=:tag;) and invalidate each of them (and optionally delete the key from the cache_tags table too to tidy up).

If a key is evicted from memcache then the cache_tags metadata will be out of date, but this is typically harmless. It will at most result in an inefficiency when invalidating a tag where you attempt to invalidate a key which had that tag but has already been evicted.

This approach gives "free" loading (no need to check tags) but expensive saving (which is already expensive anyway otherwise it wouldn't need to be cached in the first place!).

So depending on your use case and the expected load patterns and usage, I'd hope that either your original strategy (with more stringent checks on load) or a "database backed tag" strategy would fit your needs.

HTHs

caching - Memcache 标签模拟

1 回答 1

Related

Reference