python - Decorator for selective caching / memoization

Question

I'm looking for a way to build a decorator @memoize that I can use on functions like this:

@memoize
def my_function(a, b, c):
    # Do stuff
    # result may not always be the same for fixed (a, b, c)
    return result

Then, if I do:

result1 = my_function(a=1,b=2,c=3)
# my_function runs (slow). We cache the result for later

result2 = my_function(a=1, b=2, c=3)
# The decorator reads the cache and returns the result (fast)

Now say I want to force a cache update:

result3 = my_function(a=1, b=2, c=3, force_update=True)
# The function runs *again* for values a, b, and c. 

result4 = my_function(a=1, b=2, c=3)
# We read the cache

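At the end of the above, we always have result4 = result3, but not necessarily result4 = result1. That is why I need an option to force a cache update for the same input arguments.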

How can I approach this problem?

Note on `joblib`

As far as I know, joblib supports .call, which forces a re-run but does not update the cache.

Follow-up on `klepto`:

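Is there a way to make klepto (see @Wally's answer) cache its results by default in a specific location (e.g. /some/path/) and share that location across several functions? For example, I would like to say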

cache_path = "/some/path/"

and then @memoize several functions in a given module under that same path.

score 4 · Accepted Answer

I would suggest looking at joblib and klepto. Both have very configurable caching algorithms, and may do what you want.

Both definitely can do the caching for result1 and result2, and klepto provides access to the cache, so one can pop a result from the local memory cache (without removing it from a stored archive, say in a database).

>>> import klepto
>>> from klepto import lru_cache as memoize
>>> from klepto.keymaps import hashmap
>>> hasher = hashmap(algorithm='md5')
>>> @memoize(keymap=hasher)
... def squared(x):
...   print("called")
...   return x**2
... 
>>> squared(1)
called
1
>>> squared(2)
called
4
>>> squared(3)
called
9
>>> squared(2)
4
>>> 
>>> cache = squared.__cache__()
>>> # delete the 'key' for x=2
>>> cache.pop(squared.key(2))
4
>>> squared(2)
called
4

Not exactly the keyword interface you were looking for, but it has the functionality you are looking for.
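For comparison, here is a minimal standard-library sketch of the same "evict one entry, then call again" idea. functools.lru_cache only offers cache_clear(), which empties the whole cache, so this hypothetical memoize exposes its dictionary as `cache` and its key builder as `key`. These names merely echo the klepto example; they are not klepto's API. The sketch assumes all arguments are hashable.

```python
import functools


def memoize(func):
    """Memoize func in a dict exposed as wrapper.cache (hypothetical interface)."""
    cache = {}

    def key(*args, **kwargs):
        # Build a hashable key; assumes every argument value is hashable.
        return (args, tuple(sorted(kwargs.items())))

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        k = key(*args, **kwargs)
        if k not in cache:
            cache[k] = func(*args, **kwargs)
        return cache[k]

    # Expose the cache and the key builder so callers can evict single entries.
    wrapper.cache = cache
    wrapper.key = key
    return wrapper


@memoize
def squared(x):
    print("called")
    return x ** 2


squared(2)                         # prints "called", returns 4
squared(2)                         # served from the cache, returns 4
squared.cache.pop(squared.key(2))  # evict only the x=2 entry
squared(2)                         # prints "called" again
```

Popping the entry before the next call has the same effect as a force_update flag, just spelled as two steps.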

score 2 · Accepted Answer

You could do it like this:

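import functools
import pickle


def memoize(func):
    cache = {}

    @functools.wraps(func)
    def decorator(*args, **kwargs):
        # Remove the control flag so it is never passed to func or the key.
        force_update = kwargs.pop('force_update', False)
        # Pickle the arguments into a hashable key (they must be picklable);
        # sorting kwargs makes the key independent of keyword order.
        key = pickle.dumps((args, sorted(kwargs.items())))
        if force_update or key not in cache:
            res = func(*args, **kwargs)
            cache[key] = res
        else:
            res = cache[key]
        return res
    return decorator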

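The decorator accepts an extra argument, force_update (you don't need to declare it in your function), and removes it from kwargs. The wrapped function is therefore called if it has not been called with these arguments before, or if you pass force_update=True: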

import random


@memoize
def f(a=0, b=0, c=0):
    return [a, b, c, random.randint(1, 10)]


>>> print(f(a=1, b=2, c=3))
[1, 2, 3, 9]
>>> print(f(a=1, b=2, c=3))  # value will be taken from the cache
[1, 2, 3, 9]
>>> print(f(a=1, b=2, c=3, force_update=True))
[1, 2, 3, 2]
>>> print(f(a=1, b=2, c=3))  # value will be taken from the cache as well
[1, 2, 3, 2]
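Because f uses random.randint, the printed values differ from run to run. For a reproducible check of the four calls from the question, here is a self-contained Python 3 sketch of the same pickle-keyed decorator that replaces the random number with a call counter (the global `calls` exists only for illustration). It sorts the keyword arguments, an added assumption so that f(a=1, b=2, c=3) and f(c=3, b=2, a=1) share one cache entry.

```python
import functools
import pickle


def memoize(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Remove the control flag so it never reaches func or the cache key.
        force_update = kwargs.pop('force_update', False)
        # Sort kwargs so keyword order does not create separate entries.
        key = pickle.dumps((args, sorted(kwargs.items())))
        if force_update or key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]
    return wrapper


calls = 0  # illustrative counter of real executions


@memoize
def f(a=0, b=0, c=0):
    global calls
    calls += 1
    return [a, b, c, calls]


r1 = f(a=1, b=2, c=3)                     # runs: [1, 2, 3, 1]
r2 = f(a=1, b=2, c=3)                     # cached: [1, 2, 3, 1]
r3 = f(a=1, b=2, c=3, force_update=True)  # runs again: [1, 2, 3, 2]
r4 = f(c=3, b=2, a=1)                     # cached, same entry as r3
```

This gives result4 == result3 while result4 != result1, which is exactly the behaviour the question asks for.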

score 1 · Accepted Answer

This is purely about the follow-up question on klepto.

The following extends @Wally's example to specify a directory:

>>> import klepto
>>> from klepto import lru_cache as memoize
>>> from klepto.keymaps import hashmap
>>> from klepto.archives import dir_archive
>>> hasher = hashmap(algorithm='md5')
>>> dir_cache = dir_archive('/tmp/some/path/squared')
>>> dir_cache2 = dir_archive('/tmp/some/path/tripled')
>>> @memoize(keymap=hasher, cache=dir_cache)
... def squared(x):
...   print("called")
...   return x**2
>>> 
>>> @memoize(keymap=hasher, cache=dir_cache2)
... def tripled(x):
...   print('called')
...   return 3*x
>>>

Alternatively, you can use a file_archive, in which case you specify the path as:

from klepto.archives import file_archive
cache = file_archive('/tmp/some/path/file.py')
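If you would rather not depend on klepto, the same layout (one shared cache_path, one subdirectory per function) can be approximated with the standard library. This is a swapped-in technique, not klepto's behaviour: `make_disk_memoize` and the `<md5>.pkl` file naming are hypothetical. It assumes picklable arguments and results, a single process writing to the cache, and unique function names within one cache_path (the subdirectory is named after `func.__name__`). MD5 is used only to derive file names, not for security.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def make_disk_memoize(cache_path):
    """Return a hypothetical memoize decorator storing results in cache_path/<func name>/."""
    def memoize(func):
        # One subdirectory per function, all under the shared cache_path.
        func_dir = os.path.join(cache_path, func.__name__)
        os.makedirs(func_dir, exist_ok=True)

        @functools.wraps(func)
        def wrapper(*args, force_update=False, **kwargs):
            # Hash the pickled arguments into a filesystem-safe file name.
            raw = pickle.dumps((args, sorted(kwargs.items())))
            path = os.path.join(func_dir, hashlib.md5(raw).hexdigest() + ".pkl")
            if not force_update and os.path.exists(path):
                with open(path, "rb") as fh:
                    return pickle.load(fh)
            # Forced or missing: recompute and overwrite the stored result.
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                pickle.dump(result, fh)
            return result
        return wrapper
    return memoize


# A temporary directory keeps the sketch runnable anywhere;
# in practice you would pass a fixed path such as "/some/path/".
cache_path = tempfile.mkdtemp()
memoize = make_disk_memoize(cache_path)


@memoize
def squared(x):
    return x ** 2


@memoize
def tripled(x):
    return 3 * x
```

Because results live on disk, they survive interpreter restarts, and force_update=True overwrites the stored file rather than only bypassing it.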

score 1 · Accepted Answer

If you want to do it yourself:

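import functools


def memoize(func):
    cache = {}

    @functools.wraps(func)
    def cacher(a, b, c, force_update=False):
        # Recompute when forced or when (a, b, c) has not been seen yet.
        if force_update or (a, b, c) not in cache:
            cache[(a, b, c)] = func(a, b, c)
        return cache[(a, b, c)]
    return cacher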
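The decorator above only works for functions whose parameters are exactly a, b and c. A more general standard-library sketch normalises the arguments with inspect.signature, so f(1, 2, 3), f(a=1, b=2, c=3) and calls that rely on defaults all map to the same cache key. Assumptions: argument values are hashable, the wrapped function has no **kwargs parameter, and it does not itself take a parameter named force_update.

```python
import functools
import inspect


def memoize(func):
    cache = {}
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, force_update=False, **kwargs):
        # Map positional/keyword arguments onto parameter names and fill in
        # defaults, so equivalent calls produce the same key.
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        key = tuple(bound.arguments.items())  # assumes hashable values
        if force_update or key not in cache:
            cache[key] = func(*bound.args, **bound.kwargs)
        return cache[key]
    return wrapper
```

With this version, `@memoize def my_function(a, b, c)` accepts `my_function(1, 2, 3, force_update=True)` as well as the keyword form from the question.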
