Suppose I set up memoization with Joblib as follows (using the solution provided here):

from tempfile import mkdtemp
cachedir = mkdtemp()

from joblib import Memory
# joblib >= 0.12 names this parameter `location`; older releases used `cachedir`
memory = Memory(location=cachedir, verbose=0)

@memory.cache
def run_my_query(my_query):
    ...
    return df

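Say I define a couple of queries, query_1 and query_2, both of which take a long time to run.

As I understand it, with the code as it stands:

  • The second call with either query uses the memoized output, i.e.: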

    run_my_query(query_1)
    run_my_query(query_1) # <- Uses cached output
    
    run_my_query(query_2)
    run_my_query(query_2) # <- Uses cached output   
    
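  • I can use memory.clear() to delete the entire cache directory

But what if I want to redo the memoization for only one of the queries (e.g. query_2) without forcing the other one to be deleted?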

2 Answers

The library does not seem to support clearing only part of the cache.

You can split the cache and the function into two pairs:

from tempfile import mkdtemp
from joblib import Memory

# One Memory (and one cache directory) per query.
# joblib >= 0.12 names this parameter `location`; older releases used `cachedir`
memory1 = Memory(location=mkdtemp(), verbose=0)
memory2 = Memory(location=mkdtemp(), verbose=0)

@memory1.cache
def run_my_query1():
    # run query_1
    return df

@memory2.cache
def run_my_query2():
    # run query_2
    return df

Now you can clear the caches selectively:

memory2.clear()
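
To check that this really leaves the other cache alone, here is a minimal sketch with stand-in function bodies. The `calls` counter and the returned strings are made up for illustration, and `location=` assumes joblib 0.12 or later.

```python
from tempfile import mkdtemp

from joblib import Memory

# One Memory object, and therefore one cache directory, per query.
memory1 = Memory(location=mkdtemp(), verbose=0)
memory2 = Memory(location=mkdtemp(), verbose=0)

# Hypothetical counter, used only to record how often each body really runs.
calls = {"query_1": 0, "query_2": 0}


@memory1.cache
def run_my_query1():
    calls["query_1"] += 1
    return "result of query_1"  # stand-in for the real DataFrame


@memory2.cache
def run_my_query2():
    calls["query_2"] += 1
    return "result of query_2"  # stand-in for the real DataFrame


# First calls compute, second calls are served from the cache.
run_my_query1()
run_my_query2()
run_my_query1()
run_my_query2()

# Wipe only the cache that belongs to query_2.
memory2.clear(warn=False)

run_my_query1()  # still a cache hit
run_my_query2()  # recomputed and cached again

print(calls)
```

The counter should show that only the second function ran again after `memory2.clear()`.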

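Update after seeing behzad.nouri's comment:

You can use the call method of the decorated function. But as you can see in the example below, its return value differs from that of a normal call, so you have to handle that yourself.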

>>> import tempfile
>>> import joblib
>>> memory = joblib.Memory(cachedir=tempfile.mkdtemp(), verbose=0)
>>> @memory.cache
... def run(x):
...     print('called with {}'.format(x))  # for debug
...     return x
...
>>> run(1)
called with 1
1
>>> run(2)
called with 2
2
>>> run(3)
called with 3
3
>>> run(2)  # Cached
2
>>> run.call(2)  # Force call of the original function
called with 2
(2, {'duration': 0.0011069774627685547, 'input_args': {'x': '2'}})
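
If the goal is to drop the cached result for one particular argument rather than to force a single recomputation, joblib also offers `call_and_shelve`, which returns a reference to the cached entry that has its own `clear()` method. The sketch below is not part of the original answer; the `executions` list is a made-up helper, and `location=` assumes joblib 0.12 or later.

```python
from tempfile import mkdtemp

from joblib import Memory

memory = Memory(location=mkdtemp(), verbose=0)

# Hypothetical log of the arguments the function body was really run with.
executions = []


@memory.cache
def run(x):
    executions.append(x)
    return x * 10


run(1)  # computed and cached
run(2)  # computed and cached

# call_and_shelve returns a reference to the cached entry for this argument
# (it does not recompute here, because the entry already exists).
# Clearing the reference deletes that single entry only.
run.call_and_shelve(2).clear()

run(1)  # cache hit, body not executed
run(2)  # entry was removed, so the body runs again

print(executions)
```

Unlike `run.call(2)`, the later `run(2)` here is an ordinary call, so it returns the plain result.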
answered 2014-09-23T15:03:58.043

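It's been a few years, but if your code allows you to refactor into separate functions, you can simply call func.clear() to selectively remove that function's results from the cache.

Example code: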

#!/usr/bin/env python

import sys
from shutil import rmtree

import joblib

cachedir = "joblib-cache"
memory = joblib.Memory(cachedir)


@memory.cache
def foo():
    print("running foo")
    return 42


@memory.cache
def oof():
    print("running oof")
    return 24


def main():
    # Start from an empty cache; do not fail if the directory is already gone
    rmtree(cachedir, ignore_errors=True)

    print(f"{sys.version=}")
    print(f"{joblib.__version__=}")

    print(foo())
    print(oof())
    print()

    print("*" * 20 + " These should now be cached " + "*" * 20)
    print(foo())
    print(oof())
    print()

    foo.clear()
    print("*" * 20 + " `foo` should now be recaculated " + "*" * 20)
    print(foo())
    print(oof())


if __name__ == "__main__":
    main()

Output:

sys.version='3.9.6 (default, Jun 30 2021, 10:22:16) \n[GCC 11.1.0]'
joblib.__version__='1.0.1'
________________________________________________________________________________
[Memory] Calling __main__--tmp-tmp.DaQHHlsA2H-clearcache.foo...
foo()
running foo
______________________________________________________________foo - 0.0s, 0.0min
42
________________________________________________________________________________
[Memory] Calling __main__--tmp-tmp.DaQHHlsA2H-clearcache.oof...
oof()
running oof
______________________________________________________________oof - 0.0s, 0.0min
24

******************** These should now be cached ********************
42
24

WARNING:root:[MemorizedFunc(func=<function foo at 0x7f9cd7d8e040>, location=joblib-cache/joblib)]: Clearing function cache identified by __main__--tmp-tmp/DaQHHlsA2H-clearcache/foo
******************** `foo` should now be recaculated ********************
________________________________________________________________________________
[Memory] Calling __main__--tmp-tmp.DaQHHlsA2H-clearcache.foo...
foo()
running foo
______________________________________________________________foo - 0.0s, 0.0min
42
24
answered 2021-08-31T14:45:08.650