53

有没有办法将函数的输出记忆到磁盘?

我有一个功能

def getHtmlOfUrl(url):
    ... # expensive computation

并想做类似的事情:

def getHtmlMemoized(url) = memoizeToFile(getHtmlOfUrl, "file.dat")

然后调用 getHtmlMemoized(url),以便对每个 url 只进行一次昂贵的计算。

4

10 回答 10

46

Python offers a very elegant way to do this - decorators. Basically, a decorator is a function that wraps another function to provide additional functionality without changing the function source code. Your decorator can be written like this:

import json

def persist_to_file(file_name):

    def decorator(original_func):

        try:
            cache = json.load(open(file_name, 'r'))
        except (IOError, ValueError):
            cache = {}

        def new_func(param):
            if param not in cache:
                cache[param] = original_func(param)
                json.dump(cache, open(file_name, 'w'))
            return cache[param]

        return new_func

    return decorator

Once you've got that, 'decorate' the function using @-syntax and you're ready.

@persist_to_file('cache.dat')
def html_of_url(url):
    your function code...

Note that this decorator is intentionally simplified and may not work for every situation, for example, when the source function accepts or returns data that cannot be json-serialized.

More on decorators: How to make a chain of function decorators?

And here's how to make the decorator save the cache just once, at exit time:

import json, atexit

def persist_to_file(file_name):

    try:
        cache = json.load(open(file_name, 'r'))
    except (IOError, ValueError):
        cache = {}

    atexit.register(lambda: json.dump(cache, open(file_name, 'w')))

    def decorator(func):
        def new_func(param):
            if param not in cache:
                cache[param] = func(param)
            return cache[param]
        return new_func

    return decorator
于 2013-05-09T14:44:21.610 回答
37

退房joblib.Memory。这是一个可以做到这一点的图书馆。

from joblib import Memory
memory = Memory("cachedir")
@memory.cache
def f(x):
    print('Running f(%s)' % x)
    return x
于 2014-05-07T19:33:13.357 回答
19

由 Python 的Shelve 模块提供支持的更清洁的解决方案。优点是缓存通过众所周知的dict语法实时更新,而且它是异常证明(无需处理烦人的KeyError)。

import shelve
def shelve_it(file_name):
    d = shelve.open(file_name)

    def decorator(func):
        def new_func(param):
            if param not in d:
                d[param] = func(param)
            return d[param]

        return new_func

    return decorator

@shelve_it('cache.shelve')
def expensive_funcion(param):
    pass

这将有助于函数只计算一次。接下来的后续调用将返回存储的结果。

于 2017-11-20T06:07:39.313 回答
17

还有diskcache.

from diskcache import Cache

cache = Cache("cachedir")

@cache.memoize()
def f(x, y):
    print('Running f({}, {})'.format(x, y))
    return x, y
于 2019-11-20T06:23:07.837 回答
5

Artemis 库为此提供了一个模块。(你需要pip install artemis-ml

你装饰你的功能:

from artemis.fileman.disk_memoize import memoize_to_disk

@memoize_to_disk
def fcn(a, b, c = None):
    results = ...
    return results

在内部,它对输入参数进行散列,并通过该散列保存备忘录文件。

于 2015-06-29T12:33:07.470 回答
0

这样的事情应该做:

import json

class Memoize(object):
    def __init__(self, func):
        self.func = func
        self.memo = {}

    def load_memo(filename):
        with open(filename) as f:
            self.memo.update(json.load(f))

    def save_memo(filename):
        with open(filename, 'w') as f:
            json.dump(self.memo, f)

    def __call__(self, *args):
        if not args in self.memo:
            self.memo[args] = self.func(*args)
        return self.memo[args]

基本用法:

your_mem_func = Memoize(your_func)
your_mem_func.load_memo('yourdata.json')
#  do your stuff with your_mem_func

如果您想在使用后将“缓存”写入文件 - 以便将来再次加载:

your_mem_func.save_memo('yournewdata.json')
于 2013-05-09T14:12:51.280 回答
0

假设您的数据是 json 可序列化的,则此代码应该可以工作

import os, json

def json_file(fname):
    def decorator(function):
        def wrapper(*args, **kwargs):
            if os.path.isfile(fname):
                with open(fname, 'r') as f:
                    ret = json.load(f)
            else:
                with open(fname, 'w') as f:
                    ret = function(*args, **kwargs)
                    json.dump(ret, f)
            return ret
        return wrapper
    return decorator

装饰getHtmlOfUrl然后简单地调用它,如果它之前已经运行过,你会得到你的缓存数据。

使用 python 2.x 和 python 3.x 检查

于 2017-03-06T12:02:57.337 回答
0

大多数答案都是以装饰者的方式。但也许我不想每次调用函数时都缓存结果。

我使用上下文管理器制作了一个解决方案,因此该函数可以称为

with DiskCacher('cache_id', myfunc) as myfunc2:
    res=myfunc2(...)

当您需要缓存功能时。

'cache_id' 字符串用于区分名为[calling_script]_[cache_id].dat. 因此,如果您在循环中执行此操作,则需要将循环变量合并到 thiscache_id中,否则数据将被覆盖。

或者:

myfunc2=DiskCacher('cache_id')(myfunc)
res=myfunc2(...)

或者(这可能不是很有用,因为始终使用相同的 id):

@DiskCacher('cache_id')
def myfunc(*args):
    ...

带有示例的完整代码(我pickle用于保存/加载,但可以更改为任何保存/读取方法。请注意,这也是假设所讨论的函数仅返回 1 个返回值):

from __future__ import print_function
import sys, os
import functools

def formFilename(folder, varid):
    '''Compose abspath for cache file

    Args:
        folder (str): cache folder path.
        varid (str): variable id to form file name and used as variable id.
    Returns:
        abpath (str): abspath for cache file, which is using the <folder>
            as folder. The file name is the format:
                [script_file]_[varid].dat
    '''
    script_file=os.path.splitext(sys.argv[0])[0]
    name='[%s]_[%s].nc' %(script_file, varid)
    abpath=os.path.join(folder, name)

    return abpath


def readCache(folder, varid, verbose=True):
    '''Read cached data

    Args:
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    Returns:
        results (tuple): a tuple containing data read in from cached file(s).
    '''
    import pickle
    abpath_in=formFilename(folder, varid)
    if os.path.exists(abpath_in):
        if verbose:
            print('\n# <readCache>: Read in variable', varid,
                    'from disk cache:\n', abpath_in)
        with open(abpath_in, 'rb') as fin:
            results=pickle.load(fin)

    return results


def writeCache(results, folder, varid, verbose=True):
    '''Write data to disk cache

    Args:
        results (tuple): a tuple containing data read to cache.
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    '''
    import pickle
    abpath_out=formFilename(folder, varid)
    if verbose:
        print('\n# <writeCache>: Saving output to:\n',abpath_out)
    with open(abpath_out, 'wb') as fout:
        pickle.dump(results, fout)

    return


class DiskCacher(object):
    def __init__(self, varid, func=None, folder=None, overwrite=False,
            verbose=True):
        '''Disk cache context manager

        Args:
            varid (str): string id used to save cache.
                function <func> is assumed to return only 1 return value.
        Keyword Args:
            func (callable): function object whose return values are to be
                cached.
            folder (str or None): cache folder path. If None, use a default.
            overwrite (bool): whether to force a new computation or not.
            verbose (bool): whether to print some text info.
        '''

        if folder is None:
            self.folder='/tmp/cache/'
        else:
            self.folder=folder

        self.func=func
        self.varid=varid
        self.overwrite=overwrite
        self.verbose=verbose

    def __enter__(self):
        if self.func is None:
            raise Exception("Need to provide a callable function to __init__() when used as context manager.")

        return _Cache2Disk(self.func, self.varid, self.folder,
                self.overwrite, self.verbose)

    def __exit__(self, type, value, traceback):
        return

    def __call__(self, func=None):
        _func=func or self.func
        return _Cache2Disk(_func, self.varid, self.folder, self.overwrite,
                self.verbose)



def _Cache2Disk(func, varid, folder, overwrite, verbose):
    '''Inner decorator function

    Args:
        func (callable): function object whose return values are to be
            cached.
        varid (str): variable id.
        folder (str): cache folder path.
        overwrite (bool): whether to force a new computation or not.
        verbose (bool): whether to print some text info.
    Returns:
        decorated function: if cache exists, the function is <readCache>
            which will read cached data from disk. If needs to recompute,
            the function is wrapped that the return values are saved to disk
            before returning.
    '''

    def decorator_func(func):
        abpath_in=formFilename(folder, varid)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(abpath_in) and not overwrite:
                results=readCache(folder, varid, verbose)
            else:
                results=func(*args, **kwargs)
                if not os.path.exists(folder):
                    os.makedirs(folder)
                writeCache(results, folder, varid, verbose)
            return results
        return wrapper

    return decorator_func(func)



if __name__=='__main__':

    data=range(10)  # dummy data

    #--------------Use as context manager--------------
    def func1(data, n):
        '''dummy function'''
        results=[i*n for i in data]
        return results

    print('\n### Context manager, 1st time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    print('\n### Context manager, 2nd time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    print('\n### Context manager, 3rd time call with overwrite=True')
    with DiskCacher('context_mananger', func1, overwrite=True) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    #--------------Return a new function--------------
    def func2(data, n):
        results=[i*n for i in data]
        return results

    print('\n### Wrap a new function, 1st time call')
    func2b=DiskCacher('new_func')(func2)
    res=func2b(data, 10)
    print('res =', res)

    print('\n### Wrap a new function, 2nd time call')
    res=func2b(data, 10)
    print('res =', res)

    #----Decorate a function using the syntax sugar----
    @DiskCacher('pie_dec')
    def func3(data, n):
        results=[i*n for i in data]
        return results

    print('\n### pie decorator, 1st time call')
    res=func3(data, 10)
    print('res =', res)

    print('\n### pie decorator, 2nd time call.')
    res=func3(data, 10)
    print('res =', res)

输出:

### Context manager, 1st time call

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Context manager, 2nd time call

# <readCache>: Read in variable context_mananger from disk cache:
 /tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Context manager, 3rd time call with overwrite=True

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Wrap a new function, 1st time call

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Wrap a new function, 2nd time call

# <readCache>: Read in variable new_func from disk cache:
 /tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### pie decorator, 1st time call

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### pie decorator, 2nd time call.

# <readCache>: Read in variable pie_dec from disk cache:
 /tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
于 2020-07-17T07:54:12.650 回答
0

您可以使用 cache_to_disk 包:

    from cache_to_disk import cache_to_disk

    @cache_to_disk(3)
    def my_func(a, b, c, d=None):
        results = ...
        return results

这会将结果缓存 3 天,具体到参数 a、b、c 和 d。结果存储在您机器上的 pickle 文件中,并在下次调用该函数时取消腌制并返回。3天后,pickle文件被删除,直到函数重新运行。每当使用新参数调用该函数时,该函数将重新运行。更多信息:https ://github.com/sarenehan/cache_to_disk

于 2018-07-09T17:20:25.553 回答
0

查看Cachier。它支持额外的缓存配置参数,如 TTL 等。

简单的例子:

from cachier import cachier
import datetime

@cachier(stale_after=datetime.timedelta(days=3))
def foo(arg1, arg2):
  """foo now has a persistent cache, trigerring recalculation for values stored more than 3 days."""
  return {'arg1': arg1, 'arg2': arg2}
于 2022-01-25T09:03:53.703 回答