
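I want to patch a method so that it runs the original method with some additional code before and after it. In particular, I run my tests on an in-memory filesystem with pyfakefs, but sometimes I want to use the real filesystem, because some packages (pybedtools in my case) don't work on the fake filesystem.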

There is probably a simple way to do this, but after many attempts I couldn't figure it out. Is this possible?

Just as an example, below I'm trying to patch pandas' to_csv.

import os
import tempfile
from unittest.mock import patch
import pandas as pd
from pyfakefs.fake_filesystem_unittest import Patcher


df_intervals = pd.DataFrame([
     ['1', 10, 20],
     ['20', 45, 55]],
     columns=['chrom', 'start', 'end'])


with Patcher(use_known_patches=True) as patcher:
    # As expected, writing to the fake filesystem works
    fname = tempfile.NamedTemporaryFile()
    df_intervals.to_csv(fname.name)
    assert not os.path.exists(fname.name)
    assert patcher.fs.isfile(fname.name)

    # But, how do I patch `to_csv` to write to the real filesystem? My failed attempts:
    # Attempt 1
    # TypeError: super(type, obj): obj must be an instance or subtype of type
    class patched_DataFrame(pd.DataFrame):
        def to_csv(self, fname):
            print('Pausing fake file system')
            patcher.pause()
            super().to_csv(fname)
            print('Resuming fake file system')
            patcher.resume()

    with patch.object(pd.core.generic.NDFrame, 'to_csv', new=patched_DataFrame.to_csv):
        df_intervals.to_csv(fname.name)

    # Attempt 2: TypeError: 'patched_DataFrame' object is not callable
    with patch('pandas.core.frame.DataFrame', new_callable=patched_DataFrame):
        df_intervals.to_csv(fname.name)

    # Attempt 3: infinite recursion
    def patched_to_csv(self, fname):
        print('Pausing fake file system')
        patcher.pause()
        self.to_csv(fname)
        print('Resuming fake file system')
        patcher.resume()

    with patch.object(pd.core.generic.NDFrame, 'to_csv', new=patched_to_csv):
        df_intervals.to_csv(fname.name)

2 Answers


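One (not very elegant) option is to take your third approach and avoid the recursion by saving the original to_csv method and calling that instead: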

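from unittest.mock import patch

import pandas as pd
from pyfakefs.fake_filesystem_unittest import Patcher, Pause


with Patcher() as patcher:
    ...

    def patched_to_csv(self, *args, **kwargs):
        # Run the original method with the fake filesystem paused
        with Pause(patcher.fs):
            return original_to_csv(self, *args, **kwargs)

    # Keep a reference to the unpatched method, so that the wrapper does not
    # call itself (the cause of the infinite recursion in attempt 3)
    original_to_csv = pd.core.generic.NDFrame.to_csv
    with patch.object(pd.core.generic.NDFrame, 'to_csv', new=patched_to_csv):
        df_intervals.to_csv(fname.name)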

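Note that I use the Pause context manager instead of calling pause/resume directly. This makes it easy to propagate the return value of the patched function if needed, and it is less error-prone, because the fake filesystem is resumed even if the call raises.
Also note that use_known_patches is True by default.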
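To see the save-the-original pattern in isolation, here is a minimal sketch that runs without pandas or pyfakefs. `Writer` is a hypothetical stand-in for `NDFrame`, and `pause_fake_fs` is a hypothetical context manager standing in for pyfakefs' `Pause`; it only records when it is entered and exited. The point it illustrates is that the original function is looked up once, before patching, so the wrapper never resolves to itself.

```python
import contextlib
from unittest.mock import patch

events = []  # records the order of calls so the wrapping can be checked


class Writer:
    """Hypothetical stand-in for pandas.core.generic.NDFrame."""

    def to_csv(self, path=None):
        events.append(("write", path))
        # Like pandas, return the text only when no path is given
        return "a,b\n" if path is None else None


@contextlib.contextmanager
def pause_fake_fs():
    """Hypothetical stand-in for pyfakefs' Pause: code before and after."""
    events.append("pause")
    try:
        yield
    finally:
        events.append("resume")


# Look up the original method once, before patching. Calling self.to_csv
# inside the wrapper would find the wrapper again and recurse forever.
original_to_csv = Writer.to_csv


def patched_to_csv(self, *args, **kwargs):
    with pause_fake_fs():
        return original_to_csv(self, *args, **kwargs)


with patch.object(Writer, "to_csv", new=patched_to_csv):
    written = Writer().to_csv("out.csv")  # writes, returns None
    text = Writer().to_csv()  # returns the CSV text through the wrapper
```

After both calls, `events` holds the pause/write/resume sequence twice, the second call's return value passes through the wrapper unchanged, and leaving the `patch.object` block restores the original method.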

Disclaimer: I'm a contributor to pyfakefs.

Update: I changed the answer, because my previous attempt at avoiding the recursion was wrong.

answered Jun 19, 2021 at 20:47

Here is one way to do it:

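from unittest.mock import patch

import pandas as pd
import pybedtools
from pyfakefs.fake_filesystem_unittest import Patcher

df_intervals = pd.DataFrame([
     ['1', 10, 20],
     ['20', 45, 55]],
     columns=['chrom', 'start', 'end'])


def fakefs_decorator(func, patcher):
    """ Force a method to work on the real filesystem """
    def fs_wrapper(*args, **kwargs):
        patcher.pause()
        try:
            return func(*args, **kwargs)
        finally:
            # Resume the fake filesystem even if func raises
            patcher.resume()

    if hasattr(func, '__self__'):
        # func is a bound classmethod: drop the class argument passed in by
        # the classmethod wrapper, since the bound method already has it
        def c_wrapper(_, *args, **kwargs):
            return fs_wrapper(*args, **kwargs)
        return classmethod(c_wrapper)
    return fs_wrapper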


with Patcher(allow_root_user=False, use_known_patches=True) as patcher:
    fs_from_dataframe = fakefs_decorator(pybedtools.BedTool.from_dataframe, patcher)
    fs_to_dataframe = fakefs_decorator(pybedtools.BedTool.to_dataframe, patcher)
    fs_intersect = fakefs_decorator(pybedtools.BedTool.intersect, patcher)

    @patch('pybedtools.BedTool.from_dataframe', new=fs_from_dataframe)
    @patch('pybedtools.bedtool.BedTool.to_dataframe', new=fs_to_dataframe)
    @patch('pybedtools.bedtool.BedTool.intersect', new=fs_intersect)
    def test(df_intervals):
        bed_object = pybedtools.BedTool.from_dataframe(df_intervals)
        joined_bed_object = bed_object.intersect(bed_object)
        df = joined_bed_object.to_dataframe()
        return df

    df = test(df_intervals)
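The same idea can be exercised without pybedtools. The sketch below assumes a hypothetical `FakePatcher` that only records `pause()`/`resume()` calls (in place of pyfakefs' `Patcher`) and a hypothetical `Tool` class (in place of `pybedtools.BedTool`) with one classmethod and one regular method. The wrapper, here called `real_fs` (also hypothetical), follows the `fakefs_decorator` approach but uses `try`/`finally`, so the fake filesystem is resumed even when the wrapped call raises.

```python
import functools
from unittest.mock import patch


class FakePatcher:
    """Hypothetical stand-in for pyfakefs' Patcher; only logs pause/resume."""

    def __init__(self):
        self.paused = False
        self.log = []

    def pause(self):
        self.paused = True
        self.log.append("pause")

    def resume(self):
        self.paused = False
        self.log.append("resume")


def real_fs(func, patcher):
    """Wrap func so it runs with patcher paused, resuming afterwards."""

    @functools.wraps(func)
    def fs_wrapper(*args, **kwargs):
        patcher.pause()
        try:
            return func(*args, **kwargs)
        finally:
            patcher.resume()  # also runs if func raises

    if hasattr(func, "__self__"):
        # func is a bound classmethod: the classmethod below passes the class
        # again, so drop it; the bound method already carries it.
        def c_wrapper(_cls, *args, **kwargs):
            return fs_wrapper(*args, **kwargs)

        return classmethod(c_wrapper)
    return fs_wrapper


class Tool:
    """Hypothetical stand-in for pybedtools.BedTool."""

    @classmethod
    def from_rows(cls, rows):
        tool = cls()
        tool.rows = list(rows)
        return tool

    def explode(self):
        raise RuntimeError("simulated failure")


patcher = FakePatcher()
wrapped_from_rows = real_fs(Tool.from_rows, patcher)
wrapped_explode = real_fs(Tool.explode, patcher)

with patch.object(Tool, "from_rows", new=wrapped_from_rows), \
        patch.object(Tool, "explode", new=wrapped_explode):
    tool = Tool.from_rows([1, 2])
    try:
        tool.explode()
    except RuntimeError:
        pass  # the finally clause in fs_wrapper has already resumed
```

Each wrapped call logs one pause/resume pair, including the call that raised, and `patch.object` puts the original classmethod back when the block exits.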
answered Jun 20, 2021 at 06:32