I have a custom expectation defined in a CustomPandasDataset:

from great_expectations.data_asset import DataAsset
from great_expectations.dataset import PandasDataset
from datetime import date, datetime, timedelta

class CustomPandasDataset(PandasDataset):

    _data_asset_type = "CustomPandasDataset"
      
    @DataAsset.expectation(["column", "datetime_match", "datetime_diff"])
    def expect_column_max_value_to_match_datetime(self, column:str, datetime_match: datetime = None, datetime_diff: tuple = None) -> dict:
        """
        Check if data is constantly updated by matching the max datetime column to a
        datetime value or to a datetime difference.
        """
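        max_datetime = self[column].max()

        # Default to today's date when no reference value is given.
        if datetime_match is None:
            datetime_match = date.today()

        if datetime_diff:
            # datetime_diff is unpacked into timedelta(days, seconds, ...),
            # so (4,) allows the max value to lag the reference by up to 4 days.
            success = (datetime_match - timedelta(*datetime_diff)) <= max_datetime <= datetime_match
        else:
            # Without a tolerance window, require an exact match.
            success = (max_datetime == datetime_match)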

        result = {
            "data_max_value": max_datetime,
            "expected_max_value": str(datetime_match),
            "expected_datetime_diff": datetime_diff
        }

        return {
            "success": success,
            "result": result
        }
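
For reference, the success check inside this expectation can be tried in isolation with only the standard library. This is a minimal sketch: max_value_matches is a hypothetical helper name, and the dates are made-up sample values, not data from my DataFrame.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def max_value_matches(
    max_datetime: datetime,
    datetime_match: datetime,
    datetime_diff: Optional[Tuple[int, ...]] = None,
) -> bool:
    """Hypothetical helper mirroring the success check in the expectation."""
    if datetime_diff:
        # The tuple is unpacked into timedelta(days, seconds, ...),
        # so (4,) means a four-day window ending at datetime_match.
        window_start = datetime_match - timedelta(*datetime_diff)
        return window_start <= max_datetime <= datetime_match
    # Without a window, only an exact match succeeds.
    return max_datetime == datetime_match


reference = datetime(2021, 4, 5)  # made-up reference value
in_window = max_value_matches(datetime(2021, 4, 3), reference, (4,))
too_old = max_value_matches(datetime(2021, 3, 30), reference, (4,))
exact_only = max_value_matches(datetime(2021, 4, 3), reference)
```

With these sample values, only the first call falls inside the window; the other two do not succeed.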

I want to run the expectation expect_column_max_value_to_match_datetime against a given pandas DataFrame:

expectation_suite_name = "df-raw-expectations"

suite = context.create_expectation_suite(expectation_suite_name, overwrite_existing=True)

df_ge = ge.from_pandas(df, dataset_class=CustomPandasDataset)

batch_kwargs = {'dataset': df_ge, 'datasource': 'df_raw_datasource'}

# Get batch of data
batch = context.get_batch(batch_kwargs, suite)

The batch comes from the DataContext. Now, when I run the expectation on this batch:

datetime_diff = (4,)  # one-element tuple, unpacked into timedelta(4), i.e. 4 days
batch.expect_column_max_value_to_match_datetime(column='DATE', datetime_diff=datetime_diff)

I get the following error:

AttributeError: 'PandasDataset' object has no attribute 'expect_column_max_value_to_match_datetime'

Following the documentation, I passed dataset_class=CustomPandasDataset when building the Great Expectations dataset. The expectation does run on df_ge, but not on the batch of data.

1 Answer

According to the documentation:

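To use custom expectations in a Datasource or DataContext, you need to define the custom DataAsset in the datasource configuration or in the batch_kwargs for a specific batch.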

So pass CustomPandasDataset as the data_asset_type argument of get_batch():

# Get batch of data
batch = context.get_batch(batch_kwargs, suite, data_asset_type=CustomPandasDataset)
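
As a rough mental model of why this matters (a toy sketch, not Great Expectations' actual code; BaseDataset, CustomDataset, expect_rows_not_empty and this get_batch are all hypothetical names), the batch is an instance of whatever class is requested and falls back to the base class otherwise, which is consistent with the AttributeError being raised on a plain 'PandasDataset' object:

```python
class BaseDataset:
    """Stand-in for a generic dataset class (hypothetical)."""

    def __init__(self, rows):
        self.rows = rows


class CustomDataset(BaseDataset):
    """Stand-in for a subclass that adds a custom expectation."""

    def expect_rows_not_empty(self):
        return {"success": len(self.rows) > 0}


def get_batch(rows, data_asset_type=BaseDataset):
    # Toy factory: build the batch from the requested class,
    # defaulting to the base class when nothing is specified.
    return data_asset_type(rows)


default_batch = get_batch([1, 2, 3])
custom_batch = get_batch([1, 2, 3], data_asset_type=CustomDataset)

# Only the batch built from the subclass exposes the custom method.
has_method_default = hasattr(default_batch, "expect_rows_not_empty")
has_method_custom = hasattr(custom_batch, "expect_rows_not_empty")
result = custom_batch.expect_rows_not_empty()
```

In this toy model, calling the custom method on default_batch would raise the same kind of AttributeError as in the question.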

Alternatively, define it in the context configuration:

from great_expectations.data_context.types.base import DataContextConfig
from great_expectations.data_context import BaseDataContext

data_context_config = DataContextConfig(
    ...
    datasources={
        "sales_raw_datasource": {
            "data_asset_type": {
                "class_name": "CustomPandasDataset",
                "module_name": "custom_dataset",
            },
            "class_name": "PandasDatasource",
            "module_name": "great_expectations.datasource",
        }
    },
    ...
)
context = BaseDataContext(project_config=data_context_config)

Here, custom_dataset.py is the module/script in which CustomPandasDataset is defined.
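
A class_name / module_name pair like this is usually resolved by importing the module and looking up the class by name, so custom_dataset.py presumably has to be importable from wherever the context is created. A minimal sketch of that general technique, assuming nothing about Great Expectations' own loader (load_class is a hypothetical helper):

```python
import importlib


def load_class(class_name: str, module_name: str) -> type:
    """Import module_name and return the attribute named class_name."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the snippet runs anywhere;
# for the configuration above the pair would be
# ("CustomPandasDataset", "custom_dataset").
spec = {"class_name": "OrderedDict", "module_name": "collections"}
cls = load_class(spec["class_name"], spec["module_name"])
instance = cls(a=1)
```

If the module cannot be imported, importlib.import_module raises ModuleNotFoundError, which is a quick way to check that custom_dataset is on the import path.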

Answered 2021-04-05T18:33:28.647