I have a CustomPandasDataset with a custom expectation:
from great_expectations.data_asset import DataAsset
from great_expectations.dataset import PandasDataset
from datetime import date, datetime, timedelta


class CustomPandasDataset(PandasDataset):

    _data_asset_type = "CustomPandasDataset"

    @DataAsset.expectation(["column", "datetime_match", "datetime_diff"])
    def expect_column_max_value_to_match_datetime(self, column: str, datetime_match: datetime = None, datetime_diff: tuple = None) -> dict:
        """
        Check if data is constantly updated by matching the max datetime column to a
        datetime value or to a datetime difference.
        """
        max_datetime = self[column].max()

        # Default to today's date when no reference value is given.
        if datetime_match is None:
            datetime_match = date.today()

        if datetime_diff:
            # datetime_diff is unpacked into timedelta(), e.g. (4,) means 4 days.
            success = (datetime_match - timedelta(*datetime_diff)) <= max_datetime <= datetime_match
        else:
            success = (max_datetime == datetime_match)

        result = {
            "data_max_value": max_datetime,
            "expected_max_value": str(datetime_match),
            "expected_datetime_diff": datetime_diff
        }
        return {
            "success": success,
            "result": result
        }
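To make the comparison inside the expectation easier to follow, here is a minimal standalone sketch using only the standard library. The helper name max_in_window and the sample dates are hypothetical and chosen for illustration; the sketch does not use Great Expectations or pandas, so it says nothing about how a pandas Timestamp behaves in the same comparison.

```python
from datetime import date, datetime, timedelta


def max_in_window(max_value, reference=None, diff_args=None):
    """Hypothetical helper mirroring the comparison in the expectation."""
    if reference is None:
        reference = date.today()
    if diff_args:
        # timedelta(*(4,)) is timedelta(4): the first positional argument is days.
        return (reference - timedelta(*diff_args)) <= max_value <= reference
    return max_value == reference


today = date(2021, 3, 10)  # fixed sample date so the results are reproducible

inside = max_in_window(date(2021, 3, 8), reference=today, diff_args=(4,))
outside = max_in_window(date(2021, 3, 1), reference=today, diff_args=(4,))
exact = max_in_window(date(2021, 3, 10), reference=today)

# Ordering a plain date against a datetime raises TypeError in the
# standard library, so the reference and the column maximum should
# be of compatible types.
try:
    max_in_window(datetime(2021, 3, 8, 12, 0), reference=today, diff_args=(4,))
    mixed_types_ok = True
except TypeError:
    mixed_types_ok = False
```

With these sample values, a maximum two days before the reference falls inside the four-day window, a maximum nine days before falls outside it, and the exact-match branch only succeeds when both values are equal.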
I want to run the expectation expect_column_max_value_to_match_datetime on a given pandas DataFrame:
expectation_suite_name = "df-raw-expectations"
suite = context.create_expectation_suite(expectation_suite_name, overwrite_existing=True)
df_ge = ge.from_pandas(df, dataset_class=CustomPandasDataset)
batch_kwargs = {'dataset': df_ge, 'datasource': 'df_raw_datasource'}
# Get batch of data
batch = context.get_batch(batch_kwargs, suite)
I get this batch from the DataContext. Now, when I run the expectation on it:
datetime_diff = (4,)  # one-element tuple, i.e. a 4-day window
batch.expect_column_max_value_to_match_datetime(column='DATE', datetime_diff=datetime_diff)
I get the following error:
AttributeError: 'PandasDataset' object has no attribute 'expect_column_max_value_to_match_datetime'
Following the documentation, I specified dataset_class=CustomPandasDataset when constructing the Great Expectations dataset. The expectation does run on df_ge, but not on the batch of data.
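The error message names 'PandasDataset', which suggests the batch object is an instance of the base class rather than of CustomPandasDataset. The sketch below illustrates how that can happen in general. It uses hypothetical stand-in classes (BaseDataset, CustomDataset) and a hypothetical factory (get_batch_like); it is not Great Expectations' actual code, and whether get_batch behaves this way is an assumption.

```python
class BaseDataset:
    """Hypothetical stand-in for a base dataset class."""

    def __init__(self, rows):
        self.rows = list(rows)


class CustomDataset(BaseDataset):
    """Hypothetical subclass that adds one custom method."""

    def expect_custom(self):
        return {"success": bool(self.rows)}


def get_batch_like(dataset, configured_class=BaseDataset):
    """Hypothetical factory: rebuilds the batch from a configured class
    and ignores the concrete class of the object passed in."""
    return configured_class(dataset.rows)


custom = CustomDataset([1, 2, 3])
batch = get_batch_like(custom)

# The custom method exists on the original object but not on the batch.
has_on_custom = hasattr(custom, "expect_custom")
has_on_batch = hasattr(batch, "expect_custom")
batch_class_name = type(batch).__name__

# Telling the factory which class to build restores the method.
batch_fixed = get_batch_like(custom, configured_class=CustomDataset)
has_on_fixed = hasattr(batch_fixed, "expect_custom")
```

A quick diagnostic along the same lines is to print type(batch).__name__ for the real batch and compare it with type(df_ge).__name__.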