I am trying to access a Snowflake datasource using the "great_expectations" library.

Here is what I have tried so far:

from ruamel import yaml

import great_expectations as ge
from great_expectations.core.batch import BatchRequest, RuntimeBatchRequest

context = ge.get_context()



datasource_config = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "include_schema_name": True,
        },
    },
}

print(context.test_yaml_config(yaml.dump(datasource_config)))

Before running the code above, I initialized great_expectations:

great_expectations init

But I get the following error:

great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_snowflake_datasource, error: 'NoneType' object has no attribute 'create_engine'

What exactly am I doing wrong?

1 Answer

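Your configuration looks fine and matches the example here.

If you look at the traceback, you should notice that the error propagates from the file great_expectations/execution_engine/sqlalchemy_execution_engine.py in your virtual environment.

The actual line where the error occurs is: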

            self.engine = sa.create_engine(connection_string, **kwargs)

If you search for sa at the top of that file, you find:

try:
    import sqlalchemy as sa

    make_url = import_make_url()
except ImportError:
    sa = None
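
To see why this pattern produces exactly the error from the question, here is a minimal sketch that reproduces it without great_expectations. The module name not_installed_sqlalchemy and the helpers build_engine and describe_failure are hypothetical, chosen only to stand in for a missing sqlalchemy:

```python
# Minimal sketch of the optional-import pattern. "not_installed_sqlalchemy"
# is a hypothetical module name standing in for a missing sqlalchemy.
try:
    import not_installed_sqlalchemy as sa
except ImportError:
    sa = None  # the import failure is swallowed here, so nothing fails yet


def build_engine(connection_string):
    # Mirrors the failing line: with sa set to None this raises AttributeError.
    return sa.create_engine(connection_string)


def describe_failure():
    # Return the error message instead of letting the exception escape.
    try:
        build_engine("snowflake://user:pass@account/db/schema")
    except AttributeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(describe_failure())
```

The import error is hidden at import time and only surfaces later, as an AttributeError on None, which is why the message never mentions sqlalchemy.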

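So sqlalchemy is not installed: you do not get it automatically in your environment when you install great_expectations. What you need to do is install snowflake-sqlalchemy, since you want to use sqlalchemy's Snowflake plugin (an assumption based on your connection_string).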

/your/virtualenv/bin/python -m pip install snowflake-sqlalchemy
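
If you want to confirm that the install landed in the right environment, a small check like the following may help. It is only a sketch: the helper missing_modules is my own name, and it assumes the import name of the snowflake-sqlalchemy package is snowflake.sqlalchemy. Run it with the same /your/virtualenv/bin/python as above:

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    missing = []
    for name in names:
        try:
            # For dotted names, find_spec imports the parent package first
            # and raises ModuleNotFoundError if the parent is absent.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names: "sqlalchemy" and "snowflake.sqlalchemy".
    print(missing_modules(["sqlalchemy", "snowflake.sqlalchemy"]))
```

An empty list means both modules are importable from that interpreter; any name that is printed still needs to be installed there.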

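After that you should no longer get the error; it looks like test_yaml_config is then waiting for the connection to time out.

What worries me a lot is the documented use of a deprecated part of ruamel.yaml. The function ruamel.yaml.dump will be removed in the near future; you should use the .dump() method of a ruamel.yaml.YAML() instance instead.

You should use the following code instead: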

import sys
from ruamel.yaml import YAML

import great_expectations as ge
context = ge.get_context()

datasource_config = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "include_schema_name": True,
        },
    },
}

yaml = YAML()

yaml.dump(datasource_config, sys.stdout, transform=context.test_yaml_config)

I will make a PR to update their documentation and their use of ruamel.yaml.

Answered 2022-02-09T08:06:48.200