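Just giving back to the community, with thanks to Jordan and Gscott for the inspiration. The solution I implemented for SQL Server / Synapse is:
- A table, rebuilt daily, that counts the tables in INFORMATION_SCHEMA.TABLES alongside the models in the dbt graph.
- An incremental table built on the first one that selects the schemas of interest and aggregates. In my example below, I filter out staging and testing.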
DbtModelCounts:
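{#- Collect the name of every model in the dbt graph. The graph is only populated at execution time, hence the `execute` guard. -#}
{% set models = [] -%}
{% if execute %}
    {% for node in graph.nodes.values()
        | selectattr("resource_type", "equalto", "model") %}
        {%- do models.append(node.name) -%}
    {% endfor %}
{% endif %}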
WITH tables AS (
    -- Every table and view visible in the current database.
    SELECT
        table_catalog AS [db],
        table_schema AS [schema_name],
        table_name AS [name],
        table_type AS [type]
    FROM INFORMATION_SCHEMA.TABLES
),
dbt_tables AS (
    -- The subset whose name matches a dbt model name.
    SELECT *
    FROM tables
    WHERE name IN (
        {%- for model in models %}
        '{{ model }}'{% if not loop.last %},{% endif %}
        {%- endfor %}
    )
)
-- Per database, schema and object type: total objects vs. objects that are dbt models.
SELECT
    tables.db,
    tables.schema_name,
    tables.type,
    COUNT(tables.name) AS ModelCount,
    COUNT(dbt_tables.name) AS DbtModelCount
FROM tables
LEFT JOIN dbt_tables
    ON tables.name = dbt_tables.name
    AND tables.schema_name = dbt_tables.schema_name
    AND tables.db = dbt_tables.db
    AND tables.type = dbt_tables.type
GROUP BY
    tables.db,
    tables.schema_name,
    tables.type
DatabaseCoverage:
{{
    config(
        materialized='incremental',
        unique_key='DateCreated'
    )
}}
-- One row per day, keyed on DateCreated. GETUTCDATE() keeps the timestamp in UTC whatever the server time zone.
SELECT
    CAST(GETUTCDATE() AS DATE) AS DateCreated,
    GETUTCDATE() AS DateTimeCreatedUTC,
    SUM(DbtModelCount) AS DbtModelCount,
    SUM(ModelCount) AS TotalModels,
    SUM(DbtModelCount) * 100.0 / SUM(ModelCount) AS DbtCoveragePercentage
FROM {{ ref('DbtModelCounts') }}
WHERE schema_name NOT LIKE 'testing%'
    AND schema_name NOT LIKE 'staging%'
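Next, I am going to add logic to this for defined sources, to work out the percentage of sources that map to tables in my staging or raw schemas.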
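
For anyone who wants a starting point, here is a rough, untested sketch of how that source logic might look, following the same pattern as DbtModelCounts. It is not something I have running. It assumes `graph.sources` can be walked the same way as `graph.nodes`, that each source's `schema` and `identifier` match the physical table, and that the raw and staging schemas are named `raw%` and `staging%`. The model name DbtSourceCounts, the column names, and the schema filters are all hypothetical. It reports the share of raw/staging tables that have a dbt source defined.

DbtSourceCounts:
```sql
{#- Collect (schema, table) pairs for every source defined in the project. -#}
{% set source_tables = [] -%}
{% if execute %}
    {% for node in graph.sources.values() %}
        {%- do source_tables.append((node.schema, node.identifier)) -%}
    {% endfor %}
{% endif %}

WITH tables AS (
    SELECT
        table_schema AS [schema_name],
        table_name AS [name]
    FROM INFORMATION_SCHEMA.TABLES
    -- Hypothetical filter: adjust to your own raw/staging schema names.
    WHERE table_schema LIKE 'raw%' OR table_schema LIKE 'staging%'
),
dbt_sources AS (
    -- Tables that match a defined source on both schema and name.
    -- "1 = 0" lets every source be appended as an OR branch.
    SELECT [schema_name], [name]
    FROM tables
    WHERE 1 = 0
    {%- for src_schema, src_identifier in source_tables %}
        OR ([schema_name] = '{{ src_schema }}' AND [name] = '{{ src_identifier }}')
    {%- endfor %}
)
SELECT
    COUNT(tables.name) AS TableCount,
    COUNT(dbt_sources.name) AS SourceCount,
    -- NULLIF avoids a divide-by-zero when no raw/staging tables exist.
    COUNT(dbt_sources.name) * 100.0 / NULLIF(COUNT(tables.name), 0) AS SourceCoveragePercentage
FROM tables
LEFT JOIN dbt_sources
    ON tables.name = dbt_sources.name
    AND tables.schema_name = dbt_sources.schema_name
```

Matching on schema and name together, rather than name alone, avoids counting a same-named table in an unrelated schema as covered. Like DbtModelCounts, it only sees the current database, so sources that live in another database would need their own INFORMATION_SCHEMA query.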