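I have data stored in BigQuery, and I run dbt against those tables to populate new tables in BigQuery. I run dbt run every 5 minutes via cron. I also run dbt test once an hour to check data consistency.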
dbt run always succeeds, and I have never had any problems with it. However, dbt test occasionally (roughly once every 20 runs) fails with the following error:
Invalid snapshot time 1595674825279 for table xxx:dbt_stg.stg_journey_detail. Cannot read before 1595674825283
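The two numbers in the error look like Unix epoch timestamps in milliseconds. That is my reading, not something the message itself states. Under that assumption, a minimal Python sketch decodes them to show when the failure happened and how far apart the two instants are (the names `snapshot_ms`, `earliest_readable_ms`, and `to_utc` are made up for illustration):

```python
from datetime import datetime, timezone

# Values copied from the error message.
# Assumption: both are Unix epoch timestamps in milliseconds.
snapshot_ms = 1595674825279
earliest_readable_ms = 1595674825283


def to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    # Integer arithmetic avoids float rounding in the millisecond part.
    seconds, millis = divmod(ms, 1000)
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole.replace(microsecond=millis * 1000)


gap_ms = earliest_readable_ms - snapshot_ms

print(to_utc(snapshot_ms).isoformat(timespec="milliseconds"))
print(to_utc(earliest_readable_ms).isoformat(timespec="milliseconds"))
print(f"gap: {gap_ms} ms")
```

If that reading is right, the requested snapshot time is only 4 ms earlier than the earliest readable time, and both fall at about 11:00:25 UTC. That is close to the top of the hour, when the hourly dbt test may overlap with one of the five-minute dbt run invocations.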
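The failing test is always the not-null test on the primary key of one particular table. That table is an intermediate table, defined as follows: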
with dedupe as (
    select * EXCEPT(row_num) from (
        select * from (
            select *, row_number() over (
                partition by id
                order by updated_at desc
            ) as row_num
            from {{ source('master', 'journey_detail') }}
        )
    ) as journey_detail_dedupe
    where journey_detail_dedupe.row_num = 1
)
select
    * except (id, pausable),
    id as journey_detail_id,
    pausable as is_pausable
from dedupe
The test looks like this:
  - name: stg_journey_detail
    description: This model cleans up the journey detail table
    columns:
      - name: journey_detail_id
        description: Primary key
        tests:
          - unique
          - not_null
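It looks like BigQuery never actually runs the query when this happens, so I suspect the error is produced by dbt. If I search the BigQuery logs for the test name, I can see other occasions when the query was actually run. This is the query: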
/* {"app": "dbt", "dbt_version": "0.17.0", "profile_name": "beryl", "target_name": "default", "node_id": "test.beryl.not_null_stg_journey_detail_journey_detail_id"} */
select count(*) as validation_errors
from `beryl-basis-development`.`dbt_stg`.`stg_journey_detail`
where journey_detail_id is null
Any help is much appreciated!