
Is there a way to use Python-based hooks with dbt?

I'm using seed data to create dynamic models with Jinja, but I'm looking for a bit more Python flexibility than Jinja offers natively. As a comparison, something along the lines of the way Django views can inject variables into templates.

I'm new to dbt and may be approaching this the wrong way. Thanks to anyone with help or advice.

Here's an example where I wanted to use Python's zip and ended up using equivalent SQL logic. I have a similar need for Python's enumerate. Should I just use SQL over Python for these types of scenarios? I suppose most, if not all, of this can be achieved with SQL (I just happen to be more familiar with Python than SQL when it comes to this type of manipulation).

Current working example using SQL:

{% set mappings = dbt_utils.get_query_results_as_dict("select 
      CONCAT(my_field, ' AS ', my_alias) AS my_pairs FROM " ~  
      ref('data_seed_schema1_to_schema2') ) %}

SELECT
    {% for map in mappings %}
        {{',\n\t\t'.join(mappings[map]) }}
    {% endfor %}
FROM my_table

-->

SELECT  
    fooA AS barA,  
    fooB AS barB  
FROM my_table

Desired Python example:

{% set mappings = dbt_utils.get_query_results_as_dict("select * FROM " ~ 
      ref('data_seed_schema1_to_schema2') ) %}

# my_zip = [f"{x} AS {y}" for x, y in zip(mappings['my_field'], mappings['my_alias'])]


SELECT
    {% for x in my_zip%}
        {{',\n\t\t'.join(x) }}
    {% endfor %}
FROM my_table
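
For reference, here is a minimal standalone sketch of the zip and enumerate logic described above, written in plain Python. It runs outside dbt and is not something a dbt model can call directly; the function names `build_select` and `numbered_aliases` and the hard-coded lists are hypothetical stand-ins for the seed's `my_field` and `my_alias` columns.

```python
def build_select(fields, aliases, table):
    """Render a SELECT that aliases each field, pairing the two lists with zip."""
    pairs = [f"{field} AS {alias}" for field, alias in zip(fields, aliases)]
    return "SELECT\n    " + ",\n    ".join(pairs) + f"\nFROM {table}"


def numbered_aliases(fields, prefix="col"):
    """Use enumerate to give each field a positional alias such as col_1."""
    return [f"{field} AS {prefix}_{i}" for i, field in enumerate(fields, start=1)]


# Hypothetical values standing in for the seed columns
sql = build_select(["fooA", "fooB"], ["barA", "barB"], "my_table")
print(sql)
```

The point of the sketch is only to show the target output shape, so the question becomes where in a dbt workflow such logic could live.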

1 Answer


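It seems to me that what you are trying to do is get the data, process it in Python, and then put it back into dbt.

We recently released a tool called fal that handles this case well.

Here is how to do it with fal and pandas: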

# Get the schema_change as a pandas.DataFrame
schema_change = ref('data_seed_schema1_to_schema2')
#   my_field my_alias
# 0     fooA     barA
# 1     fooB     barB

# Build the dictionary necessary to change data column names
change_dict = dict(schema_change.values)
# {'fooA': 'barA', 'fooB': 'barB'}

data = ref('my_table')
#     fooA  fooB
# 0   True     1
# 1  False     2

# Change the data's column names
data = data.rename(columns=change_dict)
#     barA  barB
# 0   True     1
# 1  False     2


# Upload it back to the data warehouse
write_to_source(data, 'source_name', 'source_table')

This way you can do this and other kinds of processing that are difficult or impossible in SQL, while still easily getting the right data from dbt.
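
To try the renaming step on its own, here is a small pandas-only sketch that does not need fal or a warehouse. The two DataFrames are hypothetical stand-ins for what `ref('data_seed_schema1_to_schema2')` and `ref('my_table')` return in the script above, and the variable `renamed` is introduced only for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the seed table and the data table
schema_change = pd.DataFrame(
    {"my_field": ["fooA", "fooB"], "my_alias": ["barA", "barB"]}
)
data = pd.DataFrame({"fooA": [True, False], "fooB": [1, 2]})

# Each row of .values is a (my_field, my_alias) pair,
# so dict() maps old column name -> new column name
change_dict = dict(schema_change.values)

# rename returns a new DataFrame; columns missing from the dict are left as-is
renamed = data.rename(columns=change_dict)
print(list(renamed.columns))
```

Note that `dict(schema_change.values)` relies on the seed having exactly two columns in the order old name, new name.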

Answered 2021-11-24T22:04:59.623