google-cloud-platform - Is it possible to keep shared state between windows when using a UDF in BigQuery?

Question

This is a follow-up to my previous question about emulating aggregate functions (as in PGSQL) in BigQuery.

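The solution proposed in the previous question works when the function applied to each window is independent of the previous window, for example when computing a simple average. It breaks down, however, for recursive functions such as the exponential moving average (EMA), whose formula is: EMA[i] = price[i]*k + EMA[i-1]*(1-k)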
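Unrolling the recurrence shows why a fixed-size window is not enough on its own. Writing p_i for price[i] and w for the window size, with 0-based indices and the seed EMA_{w-1} taken as the SMA of the first w prices (as the question's UDF intends), substituting the recurrence into itself gives, for i ≥ w:

```latex
\begin{aligned}
\mathrm{EMA}_{w-1} &= \frac{1}{w}\sum_{j=0}^{w-1} p_j,\\
\mathrm{EMA}_i &= k\,p_i + (1-k)\,\mathrm{EMA}_{i-1} \qquad (i \ge w)\\
&= \sum_{j=0}^{i-w} k\,(1-k)^{j}\,p_{i-j} + (1-k)^{\,i-w+1}\,\mathrm{EMA}_{w-1}.
\end{aligned}
```

Every price from p_0 onward contributes to EMA_i, so a function that only sees a bounded window cannot reproduce it exactly unless it is given either the carried state EMA_{i-1} or the full history.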

Using the same example as in the previous question:

CREATE OR REPLACE FUNCTION temp_db.ema_func(arr ARRAY<FLOAT64>, window_size FLOAT64)
RETURNS FLOAT64 LANGUAGE js AS """
    // JavaScript numbers are doubles, so FLOAT64 is used for the inputs and the fractional result
    if (arr.length <= window_size) {
        // calculate a simple moving average until the end of the first window
        var SMA = 0;
        for (var i = 0; i < arr.length; i++) {
            SMA = SMA + arr[i];
        }
        return SMA / arr.length;
    } else {
        // start calculating the EMA, where EMA[i-1] is the SMA calculated for the first window
        // note: the constant k = 0.05 is hard-coded for the sake of simplicity
        // the problem: where do I get EMA[i-1] (prev_ema) from?
        // in this example only the most recent value is needed, but in the general case
        // other calculations might have to be done with the new value
        // prev_ema is deliberately left undefined here: it is exactly the missing state
        return arr[arr.length - 1] * 0.05 + prev_ema * (1 - 0.05);
    }
""";

-- rows 40 preceding = the 40 previous rows plus the current row
select s_id,
       temp_db.ema_func(ARRAY_AGG(CAST(s_price AS FLOAT64)) over (partition by s_id order by s_date rows 40 preceding), 40) as temp_col
from temp_db.s_table;

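In PGSQL it is very easy to store the state variables as a custom type that is part of the aggregate function's arguments. Is it possible to emulate the same functionality in BigQuery?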
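To make the PostgreSQL comparison concrete: a PostgreSQL aggregate is built from a state type (STYPE), a transition function (SFUNC) applied once per row, and an optional final function (FINALFUNC). The sketch below mimics that pattern in plain JavaScript for the EMA case; the names initialState, emaStep, emaFinal and runningEma are hypothetical and are not a BigQuery or PostgreSQL API. The `state` object is exactly what a BigQuery JavaScript UDF has no place to keep between invocations.

```javascript
// Hypothetical sketch of a PostgreSQL-style aggregate (STYPE / SFUNC / FINALFUNC)
// in plain JavaScript. Names are illustrative only.

// State carried from row to row, playing the role of a PostgreSQL STYPE.
function initialState() {
  return { n: 0, sum: 0, ema: null };
}

// Transition function (like SFUNC): folds one price into the state.
// While at most `windowSize` rows have been seen it returns the running SMA,
// which then seeds the first EMA step, mirroring the question's UDF.
function emaStep(state, price, windowSize, k) {
  const n = state.n + 1;
  const sum = state.sum + price;
  let ema;
  if (n <= windowSize) {
    ema = sum / n; // running SMA over the first window
  } else {
    ema = price * k + state.ema * (1 - k); // EMA[i] = price[i]*k + EMA[i-1]*(1-k)
  }
  return { n, sum, ema };
}

// Final function (like FINALFUNC): extracts the result from the state.
function emaFinal(state) {
  return state.ema;
}

// Used as a window function: one output per input row, each computed
// from the state left behind by the previous row.
function runningEma(prices, windowSize, k) {
  const out = [];
  let state = initialState();
  for (const price of prices) {
    state = emaStep(state, price, windowSize, k);
    out.push(emaFinal(state));
  }
  return out;
}
```

For example, runningEma([10, 20, 30, 40], 2, 0.5) yields [10, 15, 22.5, 31.25]: the first two values are the running SMA, and the last two apply the EMA recurrence to the state left by the previous row.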

score 2 · Accepted Answer

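I don't think BigQuery can do this in general; instead, I would look at the specific case to see whether some reasonable workaround is possible. Recursive and aggregate UDFs are not supported in BQ [yet, hopefully], so you may want to file a corresponding feature request.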
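One possible workaround for this specific case, offered as a hedged sketch rather than part of the original answer: since a JavaScript UDF cannot carry state from one window to the next, each call can rebuild the state from the full history. Assuming the UDF receives every price of the partition up to the current row, in s_date order (for example via a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW instead of rows 40 preceding), the body could replay the recurrence as follows; emaFromHistory is a hypothetical name.

```javascript
// Hypothetical body for a BigQuery JavaScript UDF that is assumed to receive
// all prices of the partition up to the current row, oldest first.
// Nothing survives between calls, so every call rebuilds the EMA from scratch.
function emaFromHistory(arr, windowSize, k) {
  if (arr.length === 0) return null;
  // Seed: SMA over the first min(arr.length, windowSize) values (at least one).
  const seedLen = Math.max(1, Math.min(arr.length, windowSize));
  let ema = 0;
  for (let i = 0; i < seedLen; i++) {
    ema += arr[i];
  }
  ema /= seedLen;
  // Replay EMA[i] = price[i]*k + EMA[i-1]*(1-k) for every later value.
  for (let i = seedLen; i < arr.length; i++) {
    ema = arr[i] * k + ema * (1 - k);
  }
  return ema;
}
```

Each call costs O(n) for a row with n preceding values, so a partition costs O(n²) overall. With the question's bounded frame (41 rows, window_size 40), the replay would seed from the oldest 40 rows of the frame and apply only one EMA step, so it would not match the true EMA. Another option to consider is calling a function once per s_id on ARRAY_AGG(s_price ORDER BY s_date) with GROUP BY s_id, returning an array of all EMA values, and expanding it with UNNEST ... WITH OFFSET.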

Also check out BQ scripting, although I don't think your case fits there.
