arrays - Aggregate functions over an array

Question

I have a table like this:

+-----+----------------+
| id  | array300       |
+-----+----------------+
| 100 | {110,25,53,..} |
| 101 | {56,75,59,...} |
| 102 | {65,93,82,...} |
| 103 | {75,70,80,...} |
+-----+----------------+

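The array300 column is an array of 300 elements. I need arrays of 100 elements, where each element is the average of 3 consecutive elements of array300. For this example, the answer would look like: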
array100
{62.66,...}
{63.33,...}
{80,...}
{75,...}

score 11 · Accepted Answer

Try something like this:

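-- unnest goes in FROM so the window function sees one row per array element
SELECT t.id, u.val, ntile(100) OVER (PARTITION BY t.id ORDER BY u.ord) AS bucket_num
FROM your_table AS t
CROSS JOIN LATERAL unnest(t.array300) WITH ORDINALITY AS u(val, ord)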

For each array300, this SELECT gives you 300 records with the same id and assigns each of them a bucket_num (1 for the first 3 elements, 2 for the next 3, and so on).

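Then use this select to get the avg of the elements in each bucket:

SELECT id, bucket_num, avg(val) AS avg_val
FROM (...previous select here...) AS buckets
GROUP BY id, bucket_num

Next, just aggregate avg_val into an array, ordered by bucket_num:

SELECT id, array_agg(avg_val ORDER BY bucket_num) AS array100
FROM (...previous select here...) AS averages
GROUP BY id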

Details: unnest, ntile, array_agg, OVER (PARTITION BY)
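
The three steps can be combined into one statement. The following is a minimal sketch, not a definitive implementation. It assumes PostgreSQL 9.4 or later (for WITH ORDINALITY), uses a hypothetical two-row sample with 6-element arrays in place of your_table, and derives the bucket count from the array length so the small sample works. With 300-element arrays that expression evaluates to 100. The round() call is only there to keep the output short.

```sql
-- Hypothetical sample data: 6 elements per array, so 2 buckets of 3 elements
WITH your_table(id, array300) AS (
    VALUES (100, ARRAY[110, 25, 53, 110, 25, 53]),
           (101, ARRAY[56, 75, 59, 110, 25, 53])
),
buckets AS (
    -- step 1: one row per element, numbered by position, 3 elements per bucket
    SELECT t.id, u.val,
           ntile(array_length(t.array300, 1) / 3)
               OVER (PARTITION BY t.id ORDER BY u.ord) AS bucket_num
    FROM your_table AS t
    CROSS JOIN LATERAL unnest(t.array300) WITH ORDINALITY AS u(val, ord)
),
averages AS (
    -- step 2: average of each bucket
    SELECT id, bucket_num, avg(val) AS avg_val
    FROM buckets
    GROUP BY id, bucket_num
)
-- step 3: collect the averages back into an array, in bucket order
SELECT id, array_agg(round(avg_val, 2) ORDER BY bucket_num) AS array100
FROM averages
GROUP BY id
ORDER BY id;
```

For the sample rows this should return {62.67,62.67} for id 100 and {63.33,62.67} for id 101.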

UPD: Try this function:

CREATE OR REPLACE FUNCTION public.array300_to_100 (
  p_array300 numeric []
)
RETURNS numeric [] AS
$body$
DECLARE
  dim_start int = array_length(p_array300, 1); --size of input array
  dim_end int = 100; -- size of output array
  dim_step int = dim_start / dim_end; --avg batch size
  tmp_sum NUMERIC; --sum of the batch
  result_array NUMERIC[100]; -- resulting array
BEGIN

  FOR i IN 1..dim_end LOOP --from 1 to 100.
    tmp_sum = 0;

    FOR j IN (1+(i-1)*dim_step)..i*dim_step LOOP --from 1 to 3, 4 to 6, ...
      tmp_sum = tmp_sum + p_array300[j];  
    END LOOP; 

    result_array[i] = tmp_sum / dim_step;
  END LOOP; 

  RETURN result_array;
END;
$body$
LANGUAGE plpgsql
IMMUTABLE
RETURNS NULL ON NULL INPUT;

It takes one array300 and outputs one array100. To use it:

SELECT id, array300_to_100(array300)
FROM table1;

If you have any trouble understanding it, just ask me.
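
For comparison, here is a set-based sketch of the same idea with the chunk size passed as a parameter. This is not the function above. The name array_chunk_avg and both parameter names are hypothetical, and it assumes PostgreSQL 9.4 or later and a one-dimensional array. A short trailing chunk is averaged over however many elements it contains.

```sql
-- Hypothetical helper: average consecutive chunks of p_chunk elements
CREATE OR REPLACE FUNCTION array_chunk_avg(p_arr numeric[], p_chunk int)
RETURNS numeric[] AS
$body$
    SELECT array_agg(chunk_avg ORDER BY chunk_no)
    FROM (
        -- integer division maps positions 1..3 to chunk 0, 4..6 to chunk 1, ...
        SELECT (u.ord - 1) / p_chunk AS chunk_no,
               avg(u.val) AS chunk_avg
        FROM unnest(p_arr) WITH ORDINALITY AS u(val, ord)
        GROUP BY 1
    ) AS chunks;
$body$
LANGUAGE sql
IMMUTABLE
RETURNS NULL ON NULL INPUT;

-- Usage on a small sample: two chunks of three elements
SELECT array_chunk_avg(ARRAY[110, 25, 53, 75, 70, 80]::numeric[], 3);
```

The sample call should yield two elements, roughly 62.67 and 75. For the table in the question the call would be array_chunk_avg(array300, 3).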

score 6 · Accepted Answer

Putting Igor's pieces into another form:

 select id, array300, (
    select array_agg(z) from
    (
        select avg(x) from 
        (
            select x, ntile(array_length(array300,1)/3) over() from unnest(array300) x
        ) y 
        group by ntile
    ) z
) array100
from your_table

For a small sample table like this:

 id |       array300        
----+-----------------------
  1 | {110,25,53,110,25,53}
  2 | {56,75,59,110,25,53}
  3 | {65,93,82,110,25,53}
  4 | {75,70,80,110,25,53}

the result is:

 id |       array300        |                   array100                    
----+-----------------------+-----------------------------------------------
  1 | {110,25,53,110,25,53} | {(62.6666666666666667),(62.6666666666666667)}
  2 | {56,75,59,110,25,53}  | {(63.3333333333333333),(62.6666666666666667)}
  3 | {65,93,82,110,25,53}  | {(80.0000000000000000),(62.6666666666666667)}
  4 | {75,70,80,110,25,53}  | {(75.0000000000000000),(62.6666666666666667)}
(4 rows)

Edit: my first version used a fixed ntile(2). That only worked for source arrays of size 6. I have fixed this by using array_length(array300,1)/3 instead.
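
Note that array_agg(z) aggregates whole rows, which is why the result above prints each value in parentheses (an array of records). If a plain numeric array is wanted, one possible variant is sketched below. It assumes PostgreSQL 9.4 or later for WITH ORDINALITY, and the two sample rows are hypothetical stand-ins for your_table. It aggregates the averaged column itself and fixes the element order explicitly.

```sql
WITH your_table(id, array300) AS (
    VALUES (1, ARRAY[110, 25, 53, 110, 25, 53]),
           (4, ARRAY[75, 70, 80, 110, 25, 53])
)
SELECT id, array300, (
    -- aggregate the column, not the row, to get numeric[] instead of record[]
    SELECT array_agg(bucket_avg ORDER BY bucket)
    FROM (
        SELECT bucket, avg(x) AS bucket_avg
        FROM (
            -- ORDER BY ord keeps the elements in their original array order
            SELECT x, ntile(array_length(array300, 1) / 3) OVER (ORDER BY ord) AS bucket
            FROM unnest(array300) WITH ORDINALITY AS u(x, ord)
        ) AS y
        GROUP BY bucket
    ) AS z
) AS array100
FROM your_table;
```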

score 1 · Accepted Answer

Is this any faster?

Edit: This is more elegant:

with t as (select generate_series(1, 298, 3) a, generate_series(2, 299, 3) b, generate_series(3, 300, 3) c)

    select 
        id,
        array_agg((array300[a] + array300[b] + array300[c]) / 3::numeric order by a)  as avg
    from 
        t,
        tmp.test2
    group by 
        id

End of edit

Edit2: This is the shortest select I can think of:

select 
    id,
    array_agg((array300[3*a-2] + array300[3*a-1] + array300[3*a]) / 3::numeric order by a)  as avg
from 
    (select generate_series(1, 100,1) a) t,
    tmp.test2
group by 
    id

End of edit2

with 

t as (select generate_series(1, 298, 3) a, generate_series(2, 299, 3) b, generate_series(3, 300, 3) c)

,u as (
    select 
        id,
        a,
        (array300[a] + array300[b] + array300[c]) / 3::numeric as avg
    from 
        t,
        tmp.test2 /* table with arrays - id, array300 */
    order by 
        id,
        a
 )

select 
    id, 
    array_agg(avg order by a)
from 
    u 
group by 
    id
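
The queries above hard-code the array size. A hedged variant that derives the subscripts from array_length and averages consecutive triples, as in the question's example, might look like the sketch below. It assumes 1-based, one-dimensional arrays whose length is a multiple of 3. The one-row test2 CTE is a hypothetical stand-in for tmp.test2.

```sql
WITH test2(id, array300) AS (
    VALUES (1, ARRAY[110, 25, 53, 75, 70, 80])
)
SELECT id,
       -- i is the first subscript of each triple: 1, 4, 7, ...
       array_agg((array300[i] + array300[i + 1] + array300[i + 2]) / 3::numeric
                 ORDER BY i) AS array100
FROM test2,
     generate_series(1, array_length(array300, 1) - 2, 3) AS i
GROUP BY id;
```

For the sample row this should give two elements, roughly 62.67 and 75.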

score 1 · Accepted Answer

I can't fully answer your question, but I found an aggregate function for summing integer arrays. Maybe someone (or you) can modify it to compute avg.

Source: http://archives.postgresql.org/pgsql-sql/2005-04/msg00402.php

CREATE OR REPLACE FUNCTION array_add(int[],int[]) RETURNS int[] AS '
  DECLARE
    x ALIAS FOR $1;
    y ALIAS FOR $2;
    a int;
    b int;
    i int;
    res int[];
  BEGIN
    res = x;

    a := array_lower (y, 1);
    b := array_upper (y, 1);

    IF a IS NOT NULL THEN
      FOR i IN a .. b LOOP
        res[i] := coalesce(res[i],0) + y[i];
      END LOOP;
    END IF;

    RETURN res;
  END;
'
LANGUAGE plpgsql STRICT IMMUTABLE;

--- then this aggregate lets me sum integer arrays...

CREATE AGGREGATE sum_integer_array (
    sfunc = array_add,
    basetype = INTEGER[],
    stype = INTEGER[],
    initcond = '{}'
);


Here's how my sample table looked, along with my new array-summing
aggregate and function:

#SELECT * FROM arraytest ;
 id | somearr
----+---------
 a  | {1,2,3}
 b  | {0,1,2}
(2 rows)

#SELECT sum_integer_array(somearr) FROM arraytest ;
 sum_integer_array
-------------------
 {1,3,5}
(1 row)
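
As for turning the sum into an avg: one way to get an element-wise average across rows without a custom aggregate is sketched below. It assumes PostgreSQL 9.4 or later, and the CTE only reproduces the arraytest sample from above. The alias avg_integer_array is made up for illustration. Note that this averages each position across rows, like sum_integer_array does, which is different from averaging groups of 3 within one array.

```sql
WITH arraytest(id, somearr) AS (
    VALUES ('a', ARRAY[1, 2, 3]),
           ('b', ARRAY[0, 1, 2])
)
SELECT array_agg(pos_avg ORDER BY pos) AS avg_integer_array
FROM (
    -- one row per (array position, value), averaged per position
    SELECT u.pos, avg(u.val) AS pos_avg
    FROM arraytest
    CROSS JOIN LATERAL unnest(somearr) WITH ORDINALITY AS u(val, pos)
    GROUP BY u.pos
) AS per_position;
```

For the two sample rows the resulting array should hold the values 0.5, 1.5 and 2.5.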
