Our new database does not (and will not) support PL/R, which we rely on extensively to implement a random weighted sampling function:

CREATE OR REPLACE FUNCTION sample(
    ids bigint[],
    size integer,
    seed integer DEFAULT 1,
    with_replacement boolean DEFAULT false,
    probabilities numeric[] DEFAULT NULL::numeric[])
    RETURNS bigint[]
    LANGUAGE 'plr'

    COST 100
    VOLATILE 
AS $BODY$
    set.seed(seed)
    ids = as.integer(ids)
    if (length(ids) == 1) {
        s = rep(ids,size)
    } else {
        s = sample(ids,size, with_replacement,probabilities)
    }
    return(s)
$BODY$;

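Is there a pure SQL approach that provides the same functionality? This post shows a way to select a single random row, but it cannot sample from multiple groups at once.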

As far as I know, SQL Fiddle does not support PL/R, so here is a quick reproducible example:

CREATE TABLE test
    (category text, uid integer, weight numeric)
;
    
INSERT INTO test
    (category, uid, weight)
VALUES
    ('a', 1,  45),
    ('a', 2,  10),
    ('a', 3,  25),
    ('a', 4,  100),
    ('a', 5,  30),
    ('b', 6, 20),
    ('b', 7, 10),
    ('b', 8, 80),
    ('b', 9, 40),
    ('b', 10, 15),
    ('c', 11, 20),
    ('c', 12, 10),
    ('c', 13, 80),
    ('c', 14, 40),
    ('c', 15, 15)
;

SELECT category,
        unnest(diffusion_shared.sample(array_agg(uid ORDER BY uid),
                                       1,
                                       1,
                                       True,
                                       array_agg(weight ORDER BY uid))
                                       ) as uid
FROM test
WHERE category IN ('a', 'b')
GROUP BY category;

Which outputs:

category  uid
'a'       4
'b'       8
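
For reference, here is a minimal sketch of the kind of pure SQL query being asked about. It is not a drop-in replacement for the PL/R function. It assumes PostgreSQL, strictly positive weights, and sampling without replacement, using the exponential-key trick usually attributed to Efraimidis and Spirakis: each row gets the key -ln(u) / weight for a uniform u, and the rows with the smallest keys per group are kept. With a sample size of 1, as in the example above, sampling with and without replacement is the same thing. The subquery alias `ranked`, the column `rn`, and the seed value 0.5 are made up for illustration.

```sql
-- Sketch only: weighted sampling WITHOUT replacement, k rows per category.
-- Assumes every weight is > 0.
-- setseed() takes a value in [-1, 1]; it will not reproduce the draws
-- produced by R's set.seed(), and results may still depend on scan order.
SELECT setseed(0.5);

SELECT category, uid
FROM (
    SELECT category,
           uid,
           row_number() OVER (
               PARTITION BY category
               -- 1.0 - random() lies in (0, 1], so ln() never sees 0.
               -- Larger weights tend to produce smaller keys.
               ORDER BY -ln(1.0 - random()) / weight::double precision
           ) AS rn
    FROM test
    WHERE category IN ('a', 'b')
) AS ranked
WHERE rn <= 1          -- sample size per category
ORDER BY category, uid;
```

Changing `rn <= 1` to `rn <= k` would draw k distinct rows per category. Sampling with replacement for sizes above 1 needs a different construction (for example, repeated draws against cumulative weights), which this sketch does not cover.
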

Any ideas?
