sql - How to sample an exact number of rows (n

Question

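I have a dataset that is populated with 0 to 48 measurements per day for each MAC address (one every half hour; for various reasons we sometimes miss some measurements). Normally I group by day and take the average of the measurements, but as the number of MAC addresses grows, we intend to build the average from fewer measurements. Here is an example of the query I run: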

select fmc.mac_address,
       inf.node,
       inf.uf,
       inf.cidade,
       date_trunc('day', fmc.data) as data,
       avg(inf.qoe) as qoe,
       avg(inf.qoe_download) as qoe_download,
       avg(inf.qoe_upload) as qoe_upload,
       avg(inf.qoe_packetloss) as qoe_packetloss,
       avg(inf.qoe_latency) as qoe_latency,
       avg(inf.qoe_jitter) as qoe_jitter

from fixa_medicoes_claro fmc inner join public.inference_mac inf on fmc.mac_address = inf.mac

where data >= '2020-12-14'
and mac_address in {}

group by fmc.mac_address,
         inf.node,
         inf.uf,
         inf.cidade,
         date_trunc('day', fmc.data)

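Now we want to query a smaller number of samples for each group, with one constraint: regardless of how many measurements a mac_address has per day, I want to query at most "n" of them, and I also want them to be equally spaced in time. P.S.: the timestamp only records the day, so we don't know the hour/minute of a given sample.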

score 0 · Accepted Answer

You can use row_number() to get a random sample:

select . . .
from (select fmc.*, inf.*, -- the columns you need
             row_number() over (partition by mac_address order by random()) as seqnum
      from fixa_medicoes_claro fmc inner join
           public.inference_mac inf
           on fmc.mac_address = inf.mac    
      where data >= '2020-12-14' and
            mac_address in {}
     ) fi
where seqnum <= 10
group by . . .
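
To make the random-sample idea concrete, here is a minimal sketch that runs the same row_number() pattern against an in-memory SQLite database from Python. Everything in it is an assumption for illustration: the `measurements` table, its `mac`, `day` and `qoe` columns, the cap `N`, and SQLite standing in for PostgreSQL (window functions need SQLite 3.25 or newer). Unlike the query above, it partitions by both `mac` and `day`, which seems to be what the question asks for, so the cap applies per MAC address per day.

```python
import sqlite3

N = 3  # hypothetical cap on samples per (mac, day) group

conn = sqlite3.connect(":memory:")
conn.execute("create table measurements (mac text, day text, qoe real)")

# Toy data: "aa" has 10 readings on one day, "bb" has only 2.
rows = [("aa", "2020-12-14", float(i)) for i in range(10)]
rows += [("bb", "2020-12-14", float(i)) for i in range(2)]
conn.executemany("insert into measurements values (?, ?, ?)", rows)

query = """
select mac, day, count(*) as used, avg(qoe) as qoe
from (select m.*,
             -- number the rows of each (mac, day) group in random order
             row_number() over (partition by mac, day order by random()) as seqnum
      from measurements m
     ) s
where seqnum <= ?   -- keep at most N rows per group
group by mac, day
order by mac, day
"""
sampled = conn.execute(query, (N,)).fetchall()
for mac, day, used, qoe in sampled:
    print(mac, day, used, qoe)
```

Groups with fewer than N rows simply keep all of their rows, so the average for "bb" is computed from both of its readings.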

If you want the rows to be evenly distributed, a simple approach is to use ntile() together with row_number():

select . . .
from (select fi.*,
             row_number() over (partition by mac_address, tile order by data) as seqnum
      from (select fmc.*, inf.*, -- the columns you need
                   ntile(10) over (partition by mac_address order by fmc.data) as tile
            from fixa_medicoes_claro fmc inner join
                 public.inference_mac inf
                 on fmc.mac_address = inf.mac    
            where data >= '2020-12-14' and
                  mac_address in {}
           ) fi
     ) fi
where seqnum <= 1
group by . . .
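
The ntile() approach can be sketched the same way. The example below is an illustration under stated assumptions, not the original schema: the `measurements` table, the `seq` ordering column and the bucket count `N` are hypothetical, and SQLite (3.25 or newer) stands in for PostgreSQL. Because the question notes that the timestamp only holds the day, some other column that preserves arrival order is needed for the spacing to mean anything; `seq` plays that role here.

```python
import sqlite3

N = 4  # hypothetical number of evenly spaced samples per (mac, day)

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table measurements (mac text, day text, seq integer, qoe real)"
)

# Toy data: "aa" has 12 ordered readings, "bb" has only 3.
rows = [("aa", "2020-12-14", i, float(i)) for i in range(12)]
rows += [("bb", "2020-12-14", i, float(i)) for i in range(3)]
conn.executemany("insert into measurements values (?, ?, ?, ?)", rows)

# N is an int constant defined above, so formatting it into the SQL is safe here.
query = f"""
select mac, seq
from (select t.*,
             -- position of each row inside its bucket
             row_number() over (partition by mac, day, tile order by seq) as seqnum
      from (select m.*,
                   -- split each (mac, day) group into N ordered buckets
                   ntile({N}) over (partition by mac, day order by seq) as tile
            from measurements m
           ) t
     ) s
where seqnum = 1   -- first row of every bucket
order by mac, seq
"""
picked = conn.execute(query).fetchall()
for mac, seq in picked:
    print(mac, seq)
```

With 12 rows and 4 buckets, "aa" contributes rows 0, 3, 6 and 9. A group with fewer rows than buckets, such as "bb", gets one row per bucket and therefore keeps all of its rows.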
