
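I am using MySQL to store the data from a large number of simulations that I run on an HPC cluster. Each simulation has its own entry in one table, and a second table holds the per-time-step result data for the simulations. The time-step result table is very large (tens of millions to hundreds of millions of rows). The tables look like this: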

Table: simulations

id      descriptor  notes 
1       SIM1        notes here...
2       SIM2        SIM2 Notes...
...     ...         ...
8643    SIM8643     SIM8643 Notes...

Table: simulations_ts

id         simulation_id    step        data_value
1          1                1           0.05
2          1                2           0.051
...        ...              ...         ...
1983       1                1983        0.253
1984       2                1           0.043
...        ...              ...         ...
59345435   8643             2832        0.067

I would like to be able to efficiently return the following table:

simulation_id    first_ts_id     last_ts_id  num_steps
1                1               1983        1983
2                1984            2938434     2052
...              ...             ...         ...
8643             12835283        59345435    2832

I know I can run a query like this:

SELECT
   simulation_id,
   MIN(step) AS first_step,
   MAX(step) AS last_step,
   COUNT(id) AS num_steps
FROM
   simulations_ts
GROUP BY
   simulation_id
ORDER BY
   simulation_id ASC

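and that there are ways to use a subquery to pull out the id corresponding to one aggregate, but I haven't found an example that pulls out the corresponding ids for two aggregate functions. Can this be done efficiently in a single query, or am I better off stepping through the simulations and doing the min lookup and the max lookup separately?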


2 Answers

SELECT g.simulation_id, first.id as first_ts_id, last.id as last_ts_id, g.num_steps
FROM (SELECT simulation_id, MIN(step) minstep, MAX(step) maxstep, COUNT(*) num_steps
      FROM simulations_ts
      GROUP BY simulation_id) AS g
JOIN simulations_ts first ON first.simulation_id = g.simulation_id AND first.step = g.minstep
JOIN simulations_ts last ON last.simulation_id = g.simulation_id AND last.step = g.maxstep
answered 2013-10-21T16:27:32.843
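A minimal, self-contained way to check the derived-table-plus-two-joins approach above is to run it against a tiny in-memory database from Python's standard library. This sketch assumes SQLite as a stand-in for MySQL (the query is plain SQL that both accept), uses made-up sample rows, renames the aliases to first_row/last_row for readability, and adds a composite index on (simulation_id, step) as an assumption so that each join back to simulations_ts can be an index lookup rather than a scan.

```python
import sqlite3

# Toy stand-in for simulations_ts (SQLite instead of MySQL; data is made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simulations_ts ("
    " id INTEGER PRIMARY KEY, simulation_id INTEGER, step INTEGER, data_value REAL)"
)
rows = [
    # (id, simulation_id, step, data_value)
    (1, 1, 1, 0.05), (2, 1, 2, 0.051), (3, 1, 3, 0.253),
    (4, 2, 1, 0.043), (5, 2, 2, 0.044),
    (6, 3, 1, 0.067),
]
conn.executemany("INSERT INTO simulations_ts VALUES (?, ?, ?, ?)", rows)

# Assumed index: lets both self-joins find (simulation_id, step) directly.
conn.execute("CREATE INDEX idx_sim_step ON simulations_ts (simulation_id, step)")

# Aggregate once per simulation, then join back twice to fetch the ids
# of the rows at the minimum and maximum step.
QUERY = """
SELECT g.simulation_id, first_row.id AS first_ts_id,
       last_row.id AS last_ts_id, g.num_steps
FROM (SELECT simulation_id, MIN(step) AS minstep, MAX(step) AS maxstep,
             COUNT(*) AS num_steps
      FROM simulations_ts
      GROUP BY simulation_id) AS g
JOIN simulations_ts first_row
  ON first_row.simulation_id = g.simulation_id AND first_row.step = g.minstep
JOIN simulations_ts last_row
  ON last_row.simulation_id = g.simulation_id AND last_row.step = g.maxstep
ORDER BY g.simulation_id
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Each output row has the shape (simulation_id, first_ts_id, last_ts_id, num_steps) requested in the question. The join on step relies on each (simulation_id, step) pair being unique; a duplicated step would produce duplicate output rows.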

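I think this is what you're after. Note that I only show the id column from the simulations_ts aliases first_sim and last_sim, but you can of course show other columns from that table as well.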

SELECT
   main.simulation_id,
   first_step,
   first_sim.id as first_sim_id,
   last_step,
   last_sim.id as last_sim_id,
   main.num_steps
FROM
   (SELECT
       simulation_id,
       MIN(step) AS first_step,
       MAX(step) AS last_step,
       COUNT(id) AS num_steps
    FROM
       simulations_ts
    GROUP BY
       simulation_id) as main
    JOIN simulations_ts first_sim
         ON main.simulation_id = first_sim.simulation_id
            AND main.first_step = first_sim.step
    JOIN simulations_ts last_sim
         ON main.simulation_id = last_sim.simulation_id
            AND main.last_step = last_sim.step

I started from your original query and then simply joined it back to simulations_ts on the simulation id and the min/max steps.

answered 2013-10-21T16:26:25.530
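On MySQL 8.0 or later, which added window functions after these answers were written, a different technique can return the same columns without joining the table back to itself: number each simulation's rows by step in both directions and keep the id on the row numbered 1. This is a swapped-in alternative, not what either answer does. The sketch assumes Python's bundled SQLite (window functions need SQLite 3.25 or newer) as a stand-in for MySQL, uses made-up data, and assumes each (simulation_id, step) pair is unique.

```python
import sqlite3

# SQLite (3.25+ for window functions) stands in for MySQL 8.0+; data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simulations_ts ("
    " id INTEGER PRIMARY KEY, simulation_id INTEGER, step INTEGER, data_value REAL)"
)
rows = [
    # (id, simulation_id, step, data_value)
    (1, 1, 1, 0.05), (2, 1, 2, 0.051), (3, 1, 3, 0.253),
    # Simulation 2 deliberately stored out of step order: id 4 is step 2.
    (4, 2, 2, 0.044), (5, 2, 1, 0.043),
    (6, 3, 1, 0.067),
]
conn.executemany("INSERT INTO simulations_ts VALUES (?, ?, ?, ?)", rows)

# Number rows by step ascending and descending within each simulation,
# keep the id where each numbering is 1, and count all rows per simulation.
QUERY = """
SELECT simulation_id,
       MAX(CASE WHEN rn_asc = 1 THEN id END) AS first_ts_id,
       MAX(CASE WHEN rn_desc = 1 THEN id END) AS last_ts_id,
       COUNT(*) AS num_steps
FROM (SELECT id, simulation_id,
             ROW_NUMBER() OVER (PARTITION BY simulation_id ORDER BY step ASC) AS rn_asc,
             ROW_NUMBER() OVER (PARTITION BY simulation_id ORDER BY step DESC) AS rn_desc
      FROM simulations_ts) AS numbered
GROUP BY simulation_id
ORDER BY simulation_id
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Because simulation 2 is stored out of step order here, the sketch also shows why replacing MIN(step) and MAX(step) with MIN(id) and MAX(id) in the question's GROUP BY query is only a shortcut: it returns the right ids only if id increases with step inside every simulation, as it appears to in the question's sample data.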