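I would appreciate help with the following query. We have a list of experiments and their progress statuses (for simplicity I have reduced the statuses to 4 here, but our data has 10 distinct statuses). Ultimately, I need to return the current status of every experiment that is not yet finished.
Given a table exp_status: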
Experiment | ID | Status
----------------------------
A | 1 | Starting
A | 2 | Working On It
B | 3 | Starting
B | 4 | Working On It
B | 5 | Finished Type I
C | 6 | Starting
D | 7 | Starting
D | 8 | Working On It
D | 9 | Finished Type II
E | 10 | Starting
E | 11 | Working On It
F | 12 | Starting
G | 13 | Starting
H | 14 | Starting
H | 15 | Working On It
H | 16 | Finished Type II
Desired result set:
Experiment | ID | Status
----------------------------
A | 2 | Working On It
C | 6 | Starting
E | 11 | Working On It
F | 12 | Starting
G | 13 | Starting
Within an experiment, the highest ID corresponds to the most recent status.
Here is my current code, which runs in 150 seconds:
SELECT *
FROM
    (SELECT Experiment, ID, Status,
            -- rn = 1 marks the latest (highest-ID) row of each experiment
            ROW_NUMBER() OVER (PARTITION BY Experiment
                               ORDER BY ID DESC) AS rn
     FROM exp_status) ranked
WHERE rn = 1
  AND Status NOT LIKE 'Finished%'
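The problem is that this code wastes time. The result set is 45,000 rows pulled from a table of 3.9 million rows, because most experiments are already finished: roughly 95% of the experiments in the table are in a finished state. The query goes through and sorts all of them, and only at the very end filters out the finished ones. I can't figure out how to make the query first pick out only the experiments that have no "Finished" status. I tried the following, but its performance was very slow: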
SELECT *
FROM exp_status
WHERE Experiment NOT IN
    (
        SELECT Experiment
        FROM exp_status
        WHERE Status LIKE 'Finished%'
    )
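Note that the NOT IN query above returns every row of each unfinished experiment, not just the latest one, so it does not yet match the desired result set. One way to express "drop the finished experiments first, then take the latest row" is sketched below. It assumes that a "Finished" status is terminal (an experiment with any Finished row counts as finished), which holds for the sample data but is an assumption about the real table. The alias `unfinished` and the column `max_id` are names made up for this sketch, and it has not been benchmarked against the 3.9-million-row table.

```sql
-- Sketch only: assumes any 'Finished%' row means the experiment is finished.
SELECT s.Experiment, s.ID, s.Status
FROM exp_status s
JOIN (
    -- One grouped pass: keep only experiments with no Finished row,
    -- and remember the highest ID of each.
    SELECT Experiment,
           MAX(ID) AS max_id
    FROM exp_status
    GROUP BY Experiment
    HAVING MAX(CASE WHEN Status LIKE 'Finished%' THEN 1 ELSE 0 END) = 0
) unfinished
  ON  unfinished.Experiment = s.Experiment
  AND unfinished.max_id     = s.ID
ORDER BY s.Experiment;
```

On the sample data this should return the rows for A (ID 2), C (6), E (11), F (12) and G (13). Whether it is actually faster depends on the optimizer and on the available indexes; an index covering (Experiment, ID) might help the join back to the table, but that is a guess rather than a measured result.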
Any help would be greatly appreciated!