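If I only cared about a single minimum and maximum value per group, this would be relatively easy. The problem is that I need to find the boundaries of every consecutive run of a group. A sample data set follows: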
BoundaryColumn GroupIdentifier
1 A
3 A
4 A
7 A
8 B
9 B
11 B
13 A
14 A
15 A
16 A
The result set I need from SQL is as follows:
min max groupid
1 7 A
8 11 B
13 16 A
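Basically, I need to find the boundaries of each cluster of a group.
The data will be stored in Oracle 11g or MySQL, so syntax for either platform is welcome.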
Disclaimer: this kind of thing is much easier to do by querying the plain rows and processing them in a front-end language. That said...
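As a rough illustration of the front-end route mentioned above, here is a minimal Python sketch. It assumes the rows have already been fetched sorted by BoundaryColumn; the `rows` list and the `run_boundaries` helper are hypothetical names used only for this example.

```python
from itertools import groupby

# Sample rows from the question as (BoundaryColumn, GroupIdentifier),
# assumed to be already sorted by BoundaryColumn.
rows = [(1, "A"), (3, "A"), (4, "A"), (7, "A"), (8, "B"), (9, "B"),
        (11, "B"), (13, "A"), (14, "A"), (15, "A"), (16, "A")]

def run_boundaries(sorted_rows):
    """Return (min, max, group) for each consecutive run of one group."""
    result = []
    # groupby starts a new run every time the key (the group id) changes.
    for group_id, run in groupby(sorted_rows, key=lambda r: r[1]):
        values = [boundary for boundary, _ in run]
        result.append((values[0], values[-1], group_id))
    return result

print(run_boundaries(rows))
# → [(1, 7, 'A'), (8, 11, 'B'), (13, 16, 'A')]
```

Because the input is sorted, the first and last values of each run are its minimum and maximum, so no extra comparison is needed.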
The following query works in Oracle (which supports analytic functions) but not in MySQL before 8.0 (which does not). Here is a SQL Fiddle.
WITH BoundX AS (
  -- Keep only the rows that sit on a run boundary.
  SELECT * FROM (
    SELECT
      BoundaryColumn,
      GroupIdentifier,
      LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
      LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
    FROM MyTable
    ORDER BY BoundaryColumn
  )
  WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
     OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
-- Pair each boundary row with the next boundary row.
SELECT MIN, MAX, GROUPID
FROM (
  SELECT
    BoundaryColumn AS MIN,
    LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
    GroupIdentifier AS GROUPID,
    GIDLag,
    GIDLead
  FROM BoundX
)
-- Keep the rows that start a run: the following row is in the same group.
WHERE GROUPID = GIDLead
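Here is the logic, step by step. You can probably improve on this, because I feel there are too many subqueries here...
This query pulls the preceding and following GroupIdentifier values into each row: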
SELECT
  BoundaryColumn,
  GroupIdentifier,
  LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
  LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn
The result looks like this (a blank cell is NULL):
BoundaryColumn  GroupIdentifier  GIDLag  GIDLead
1               A                        A
3               A                A       A
4               A                A       A
7               A                A       B
8               B                A       B
9               B                B       B
11              B                B       A
13              A                B       A
14              A                A       A
15              A                A       A
16              A                A
If you add logic to drop every row where GIDLag = GIDLead = GroupIdentifier, you are left with the boundaries:
WITH BoundX AS (
  SELECT * FROM (
    SELECT
      BoundaryColumn,
      GroupIdentifier,
      LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
      LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
    FROM MyTable
    ORDER BY BoundaryColumn
  )
  WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
     OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
SELECT
  BoundaryColumn AS MIN,
  LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
  GroupIdentifier AS GROUPID,
  GIDLag,
  GIDLead
FROM BoundX
With that added, the result is:
MIN MAX GROUPID GIDLAG GIDLEAD
--- --- ------- ------ -------
  1   7 A              A
  7   8 A       A      B
  8  11 B       A      B
 11  13 B       B      A
 13  16 A       B      A
 16     A       A
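Finally, keep only the rows where GROUPID = GIDLead. That is the query at the top of this answer. Note that this filter assumes every run has at least two rows: a single-row run never has GROUPID = GIDLead, so it is dropped. The result is: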
MIN MAX GROUPID
--- --- -------
1 7 A
8 11 B
13 16 A
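Another approach (Oracle). Here we simply divide the result set returned by a query against table t1 (your table) into logical groups (grp). A new group starts every time the value of GroupIdentifier changes: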
select min(q.BoundaryColumn) as MinB
     , max(q.BoundaryColumn) as MaxB
     , max(q.GroupIdentifier) as groupid
  from ( select s.BoundaryColumn
              , s.GroupIdentifier
                -- the running total of the change flags numbers the runs
              , sum(grp) over(order by s.BoundaryColumn) as grp
           from ( select BoundaryColumn
                       , GroupIdentifier
                         -- 1 where the group differs from the previous row, else NULL
                       , case
                           when GroupIdentifier <> lag(GroupIdentifier)
                                                   over(order by BoundaryColumn)
                           then 1
                         end as grp
                    from t1) s
       ) q
 group by q.grp
Result:
MINB MAXB GROUPID
---------- ---------- -------
1 7 A
8 11 B
13 16 A
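The same flag-and-running-total idea can be traced outside the database. The following is only a sketch in Python, assuming rows sorted by BoundaryColumn; `rows` and `boundaries_by_running_sum` are hypothetical names, and the first row is flagged 0 where the SQL version yields NULL.

```python
from itertools import accumulate

# Sample rows from the question as (BoundaryColumn, GroupIdentifier),
# assumed to be already sorted by BoundaryColumn.
rows = [(1, "A"), (3, "A"), (4, "A"), (7, "A"), (8, "B"), (9, "B"),
        (11, "B"), (13, "A"), (14, "A"), (15, "A"), (16, "A")]

def boundaries_by_running_sum(sorted_rows):
    groups = [g for _, g in sorted_rows]
    # Step 1: flag rows whose group differs from the previous row (like LAG).
    flags = [0] + [int(cur != prev) for prev, cur in zip(groups, groups[1:])]
    # Step 2: a running total of the flags gives every run its own number.
    grp_ids = list(accumulate(flags))
    # Step 3: aggregate min/max per run number (like GROUP BY grp).
    runs = {}
    for (boundary, group_id), grp in zip(sorted_rows, grp_ids):
        lo, hi, _ = runs.get(grp, (boundary, boundary, group_id))
        runs[grp] = (min(lo, boundary), max(hi, boundary), group_id)
    return [runs[k] for k in sorted(runs)]

print(boundaries_by_running_sum(rows))
# → [(1, 7, 'A'), (8, 11, 'B'), (13, 16, 'A')]
```

Unlike a filter that compares a row with its successor, numbering the runs this way also keeps runs that consist of a single row.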
Take a look at this article about "runs" in data: http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data
With the knowledge from that link, you can write a query like this:
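-- MyTable stands in for your table name (TABLE itself is a reserved word).
SELECT BoundaryColumn,
       GroupIdentifier,
       (
         -- rows of a different group at or before this row
         SELECT COUNT(*)
         FROM MyTable T
         WHERE T.GroupIdentifier <> TR.GroupIdentifier
           AND T.BoundaryColumn <= TR.BoundaryColumn
       ) AS RunGroup
FROM MyTable TR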
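With this information you can group by GroupIdentifier and "RunGroup" together, and select GroupIdentifier and the min/max of BoundaryColumn. Grouping by RunGroup alone is not safe, because runs of different groups can end up with the same count (for example, three A rows, three B rows, then an A row gives both the B run and the second A run a RunGroup of 3).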
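To make the grouping step concrete, here is a small sketch that runs the complete query against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and the table name MyTable, the derived-table alias `runs`, and the MinB/MaxB column aliases are assumptions for the example. The query uses no window functions, only a correlated subquery and GROUP BY.

```python
import sqlite3

# Sample rows from the question as (BoundaryColumn, GroupIdentifier).
rows = [(1, "A"), (3, "A"), (4, "A"), (7, "A"), (8, "B"), (9, "B"),
        (11, "B"), (13, "A"), (14, "A"), (15, "A"), (16, "A")]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (BoundaryColumn INTEGER, GroupIdentifier TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)

query = """
SELECT MIN(BoundaryColumn) AS MinB,
       MAX(BoundaryColumn) AS MaxB,
       GroupIdentifier
FROM (
    SELECT TR.BoundaryColumn,
           TR.GroupIdentifier,
           (SELECT COUNT(*)
            FROM MyTable T
            WHERE T.GroupIdentifier <> TR.GroupIdentifier
              AND T.BoundaryColumn <= TR.BoundaryColumn) AS RunGroup
    FROM MyTable TR
) runs
GROUP BY GroupIdentifier, RunGroup  -- both columns, to avoid collisions
ORDER BY MinB
"""
result = conn.execute(query).fetchall()
print(result)
# → [(1, 7, 'A'), (8, 11, 'B'), (13, 16, 'A')]
```

The correlated subquery is evaluated once per row, so on large tables this is likely to be slower than the analytic-function versions.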
Edit: I felt the peer pressure, so here is a SQLFiddle with my version of the answer: http://www.sqlfiddle.com/#!8/9a24c/4/0