
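I have an Oracle table with several columns, some of which are populated with a value. There are a large number of possible values; the example below is not exhaustive.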

ID  Col1  Col2  Col3  Col4  Col5 Col6
-------------------------------------
1   X2    B2
2   C3    D1    R4
3   B2    X2
4   E4    T1    W2
5   X2    B2
6   R4    D1   
7   D1    R4    C3

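I need to count the distinct combinations, where rows 1, 3, and 5 in the example above are considered the same combination, and rows 2 and 7 are also considered the same. So the desired result looks like this: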

Col1  Col2  Col3  Col4  Col5  Col6  Count(*)
------------------------------------------------
B2    X2                            3
C3    D1    R4                      2
E4    T1    W2                      1
D1    R4                            1

But if I use this:

SELECT Col1, Col2, Col3, Col4, Col5, Col6, Count(*)
FROM MyTable
GROUP BY Col1, Col2, Col3, Col4, Col5, Col6
ORDER BY Count(*) DESC

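then row 3 of my data is treated as unique, even though it contains the same combination as rows 1 and 5. Rows 2 and 7 are not treated as the same either, and the result is: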

Col1  Col2  Col3  Col4  Col5  Col6  Count(*)
------------------------------------------------
X2    B2                            2
C3    D1    R4                      1
B2    X2                            1
E4    T1    W2                      1
R4    D1                            1
D1    R4    C3                      1

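It looks like I need to sort the column values within each row before comparing them. But is there an elegant way to do this in Oracle for a large record set (more than 3 million records) with up to 20 such columns?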


2 Answers


Two approaches come to mind. First, you could write a function that accepts six or more strings and concatenates them in sorted order. Then:

select colstring, count(*)
from
(
  select id, concat_sorted(col1, col2, col3, col4, col5, col6) as colstring
  from MyTable
)
group by colstring;
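
The query above calls concat_sorted without defining it. One possible shape for such a function is sketched below. This is a hypothetical implementation, not the answerer's: the parameter names, the comma separator, and the use of the Oracle-supplied collection type sys.odcivarchar2list are all assumptions. It takes exactly six arguments, so a table with 20 columns would need a longer parameter list.

```sql
-- Hypothetical implementation of concat_sorted; the answer only names it.
-- sys.odcivarchar2list is a collection type shipped with Oracle, used here
-- to avoid creating a custom type.
CREATE OR REPLACE FUNCTION concat_sorted(
  p1 IN VARCHAR2, p2 IN VARCHAR2, p3 IN VARCHAR2,
  p4 IN VARCHAR2, p5 IN VARCHAR2, p6 IN VARCHAR2
) RETURN VARCHAR2 DETERMINISTIC
IS
  l_vals sys.odcivarchar2list := sys.odcivarchar2list(p1, p2, p3, p4, p5, p6);
  l_tmp  VARCHAR2(4000);
  l_out  VARCHAR2(32767);
  j      PLS_INTEGER;
BEGIN
  -- Insertion sort, ascending; NULLs are pushed to the end.
  FOR i IN 2 .. l_vals.COUNT LOOP
    l_tmp := l_vals(i);
    j := i - 1;
    WHILE j >= 1
      AND ((l_vals(j) IS NULL AND l_tmp IS NOT NULL) OR l_vals(j) > l_tmp)
    LOOP
      l_vals(j + 1) := l_vals(j);
      j := j - 1;
    END LOOP;
    l_vals(j + 1) := l_tmp;
  END LOOP;

  -- Join the non-NULL values with a comma.
  FOR i IN 1 .. l_vals.COUNT LOOP
    IF l_vals(i) IS NOT NULL THEN
      l_out := l_out || CASE WHEN l_out IS NOT NULL THEN ',' END || l_vals(i);
    END IF;
  END LOOP;
  RETURN l_out;
END concat_sorted;
/
```

With this sketch, concat_sorted('X2', 'B2', NULL, NULL, NULL, NULL) and concat_sorted('B2', 'X2', NULL, NULL, NULL, NULL) both return 'B2,X2', so the two rows fall into the same group. Calling a PL/SQL function once per row adds a context switch for each call, which may be noticeable on 3 million rows, and the comma separator assumes the values themselves never contain a comma.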

If you have Oracle 11g Release 2 or later available, another approach is to turn each column into a separate row and use listagg on it:

select colstring, count(*)
from
(
  select id, listagg (colx, ',') within group (order by colx) as colstring
  from
  (
    select id, col1 as colx from MyTable
    union all
    select id, col2 from MyTable
    union all
    select id, col3 from MyTable
    union all
    select id, col4 from MyTable
    union all
    select id, col5 from MyTable
    union all
    select id, col6 from MyTable
  )
  group by id
)
group by colstring
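
The listagg query returns each combination as a single string. If the sorted values are wanted back in separate columns, as in the question's desired result, one possible variant is sketched below. It is not part of the original answer: it assumes Oracle 11g or later for UNPIVOT, and the aliases val, src, rn, c1 to c6, and cnt are made up for illustration.

```sql
-- Sketch: sort the values within each row, then spread them back into columns.
SELECT c1, c2, c3, c4, c5, c6, COUNT(*) AS cnt
FROM (
  SELECT id,
         MAX(CASE WHEN rn = 1 THEN val END) AS c1,
         MAX(CASE WHEN rn = 2 THEN val END) AS c2,
         MAX(CASE WHEN rn = 3 THEN val END) AS c3,
         MAX(CASE WHEN rn = 4 THEN val END) AS c4,
         MAX(CASE WHEN rn = 5 THEN val END) AS c5,
         MAX(CASE WHEN rn = 6 THEN val END) AS c6
  FROM (
    -- One row per non-NULL value, ranked alphabetically within its source row.
    SELECT id, val,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY val) AS rn
    FROM MyTable
    UNPIVOT (val FOR src IN (col1, col2, col3, col4, col5, col6))
  )
  GROUP BY id
)
GROUP BY c1, c2, c3, c4, c5, c6
ORDER BY cnt DESC;
```

UNPIVOT excludes NULL values by default, so a row whose six columns are all NULL drops out of the count entirely. For 20 columns, both the UNPIVOT list and the CASE expressions would need to be extended accordingly.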
answered 2013-09-20T13:37:56.817

Try it like this:

WITH t AS (
SELECT 1 ID, 'X2' col1, 'B2' col2, NULL col3, NULL col4, NULL col5, NULL col6 FROM dual
UNION
SELECT 2, 'C3', 'D1', 'R4', NULL, NULL, NULL  FROM dual
UNION
SELECT 3, 'B2', 'X2', NULL, NULL, NULL, NULL FROM dual
UNION
SELECT 4, 'E4', 'T1', 'W2', NULL, NULL, NULL FROM dual
UNION
SELECT 5, 'X2', 'B2', NULL, NULL, NULL, NULL FROM dual
UNION
SELECT 6, 'R4', 'D1', NULL, NULL, NULL, NULL FROM dual
UNION
SELECT 7, 'D1', 'R4', 'C3', NULL, NULL, NULL FROM dual
)
SELECT col1, col2, col3, col4, col5, col6, tot_count
FROM (
     SELECT col1, col2, col3, col4, col5, col6, cnt,
            MAX(cnt) OVER (PARTITION BY val) AS tot_count,
            row_number() OVER (PARTITION BY val ORDER BY cnt DESC) AS rn
     FROM (
          SELECT col1, col2, col3, col4, col5, col6, val, count(*) OVER (PARTITION BY val) cnt
          FROM (
               SELECT A.ID, col1, col2, col3, col4, col5, col6, val
               FROM (SELECT ID, col1, col2, col3, col4, col5, col6
                    FROM  t
                    ) A,
                    (SELECT ID, listagg( val,',') WITHIN GROUP(ORDER BY  val DESC) AS val 
                     FROM (
                         SELECT ID, val
                         FROM   t
                         unpivot ( val FOR origin IN (col1,  col2, col3, col4, col5, col6))
                         )
                     GROUP BY ID
                     )b
                WHERE A.ID = b.ID
                )
           ORDER BY val
           )t1 
     )t2
WHERE tot_count = cnt 
AND rn = 1
ORDER BY tot_count DESC;
answered 2013-09-20T20:20:39.007