oracle - Efficiently using indexes for a self-join with GROUP BY

Question

I am trying to speed up the following statement

create table tab2 parallel 24 nologging compress for query high as
select /*+ parallel(24) index(a ix_1) index(b ix_2)*/ 
       a.usr
       ,a.dtnum
       ,a.company
       ,count(distinct b.usr) as num
       ,count(distinct case when b.checked_1 = 1 then b.usr end) as num_che_1
       ,count(distinct case when b.checked_2 = 1 then b.usr end) as num_che_2
from tab a
join tab b on a.company = b.company
              and b.dtnum between a.dtnum-1 and a.dtnum-0.0000000001                 
group by a.usr, a.dtnum, a.company;

by using the indexes

create index ix_1 on tab(usr, dtnum, company);
create index ix_2 on tab(usr, company, dtnum, checked_1, checked_2);

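But the execution plan tells me this will be an index full scan on both indexes, and the computation takes a very long time (1 day is not enough).

About the data: the table tab has over 3 million records. None of the columns is unique on its own. The unique value here is the pair (usr, dtnum), where dtnum is a date and time stored as a number in the format yyyy,mmddhh24miss. The columns checked_1 and checked_2 take values from the set (null, 0, 1, 2). company holds the ID of the company. Because the pair is unique, each pair has exactly one value of checked_1, checked_2 and company. Each user can appear in multiple pairs with different dtnum values.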

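Edit

@Roberto Hernandez: I have attached a picture of the execution plan. As for parallel 24, in our company we are told to create tables with the options "parallel [num] nologging compress for query high". I am using 24, but I am not an expert in this area.

@Sayan Malakshinov: http://sqlfiddle.com/#!4/40b6b/2 Here I simplified the data by setting checked_1 = checked_2, but in real life this may not be true.

@scaisEdge: For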

create index my_id1 on tab (company, dtnum);
create index my_id2 on tab (company, dtnum, usr);

I get: [execution plan image]

score 0 · Accepted Answer

For the table tab, your join condition is based on the columns

company, dtnum

so your index should be based primarily on these columns

create index my_id1 on tab (company, dtnum);

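The indexes you are using are useless because their leftmost columns are not the ones used in the join/where conditions: both ix_1 and ix_2 lead with usr, which does not appear in the join condition at all.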

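Eventually, you can add usr and the other selected columns in the rightmost positions to avoid the need for table access and let the database engine retrieve all the information from the index entries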

create index my_id2 on tab (company, dtnum, usr, checked_1, checked_2);
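To check whether the optimizer actually picks up such an index, one option is to look at the plan before running the full CREATE TABLE. The statement below is only a sketch: it reuses the join from the question with a reduced select list and without hints, and it assumes the index above has already been created. Whether the plan changes depends on your statistics and Oracle version.

```sql
-- Ask the optimizer for a plan without executing the query.
explain plan for
select a.usr,
       a.dtnum,
       a.company,
       count(distinct b.usr) as num
from tab a
join tab b on a.company = b.company
          and b.dtnum between a.dtnum - 1 and a.dtnum - 0.0000000001
group by a.usr, a.dtnum, a.company;

-- Show the plan that was just stored in PLAN_TABLE.
select * from table(dbms_xplan.display);
```

If the plan still shows a full scan of the old indexes, the hints in the original statement (index(a ix_1) index(b ix_2)) may be what forces them, so it is worth comparing the plan with and without those hints.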

score 0 · Accepted Answer

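Indexes (bitmap or otherwise) are not that useful for this execution. If you look at the execution plan, the optimizer thinks the group-by will reduce the output to 1 row. This leads to serialization (PX SELECTOR), so I would question the quality of your statistics. You may need to create a column group on the three group-by columns to improve the cardinality estimate for the group by.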
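As a rough sketch of what creating such a column group could look like, the block below uses DBMS_STATS.CREATE_EXTENDED_STATS and then regathers table statistics. It assumes the table is named TAB and lives in the current schema, and the method_opt setting is just one reasonable choice, not something taken from the question.

```sql
declare
  l_extension_name varchar2(128);  -- system-generated name of the column group
begin
  -- Define a column group on the three group-by columns.
  l_extension_name := dbms_stats.create_extended_stats(
                        ownname   => user,
                        tabname   => 'TAB',
                        extension => '(USR, DTNUM, COMPANY)');

  -- Regather statistics so the new column group gets its own statistics.
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'TAB',
    method_opt => 'for all columns size auto',
    cascade    => true);
end;
/
```

After this, regenerating the execution plan should show whether the estimated row count for the group-by step has moved away from 1.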
