Hello all :) I'm building a tool to do some volume sampling on our Oracle 10g database. Here is the query:

SELECT count(*) 
FROM product
JOIN customer ON product.CUSTOMER_ID = customer.ID
WHERE 
 (    product.CATEGORY = 'some first category criteria'
  AND customer.REGION = 'some first region criteria'
  AND ...)
 OR
 (    product.CATEGORY = 'some second category criteria'
  AND customer.REGION = 'some second region criteria'
  AND ...)
 OR ...

All I need from this query is counts. The thing is, the volumes are big: about 30 million rows in each table, and I'd like this query to be responsive.

So far, having composite indexes on customer (<search criteria column>, CUSTOMER_ID) has helped a lot. I think it has helped Oracle do the JOIN after an indexed filter operation.

Each (... AND ... AND ...) block is expected to contain roughly 50 000 rows. The columns used in the search criteria all have values in sets sized around 1000 values.

I was wondering what approach I could take given that I'll only do count(*)s, especially since Oracle has a built-in OLAP module (and a CUBE operation?). Also, I'm sure things can be improved a lot by well-thought-out indexes and hints.

How would you design this?

1 Answer

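This looks like a good candidate for a bitmap index:

Bitmap indexes are primarily designed for data warehousing or environments in which queries reference many columns in an ad hoc fashion. Situations that may call for a bitmap index include:

The indexed columns have low cardinality, that is, the number of distinct values is small compared to the number of table rows.

The indexed table is either read-only or not subject to significant modification by DML statements.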
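
As a quick check of the low-cardinality criterion, the distinct values in each filter column can be compared with the table's row count. This is only a sketch: it assumes tables named customer and product with region and category columns, as in the sample data in this answer, so substitute your real search columns.

```sql
--Compare the number of distinct values with the total row count.
--A small ratio suggests the column is a candidate for a bitmap index.
select count(distinct region) distinct_regions, count(*) total_rows
from customer;

select count(distinct category) distinct_categories, count(*) total_rows
from product;
```

With the volumes described in the question (around 1000 distinct values against about 30 million rows), both columns would count as low cardinality.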

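Specifically, a bitmap join index may be ideal here. The example in the manual even matches your data model. I tried to recreate your model and data below, and the bitmap join index seems to run orders of magnitude faster than the other solutions.

Sample data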

--Create tables
create table customer
(
    customer_id number,
    region      varchar2(100) not null
) nologging;

create table product
(
    product_id  number,
    customer_id number not null,
    category    varchar2(100) not null
) nologging;


--Load 30M rows, 1M rows at a time.  Takes about 6 minutes.
begin
    for i in 1 .. 30 loop
        insert /*+ append */ into customer
        select (1000000*i)+level, 'Region '||trunc(dbms_random.value(1, 1000))
        from dual connect by level <= 1000000;
        commit;

        insert /*+ append */ into product
        select (1000000*i)+level, (1000000*i)+level
            ,'Category '||trunc(dbms_random.value(1, 1000))
        from dual connect by level <= 1000000;
        commit;
    end loop;
end;
/

--Add primary keys and foreign key constraints.
alter table customer add constraint customer_pk primary key (customer_id);
alter table product add constraint product_pk primary key (product_id);
alter table product add constraint product_customer_fk
    foreign key (customer_id) references customer(customer_id);

--Gather stats
begin
    dbms_stats.gather_table_stats(user, 'CUSTOMER');
    dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/

No indexes - slow

As expected, performance is poor. This sample query takes about 75 seconds on my machine.

SELECT count(*) 
FROM product
JOIN customer ON product.CUSTOMER_ID = customer.customer_id
WHERE (product.CATEGORY = 'Category 1' AND customer.REGION = 'Region 1')
 OR   (product.CATEGORY = 'Category 2' AND customer.REGION = 'Region 2')
 OR   (product.CATEGORY = 'Category 888' AND customer.REGION = 'Region 888');

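B-tree indexes - still slow

The plan changes, but performance stays the same. I think this may be because my example is a worst-case scenario for indexes, where the data is truly random.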

create index customer_idx on customer(region);
create index product_idx on product(category);

begin
    dbms_stats.gather_table_stats(user, 'CUSTOMER');
    dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/

Bitmap indexes - a little better

This improves performance slightly, to about 61 seconds.

drop index customer_idx;
drop index product_idx;

create bitmap index customer_bidx on customer(region);
create bitmap index product_bidx on product(category);

begin
    dbms_stats.gather_table_stats(user, 'CUSTOMER');
    dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/

Bitmap join index - very fast

Now the query returns almost instantly; my IDE reports it as 0 seconds.

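--Drop the bitmap indexes from the previous step
--(the b-tree indexes were already dropped there).
drop index customer_bidx;
drop index product_bidx;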

create bitmap index customer_product_bjix
on product(product.category, customer.region)
FROM product, customer
where product.CUSTOMER_ID = customer.customer_id;

begin
    dbms_stats.gather_table_stats(user, 'CUSTOMER');
    dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/
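
To confirm that the optimizer actually uses the bitmap join index, the execution plan can be inspected. The sketch below uses the standard EXPLAIN PLAN statement and DBMS_XPLAN.DISPLAY with the same sample query as above. The exact plan depends on your version and statistics, so treat the expectation as an assumption to verify rather than a guaranteed result.

```sql
--Generate the plan for the sample query without running it.
explain plan for
select count(*)
from product
join customer on product.customer_id = customer.customer_id
where (product.category = 'Category 1' and customer.region = 'Region 1')
 or   (product.category = 'Category 2' and customer.region = 'Region 2')
 or   (product.category = 'Category 888' and customer.region = 'Region 888');

--Display the plan that was just generated.
select * from table(dbms_xplan.display);
```

If the index is being used, CUSTOMER_PRODUCT_BJIX should appear in the plan output. If the plan shows full scans of both tables instead, the index is probably not being picked up.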

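Index costs

The bitmap join index takes a little longer to create than the b-tree or bitmap indexes. The b-tree indexes are very large compared with the bitmap and bitmap join indexes.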

select segment_name, bytes/1024/1024 MB
from dba_segments
where segment_name in ('CUSTOMER_IDX', 'PRODUCT_IDX'
    ,'CUSTOMER_BIDX', 'PRODUCT_BIDX',  'CUSTOMER_PRODUCT_BJIX');


SEGMENT_NAME            MB
------------            --
CUSTOMER_IDX            726
PRODUCT_IDX             792
CUSTOMER_BIDX            88
PRODUCT_BIDX             96
CUSTOMER_PRODUCT_BJIX   184

Query style

This will not affect performance, but you can shorten the query like this:

SELECT count(*) 
FROM product
JOIN customer ON product.CUSTOMER_ID = customer.customer_id
WHERE (product.category, customer.region)
    in (('Category 1', 'Region 1'),
        ('Category 2', 'Region 2'),
        ('Category 888', 'Region 888'));
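
If the tool needs a separate count for each (category, region) pair rather than one total, a GROUP BY variant of the same query is one option. This is a sketch that was not part of the timings above, so check its plan before assuming it benefits from the bitmap join index in the same way.

```sql
--One row per matching (category, region) pair, with its own count.
select product.category, customer.region, count(*) row_count
from product
join customer on product.customer_id = customer.customer_id
where (product.category, customer.region)
    in (('Category 1', 'Region 1'),
        ('Category 2', 'Region 2'),
        ('Category 888', 'Region 888'))
group by product.category, customer.region;
```

Pairs with no matching rows will simply be absent from the result rather than reported with a count of 0.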
Answered 2013-06-07T04:51:36.567