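I have a complex table with only 7 columns, but in production it will hold a lot of rows, say more than 100,000.

For that reason I run RUNSTATS on two columns: one is the PK and the other is an FK.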

RUNSTATS ON TABLE WEBSS.P0029_LOCATION  WITH DISTRIBUTION ON COLUMNS (LOC_ID, OUTLET_ID);

After this, when I run

SELECT * FROM SYSCAT.COLDIST WHERE TABSCHEMA = 'WEBSS' AND TABNAME = 'P0029_LOCATION' 

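I get 60 rows back, 30 for each of the two columns, with types Q and F (quantile and frequency).

But I need more input. On what basis are they (Q and F) defined? And on what basis should we do the optimization?

Please share your suggestions.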

1 Answer

There are two types of column statistics on DB2: simple ones, where you just get the column cardinality and the number of nulls, and distribution statistics, which are what you collected above.

In my experience simple statistics are the better choice for most applications, unless your queries search on literal values against highly skewed data.

If you have indexes defined on your PKs and FKs, you get simple stats with

RUNSTATS ON TABLE MYSCHEMA.MYTABLE ON KEY COLUMNS

or

RUNSTATS ON TABLE MYSCHEMA.MYTABLE ON ALL COLUMNS
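To see what the simple statistics look like after either command, a catalog query along the following lines may help. It is only a sketch: it assumes the table from the question (WEBSS.P0029_LOCATION) and relies on the COLCARD and NUMNULLS columns of SYSCAT.COLUMNS, which hold the column cardinality and the number of nulls.

```sql
-- Simple column statistics for the table from the question.
-- COLCARD  = number of distinct values in the column (-1 if no statistics were collected)
-- NUMNULLS = number of NULLs in the column (-1 if no statistics were collected)
SELECT COLNAME, COLCARD, NUMNULLS
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'WEBSS'
  AND TABNAME = 'P0029_LOCATION'
ORDER BY COLNO;
```

A value of -1 for a column means no statistics have been gathered for it yet.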

The quantiles (type Q) are histogram data, and by default you get, I think, 20 quantile values per column. The F entries are the most frequent values in the column, and I think you get 10 of those by default. You don't need distribution statistics on a PK, since it is unique, and it's unlikely you need them on an FK either. Stick to the simple ones first.
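To check how the distribution rows split between the two types, the query from the question can be grouped by column and type. This is a sketch against the same table. With the defaults mentioned above, it would be expected to show 10 F rows and 20 Q rows per column, which would account for the 30 rows per column reported in the question.

```sql
-- Count distribution entries per column and type.
-- TYPE = 'F' : frequent-value entries (most common values)
-- TYPE = 'Q' : quantile entries (histogram boundaries)
SELECT COLNAME, TYPE, COUNT(*) AS ENTRIES
FROM SYSCAT.COLDIST
WHERE TABSCHEMA = 'WEBSS'
  AND TABNAME = 'P0029_LOCATION'
GROUP BY COLNAME, TYPE
ORDER BY COLNAME, TYPE;
```

Listing COLVALUE and VALCOUNT ordered by SEQNO for a single column and type shows the actual values and counts the optimizer sees.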

Answered 2012-12-01T15:47:50.153