Oracle 11g added
Extended Statistics support for
column groups in order to detect correlated columns for filter predicates using an
equal comparison.
Note that Oracle 11g also added the ability to use the number of
distinct keys of a
composite index as an upper limit for the cardinality estimates for matching column predicates, which means that the optimizer is now capable of detecting
correlated columns without the explicit addition of Extended Statistics / Column Groups.
The "composite index" feature was also backported to 10.2.0.4 and 10.2.0.5 but needs to be activated explicitly via a FIX_CONTROL. You can read more about that on
Jonathan Lewis' blog.
The "Column Group" feature (either by explicit Extended Statistics or via a matching composite index) however has a
weakness in a very special case: If there is only a
single distinct value in the statistics then the
"out-of-range" detection of the optimizer is not working correctly.
This can be easily demonstrated with a very simple test case:
set echo on linesize 200 pagesize 999 tab off trimspool on trimout on
drop table t;
purge table t;
create table t
as
select
rownum as id
, mod(rownum, 1) + 1 as attr1
, mod(rownum, 1) + 1 as attr2
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000000
;
select
attr1
, attr2
, count(*) as cnt
from
t
group by
attr1
, attr2
;
exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1')
create index t_idx on t (attr1, attr2);
explain plan for
select * from t where attr1 = 1 and attr2 = 1;
select * from table(dbms_xplan.display(null, null, 'BASIC PREDICATE ROWS'));
explain plan for
select * from t where attr1 = 0 and attr2 = 0;
select * from table(dbms_xplan.display(null, null, 'BASIC PREDICATE ROWS'));
explain plan for
select * from t where attr1 = 2 and attr2 = 2;
select * from table(dbms_xplan.display(null, null, 'BASIC PREDICATE ROWS'));
explain plan for
select * from t where attr1 = 100000 and attr2 = 100000;
select * from table(dbms_xplan.display(null, null, 'BASIC PROJECTION ROWS'));
explain plan for
select * from t where attr1 = -100000 and attr2 = 100000;
select * from table(dbms_xplan.display(null, null, 'BASIC PROJECTION ROWS'));
The relevant output is:
SQL> select
2 attr1
3 , attr2
4 , count(*) as cnt
5 from
6 t
7 group by
8 attr1
9 , attr2
10 ;
ATTR1 ATTR2 CNT
---------- ---------- ----------
1 1 1000000
select * from t where attr1 = 1 and attr2 = 1;
------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------
| 0 | SELECT STATEMENT | | 1000K|
|* 1 | TABLE ACCESS FULL| T | 1000K|
------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ATTR1"=1 AND "ATTR2"=1)
select * from t where attr1 = 0 and attr2 = 0;
------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------
| 0 | SELECT STATEMENT | | 1000K|
|* 1 | TABLE ACCESS FULL| T | 1000K|
------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ATTR1"=0 AND "ATTR2"=0)
select * from t where attr1 = 2 and attr2 = 2;
------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------
| 0 | SELECT STATEMENT | | 1000K|
|* 1 | TABLE ACCESS FULL| T | 1000K|
------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ATTR1"=2 AND "ATTR2"=2)
select * from t where attr1 = 100000 and attr2 = 100000;
------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------
| 0 | SELECT STATEMENT | | 1000K|
|* 1 | TABLE ACCESS FULL| T | 1000K|
------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ATTR1"=100000 AND "ATTR2"=100000)
select * from t where attr1 = -100000 and attr2 = 100000;
------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------
| 0 | SELECT STATEMENT | | 1000K|
|* 1 | TABLE ACCESS FULL| T | 1000K|
------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ATTR1"=(-100000) AND "ATTR2"=100000)
As you can see the optimizer still thinks to return
"all" rows even in obvious cases where
no rows will be returned.
In 11g the problem does
not appear if there is
more than a single distinct value, however the 10g backport seems to produce similar silly estimates even with more than a single distinct value.
One possible
workaround is to generate a
histogram on the column group in 11g. This way the out-of-range detection works again as expected for the single distinct value case.
It's also possible to generate a histogram
on one or more of the affected columns
each to avoid the problem, but this
"breaks" the column group and
correlation detection and hence might lead to worse cardinality estimates in the case of more than a single distinct value.
Note that when generating
histograms you need to carefully check if you run into the
problems described
here: If you end up with a histogram containing only a
single bucket but the underlying basic column statistics show
more than a single distinct value then in principle unfortunately again the same issue will show up as demonstrated in the other blog post.
Note that this
inconsistency between basic column statistics (Number Of Distinct Values, NDV) and the corresponding histogram is a consequence of how 11g gathers statistics when using the
AUTO_SAMPLE_SIZE default.
Maria Colgan has just published
a note about that.