Oracle related stuff: Correlation, nocorrelation and extended stats

Riyaj Shamsudeen recently published a very interesting blog post about the 11g extended statistics feature and correlated column values.

Because I think that some of his findings were significantly influenced by the fact that he generated frequency histograms on all columns, I've repeated his test case with some more variations. In particular, I ran it with and without histograms and, where histograms are present, with and without the change "Bug 5483301: Cardinality of 1 when predicate value non-existent in frequency histogram", which was introduced in 10.2.0.4 and 11.1.0.6. More details about this change can be found in Metalink Doc ID 5483301.8.

The following results were obtained using 11.1.0.7 Enterprise Edition on 32-bit Windows.

I'll start with his case 1:

create table y1 (a number, b number, c number);

begin
  for i in 1..1000 loop
    for j in 1..10 loop
      insert into y1 values (j,mod(j,5), mod(j,2) );
    end loop;
  end loop;
end;
/

commit;

REM Distribution of the column values:
select a, b, count(*) from y1 group by a,b order by a,b
/
         A          B   COUNT(*)
---------- ---------- ----------
         1          1       1000
         2          2       1000
         3          3       1000
         4          4       1000
         5          0       1000
         6          1       1000
         7          2       1000
         8          3       1000
         9          4       1000
        10          0       1000

10 rows selected.
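Because b = mod(a, 5), the value of A fully determines B. The output above therefore translates into the following figures for the test predicate a=1 and b=1. This is a small worked check derived only from the data shown:

$$
\begin{aligned}
\mathrm{NDV}(a)\cdot\mathrm{NDV}(b) &= 10 \cdot 5 = 50 \text{ combinations if independent, but only } 10 \text{ actually occur}\\
\mathrm{sel}_{\text{actual}}(a=1 \wedge b=1) &= \frac{1000}{10000} = 0.1
\end{aligned}
$$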

REM Let's also add an index to this table
create index y1_i1 on y1 (a, b);

REM The number of distinct keys is used
REM to determine selectivity if
REM a) an all-equal operation on the entire index is used
REM and obviously
REM b) NO histogram is present
select distinct_keys from user_indexes where index_name = 'Y1_I1';

DISTINCT_KEYS
-------------
           10

begin
   dbms_stats.gather_table_stats (
   ownname =>null,
   tabname=>'y1',
   estimate_percent=>null,
   cascade=>true,
   method_opt =>'for all columns size 254');
end;
/

REM This shows that although the index is in place,
REM its DISTINCT_KEYS information is ignored;
REM the histograms are used instead and the optimizer
REM falls back to the default selectivity formula.
REM See the end of the script for a demonstration
REM without the index, which arrives at the same result.
alter session set tracefile_identifier = 'correlated1';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=1 and b=1;

alter session set events '10053 trace name context off';

The corresponding 10053 trace file excerpt shows:

SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for Y1[Y1] 
  Column (#1): 
    NewDensity:0.050000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
  Column (#2): 
    NewDensity:0.100000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:5, NDV:5
  ColGroup (#1, Index) Y1_I1
    Col#: 1 2    CorStregth: 5.00
  ColGroup Usage:: PredCnt: 2  Matches Full:  Partial: 
  Table: Y1  Alias: Y1
    Card: Original: 10000.000000  Rounded: 200  Computed: 200.00  Non Adjusted: 200.00
  Access Path: TableScan
    Cost:  12.96  Resp: 12.96  Degree: 0
      Cost_io: 12.00  Cost_cpu: 2396429
      Resp_io: 12.00  Resp_cpu: 2396429
  ColGroup Usage:: PredCnt: 2  Matches Full:  Partial: 
  ColGroup Usage:: PredCnt: 2  Matches Full:  Partial: 
  Access Path: index (AllEqRange)
    Index: Y1_I1
    resc_io: 5.00  resc_cpu: 114847
    ix_sel: 0.020000  ix_sel_with_filters: 0.020000 
    Cost: 5.05  Resp: 5.05  Degree: 1
  Best:: AccessPath: IndexRange
  Index: Y1_I1
         Cost: 5.05  Degree: 1  Resp: 5.05  Card: 200.00  Bytes: 0
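The density figures in this excerpt can be reproduced from the data. This assumes the following relationships, which are my inference from the numbers and are not printed in the trace:

* For a frequency histogram, NewDensity is half the frequency of the least popular value.
* OldDensity is 1/(2 · num_rows).
* "CorStregth" (sic, as spelled in the trace) is the product of the column NDVs divided by the index's DISTINCT_KEYS.

Treat this as a consistency check, not a documented formula:

$$
\begin{aligned}
\text{NewDensity}(a) &= \tfrac{1}{2}\cdot\tfrac{1000}{10000} = 0.05, \qquad
\text{NewDensity}(b) = \tfrac{1}{2}\cdot\tfrac{2000}{10000} = 0.1\\
\text{OldDensity} &= \frac{1}{2 \cdot 10000} = 0.00005\\
\text{CorStregth} &= \frac{\mathrm{NDV}(a)\cdot\mathrm{NDV}(b)}{\text{DISTINCT\_KEYS}} = \frac{10 \cdot 5}{10} = 5
\end{aligned}
$$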

In contrast to Riyaj's interpretation, I would say that the trace shows that the index information is not used to derive the selectivity (note the empty "Matches Full:"). Rather, it looks like the presence of the frequency histograms causes a fallback to the default formula used for non-correlated columns, selectivity(A) * selectivity(B). This leads to a selectivity of 0.02 and hence a cardinality of 200.
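The same calculation spelled out with the frequency histogram values uses 1000 rows for a=1 and 2000 rows for b=1, since b=1 for both a=1 and a=6. The independence formula reproduces the ix_sel and cardinality from the trace, which is one fifth of the 1000 rows that actually match:

$$
\mathrm{sel}(a=1 \wedge b=1) = \mathrm{sel}(a=1)\cdot\mathrm{sel}(b=1) = \frac{1000}{10000}\cdot\frac{2000}{10000} = 0.1 \cdot 0.2 = 0.02,
\qquad 10000 \cdot 0.02 = 200
$$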

Now let's repeat the same test without any histograms:

begin
   dbms_stats.delete_table_stats (
   ownname =>null,
   tabname=>'y1',
   cascade_indexes=>true);
end;
/

begin
   dbms_stats.gather_table_stats (
   ownname =>null,
   tabname=>'y1',
   estimate_percent=>null,
   cascade=>true,
   method_opt =>'for all columns size 1');
end;
/

REM Without histograms the "all-equal on the entire index"
REM rule is used, hence the selectivity is 0.1,
REM i.e. 1/DISTINCT_KEYS of the index
alter session set tracefile_identifier = 'correlated2';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=1 and b=1;

alter session set events '10053 trace name context off';
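Here is the arithmetic behind the comment in this script, assuming the all-equal rule applies as described. The selectivity is taken from the index alone, and in this data set it coincides with the 1000 rows that each (a, b) combination actually has:

$$
\mathrm{sel}(a=1 \wedge b=1) = \frac{1}{\text{DISTINCT\_KEYS}} = \frac{1}{10} = 0.1,
\qquad 10000 \cdot 0.1 = 1000
$$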