Monday, January 13, 2020

12.2.0.1 And Later Support (Limited) Extended Stats On Virtual columns / Column Groups of Expressions

I do have a demo as part of my optimizer related workshops that shows the restriction / limitation of DBMS_STATS not supporting extended statistics on virtual columns / group of expressions, so for example the combination of both expressions and column groups, like ((TRUNC(COL1)), (TRUNC(COL2))).

Surprisingly, when following a certain sequence of operation, this starts working (to some degree) from 12.2.0.1 on.

The official documentation up to and including 19c still mentions this as a restriction, and since it doesn't work when explicitly referencing virtual columns (see the test case what I exactly mean by this) I assume this is more like a side effect / unintended feature.

Nevertheless, the optimizer happily picks up this additional information and comes up with improved estimates when having a combination of skew and correlation on expressions, for example.

The following test case shows the change in behaviour from 12.2.0.1 on:

set echo on linesize 200 trimspool on trimout on tab off pagesize 999 timing on

alter session set nls_language = american;

drop table t1;

purge table t1;

-- Initialize the random generator for "reproducible" pseudo-randomness
exec dbms_random.seed(0)

-- ATTR1 and ATTR2 are both skewed and correlated
create table t1
as
select
        rownum as id
      , trunc(dbms_random.value(1, 1000000000000)) as fk
      , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1
      , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1000000
;

-- Histograms on ATTR1 and ATTR2 for representing skew properly
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2')

-- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)')

-- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))')

-- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions
-- Officially this is isn't allowed and errors out up to and including 12.1.0.2
-- ORA-20001: Error when processing extension -  missing right parenthesis
-- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c
-- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column
-- Remove the previous step and it will error out:
-- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')

-- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly
-- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate
-- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct
explain plan for
select
        count(*)
from
        t1 a
where
        trunc(attr1) = 1
and     trunc(attr2) = 1;

select * from table(dbms_xplan.display(format => 'TYPICAL'));

-- But: Explicitly referencing a virtual column doesn't work
-- This will error out:
-- ORA-20001: Error when processing extension -  virtual column is referenced in a column expression

exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))')

exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))')

-- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call)
exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')


alter table t1 add (trunc_attr1 as (trunc(attr1)));

alter table t1 add (trunc_attr2 as (trunc(attr2)));

-- This works and is supported in all versions supporting virtual columns
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2')

-- This errors out: ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
-- even in 12.2.0.1 and later
-- But: Works in 12.2.0.1 and later if the call to
--   dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
-- above is removed, because the extension then already exists (!)
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)')

-- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')

And here is the output I get when using 12.1.0.2:

SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 10 12:39:00 2020

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options


USERNAME             INST_NAME    HOST_NAME                 SID   SERIAL#  VERSION    STARTED  SPID            OPID  CPID            SADDR            PADDR
-------------------- ------------ ------------------------- ----- -------- ---------- -------- --------------- ----- --------------- ---------------- ----------------
CBO_TEST             orcl121      DELLXPS13                 368   46472    12.1.0.2.0 20200110 6908            59    15536:4996      00007FFA110E9B88 00007FFA12B6F1E8



SQL>
SQL> alter session set nls_language = american;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> drop table t1;

Table dropped.

Elapsed: 00:00:00.02
SQL>
SQL> purge table t1;

Table purged.

Elapsed: 00:00:00.06
SQL>
SQL> -- Initialize the random generator for "reproducible" pseudo-randomness
SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> -- ATTR1 and ATTR2 are both skewed and correlated
SQL> create table t1
  2  as
  3  select
  4          rownum as id
  5        , trunc(dbms_random.value(1, 1000000000000)) as fk
  6        , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1
  7        , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2
  8        , rpad('x', 100) as filler
  9  from
 10          dual
 11  connect by
 12          level <= 1000000
 13  ;

Table created.

Elapsed: 00:00:13.07
SQL>
SQL> -- Histograms on ATTR1 and ATTR2 for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2')

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.43
SQL>
SQL> -- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)')

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.08
SQL>
SQL> -- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.51
SQL>
SQL> -- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions
SQL> -- Officially this is isn't allowed and errors out up to and including 12.1.0.2
SQL> -- ORA-20001: Error when processing extension -  missing right parenthesis
SQL> -- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c
SQL> -- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column
SQL> -- Remove the previous step and it will error out:
SQL> -- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension -  missing right parenthesis


Elapsed: 00:00:00.04
SQL>
SQL> -- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly
SQL> -- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate
SQL> -- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct
SQL> explain plan for
  2  select
  3          count(*)
  4  from
  5          t1 a
  6  where
  7          trunc(attr1) = 1
  8  and     trunc(attr2) = 1;

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display(format => 'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |  4777   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   807K|  6305K|  4777   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC("ATTR1")=1 AND TRUNC("ATTR2")=1)

14 rows selected.

Elapsed: 00:00:00.11
SQL>
SQL> -- But: Explicitly referencing a virtual column doesn't work
SQL> -- This will error out:
SQL> -- ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.13
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> -- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call)
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
BEGIN dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension -  missing right parenthesis


Elapsed: 00:00:00.02
SQL>
SQL>
SQL> alter table t1 add (trunc_attr1 as (trunc(attr1)));

Table altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter table t1 add (trunc_attr2 as (trunc(attr2)));

Table altered.

Elapsed: 00:00:00.01
SQL>
SQL> -- This works and is supported in all versions supporting virtual columns
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2')

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.57
SQL>
SQL> -- This errors out: ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
SQL> -- even in 12.2.0.1 and later
SQL> -- But: Works in 12.2.0.1 and later if the call to
SQL> --   dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
SQL> -- above is removed, because the extension then already exists (!)
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
ORA-06512: at "SYS.DBMS_STATS", line 34634
ORA-06512: at line 1


Elapsed: 00:00:00.06
SQL>
SQL> -- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension -  missing right parenthesis


Elapsed: 00:00:00.04
SQL>

And that is what I get from 12.2.0.1 on, here using 19.3:

SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 10 12:35:18 2020

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Verbunden mit:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production


USERNAME             INST_NAME    HOST_NAME                 SID   SERIAL#  VERSION    STARTED  SPID            OPID  CPID            SADDR            PADDR
-------------------- ------------ ------------------------- ----- -------- ---------- -------- --------------- ----- --------------- ---------------- ----------------
CBO_TEST             orcl19c      DELLXPS13                 146   48961    19.0.0.0.0 20200110 5648            53    7260:13644      00007FF91687B3D8 00007FF91656B858



SQL>
SQL> alter session set nls_language = american;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> drop table t1;

Table dropped.

Elapsed: 00:00:00.02
SQL>
SQL> purge table t1;

Table purged.

Elapsed: 00:00:00.10
SQL>
SQL> -- Initialize the random generator for "reproducible" pseudo-randomness
SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> -- ATTR1 and ATTR2 are both skewed and correlated
SQL> create table t1
  2  as
  3  select
  4          rownum as id
  5        , trunc(dbms_random.value(1, 1000000000000)) as fk
  6        , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1
  7        , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2
  8        , rpad('x', 100) as filler
  9  from
 10          dual
 11  connect by
 12          level <= 1000000
 13  ;

Table created.

Elapsed: 00:00:11.89
SQL>
SQL> -- Histograms on ATTR1 and ATTR2 for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2')

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.68
SQL>
SQL> -- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)')

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.75
SQL>
SQL> -- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.43
SQL>
SQL> -- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions
SQL> -- Officially this is isn't allowed and errors out up to and including 12.1.0.2
SQL> -- ORA-20001: Error when processing extension -  missing right parenthesis
SQL> -- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c
SQL> -- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column
SQL> -- Remove the previous step and it will error out:
SQL> -- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:05.24
SQL>
SQL> -- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly
SQL> -- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate
SQL> -- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct
SQL> explain plan for
  2  select
  3          count(*)
  4  from
  5          t1 a
  6  where
  7          trunc(attr1) = 1
  8  and     trunc(attr2) = 1;

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display(format => 'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |  4797   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   893K|  6977K|  4797   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC("ATTR1")=1 AND TRUNC("ATTR2")=1)

14 rows selected.

Elapsed: 00:00:00.10
SQL>
SQL> -- But: Explicitly referencing a virtual column doesn't work
SQL> -- This will error out:
SQL> -- ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.33
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.33
SQL>
SQL> -- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call)
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.11
SQL>
SQL>
SQL> alter table t1 add (trunc_attr1 as (trunc(attr1)));

Table altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter table t1 add (trunc_attr2 as (trunc(attr2)));

Table altered.

Elapsed: 00:00:00.01
SQL>
SQL> -- This works and is supported in all versions supporting virtual columns
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2')

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.49
SQL>
SQL> -- This errors out: ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
SQL> -- even in 12.2.0.1 and later
SQL> -- But: Works in 12.2.0.1 and later if the call to
SQL> --   dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
SQL> -- above is removed, because the extension then already exists (!)
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
ORA-06512: at "SYS.DBMS_STATS", line 40751
ORA-06512: at "SYS.DBMS_STATS", line 40035
ORA-06512: at "SYS.DBMS_STATS", line 38912
ORA-06512: at "SYS.DBMS_STATS", line 15726
ORA-06512: at "SYS.DBMS_STATS", line 22064
ORA-06512: at "SYS.DBMS_STATS", line 22162
ORA-06512: at "SYS.DBMS_STATS", line 22232
ORA-06512: at "SYS.DBMS_STATS", line 22864
ORA-06512: at "SYS.DBMS_STATS", line 38313
ORA-06512: at "SYS.DBMS_STATS", line 39738
ORA-06512: at "SYS.DBMS_STATS", line 40183
ORA-06512: at "SYS.DBMS_STATS", line 40732
ORA-06512: at line 1


Elapsed: 00:00:00.09
SQL>
SQL> -- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:05.32
SQL>

So, from 12.2.0.1 it looks like extended statistics on virtual columns are supported to some degree, when following a certain sequence of operation - first creating extended statistics on each of the expressions used, which creates corresponding virtual columns under the cover, and afterwards creating extended statistics using those expressions as part of the column group expression.

Strange enough, when explicitly creating virtual columns for those expressions creating a column group explicitly referencing those virtual columns doesn't work - but using the expressions covered by the virtual columns instead works even with those virtual columns created explicitly.

1 comment:

  1. Hi Randolf

    Notice that the current 19c documentation that you reference contains the following sentence "The extension can contain an expression only if a corresponding virtual column has been created."
    So, they tried to fix it but they did it incorrectly ;-)

    Cheers,
    Chris

    ReplyDelete