I do have a demo as part of my
optimizer related workshops that shows the
restriction / limitation of DBMS_STATS not supporting extended
statistics on virtual columns / group of expressions, so for
example the combination of both expressions and column groups,
like ((TRUNC(COL1)), (TRUNC(COL2))).
Surprisingly, when following a certain sequence of operation,
this starts working (to some degree) from 12.2.0.1 on.
The official
documentation up to and including 19c still mentions
this as a restriction, and since it doesn't work when explicitly
referencing virtual columns (see the test case what I exactly mean by
this) I assume this is more like a side effect / unintended
feature.
Nevertheless, the optimizer happily picks up this additional information and comes up with improved estimates when having a combination of skew and correlation on expressions, for example.
The following test case shows the change in behaviour from 12.2.0.1 on:
set echo on linesize 200 trimspool on trimout on tab off pagesize 999 timing on
alter session set nls_language = american;
drop table t1;
purge table t1;
-- Initialize the random generator for "reproducible" pseudo-randomness
exec dbms_random.seed(0)
-- ATTR1 and ATTR2 are both skewed and correlated
create table t1
as
select
rownum as id
, trunc(dbms_random.value(1, 1000000000000)) as fk
, case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1
, case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000000
;
-- Histograms on ATTR1 and ATTR2 for representing skew properly
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2')
-- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)')
-- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))')
-- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions
-- Officially this is isn't allowed and errors out up to and including 12.1.0.2
-- ORA-20001: Error when processing extension - missing right parenthesis
-- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c
-- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column
-- Remove the previous step and it will error out:
-- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
-- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly
-- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate
-- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct
explain plan for
select
count(*)
from
t1 a
where
trunc(attr1) = 1
and trunc(attr2) = 1;
select * from table(dbms_xplan.display(format => 'TYPICAL'));
-- But: Explicitly referencing a virtual column doesn't work
-- This will error out:
-- ORA-20001: Error when processing extension - virtual column is referenced in a column expression
exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))')
exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))')
-- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call)
exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
alter table t1 add (trunc_attr1 as (trunc(attr1)));
alter table t1 add (trunc_attr2 as (trunc(attr2)));
-- This works and is supported in all versions supporting virtual columns
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2')
-- This errors out: ORA-20001: Error when processing extension - virtual column is referenced in a column expression
-- even in 12.2.0.1 and later
-- But: Works in 12.2.0.1 and later if the call to
-- dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
-- above is removed, because the extension then already exists (!)
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)')
-- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
And here is the output I get when using 12.1.0.2:
SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 10 12:39:00 2020
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
USERNAME INST_NAME HOST_NAME SID SERIAL# VERSION STARTED SPID OPID CPID SADDR PADDR
-------------------- ------------ ------------------------- ----- -------- ---------- -------- --------------- ----- --------------- ---------------- ----------------
CBO_TEST orcl121 DELLXPS13 368 46472 12.1.0.2.0 20200110 6908 59 15536:4996 00007FFA110E9B88 00007FFA12B6F1E8
SQL>
SQL> alter session set nls_language = american;
Session altered.
Elapsed: 00:00:00.00
SQL>
SQL> drop table t1;
Table dropped.
Elapsed: 00:00:00.02
SQL>
SQL> purge table t1;
Table purged.
Elapsed: 00:00:00.06
SQL>
SQL> -- Initialize the random generator for "reproducible" pseudo-randomness
SQL> exec dbms_random.seed(0)
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.00
SQL>
SQL> -- ATTR1 and ATTR2 are both skewed and correlated
SQL> create table t1
2 as
3 select
4 rownum as id
5 , trunc(dbms_random.value(1, 1000000000000)) as fk
6 , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1
7 , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2
8 , rpad('x', 100) as filler
9 from
10 dual
11 connect by
12 level <= 1000000
13 ;
Table created.
Elapsed: 00:00:13.07
SQL>
SQL> -- Histograms on ATTR1 and ATTR2 for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2')
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.43
SQL>
SQL> -- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)')
PL/SQL procedure successfully completed.
Elapsed: 00:00:02.08
SQL>
SQL> -- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))')
PL/SQL procedure successfully completed.
Elapsed: 00:00:02.51
SQL>
SQL> -- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions
SQL> -- Officially this is isn't allowed and errors out up to and including 12.1.0.2
SQL> -- ORA-20001: Error when processing extension - missing right parenthesis
SQL> -- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c
SQL> -- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column
SQL> -- Remove the previous step and it will error out:
SQL> -- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))'); END;
*
ERROR at line 1:
ORA-20001: Error when processing extension - missing right parenthesis
Elapsed: 00:00:00.04
SQL>
SQL> -- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly
SQL> -- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate
SQL> -- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct
SQL> explain plan for
2 select
3 count(*)
4 from
5 t1 a
6 where
7 trunc(attr1) = 1
8 and trunc(attr2) = 1;
Explained.
Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display(format => 'TYPICAL'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 4777 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| T1 | 807K| 6305K| 4777 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TRUNC("ATTR1")=1 AND TRUNC("ATTR2")=1)
14 rows selected.
Elapsed: 00:00:00.11
SQL>
SQL> -- But: Explicitly referencing a virtual column doesn't work
SQL> -- This will error out:
SQL> -- ORA-20001: Error when processing extension - virtual column is referenced in a column expression
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))')
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.13
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))')
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.09
SQL>
SQL> -- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call)
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
BEGIN dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))'); END;
*
ERROR at line 1:
ORA-20001: Error when processing extension - missing right parenthesis
Elapsed: 00:00:00.02
SQL>
SQL>
SQL> alter table t1 add (trunc_attr1 as (trunc(attr1)));
Table altered.
Elapsed: 00:00:00.01
SQL>
SQL> alter table t1 add (trunc_attr2 as (trunc(attr2)));
Table altered.
Elapsed: 00:00:00.01
SQL>
SQL> -- This works and is supported in all versions supporting virtual columns
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2')
PL/SQL procedure successfully completed.
Elapsed: 00:00:02.57
SQL>
SQL> -- This errors out: ORA-20001: Error when processing extension - virtual column is referenced in a column expression
SQL> -- even in 12.2.0.1 and later
SQL> -- But: Works in 12.2.0.1 and later if the call to
SQL> -- dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
SQL> -- above is removed, because the extension then already exists (!)
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)'); END;
*
ERROR at line 1:
ORA-20001: Error when processing extension - virtual column is referenced in a column expression
ORA-06512: at "SYS.DBMS_STATS", line 34634
ORA-06512: at line 1
Elapsed: 00:00:00.06
SQL>
SQL> -- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))'); END;
*
ERROR at line 1:
ORA-20001: Error when processing extension - missing right parenthesis
Elapsed: 00:00:00.04
SQL>
And that is what I get from 12.2.0.1 on, here using 19.3:
SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 10 12:35:18 2020
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Verbunden mit:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
USERNAME INST_NAME HOST_NAME SID SERIAL# VERSION STARTED SPID OPID CPID SADDR PADDR
-------------------- ------------ ------------------------- ----- -------- ---------- -------- --------------- ----- --------------- ---------------- ----------------
CBO_TEST orcl19c DELLXPS13 146 48961 19.0.0.0.0 20200110 5648 53 7260:13644 00007FF91687B3D8 00007FF91656B858
SQL>
SQL> alter session set nls_language = american;
Session altered.
Elapsed: 00:00:00.00
SQL>
SQL> drop table t1;
Table dropped.
Elapsed: 00:00:00.02
SQL>
SQL> purge table t1;
Table purged.
Elapsed: 00:00:00.10
SQL>
SQL> -- Initialize the random generator for "reproducible" pseudo-randomness
SQL> exec dbms_random.seed(0)
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.00
SQL>
SQL> -- ATTR1 and ATTR2 are both skewed and correlated
SQL> create table t1
2 as
3 select
4 rownum as id
5 , trunc(dbms_random.value(1, 1000000000000)) as fk
6 , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1
7 , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2
8 , rpad('x', 100) as filler
9 from
10 dual
11 connect by
12 level <= 1000000
13 ;
Table created.
Elapsed: 00:00:11.89
SQL>
SQL> -- Histograms on ATTR1 and ATTR2 for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2')
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.68
SQL>
SQL> -- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)')
PL/SQL procedure successfully completed.
Elapsed: 00:00:02.75
SQL>
SQL> -- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))')
PL/SQL procedure successfully completed.
Elapsed: 00:00:03.43
SQL>
SQL> -- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions
SQL> -- Officially this is isn't allowed and errors out up to and including 12.1.0.2
SQL> -- ORA-20001: Error when processing extension - missing right parenthesis
SQL> -- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c
SQL> -- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column
SQL> -- Remove the previous step and it will error out:
SQL> -- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
PL/SQL procedure successfully completed.
Elapsed: 00:00:05.24
SQL>
SQL> -- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly
SQL> -- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate
SQL> -- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct
SQL> explain plan for
2 select
3 count(*)
4 from
5 t1 a
6 where
7 trunc(attr1) = 1
8 and trunc(attr2) = 1;
Explained.
Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display(format => 'TYPICAL'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 4797 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| T1 | 893K| 6977K| 4797 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TRUNC("ATTR1")=1 AND TRUNC("ATTR2")=1)
14 rows selected.
Elapsed: 00:00:00.10
SQL>
SQL> -- But: Explicitly referencing a virtual column doesn't work
SQL> -- This will error out:
SQL> -- ORA-20001: Error when processing extension - virtual column is referenced in a column expression
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))')
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.33
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))')
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.33
SQL>
SQL> -- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call)
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.11
SQL>
SQL>
SQL> alter table t1 add (trunc_attr1 as (trunc(attr1)));
Table altered.
Elapsed: 00:00:00.01
SQL>
SQL> alter table t1 add (trunc_attr2 as (trunc(attr2)));
Table altered.
Elapsed: 00:00:00.01
SQL>
SQL> -- This works and is supported in all versions supporting virtual columns
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2')
PL/SQL procedure successfully completed.
Elapsed: 00:00:03.49
SQL>
SQL> -- This errors out: ORA-20001: Error when processing extension - virtual column is referenced in a column expression
SQL> -- even in 12.2.0.1 and later
SQL> -- But: Works in 12.2.0.1 and later if the call to
SQL> -- dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
SQL> -- above is removed, because the extension then already exists (!)
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)'); END;
*
ERROR at line 1:
ORA-20001: Error when processing extension - virtual column is referenced in a column expression
ORA-06512: at "SYS.DBMS_STATS", line 40751
ORA-06512: at "SYS.DBMS_STATS", line 40035
ORA-06512: at "SYS.DBMS_STATS", line 38912
ORA-06512: at "SYS.DBMS_STATS", line 15726
ORA-06512: at "SYS.DBMS_STATS", line 22064
ORA-06512: at "SYS.DBMS_STATS", line 22162
ORA-06512: at "SYS.DBMS_STATS", line 22232
ORA-06512: at "SYS.DBMS_STATS", line 22864
ORA-06512: at "SYS.DBMS_STATS", line 38313
ORA-06512: at "SYS.DBMS_STATS", line 39738
ORA-06512: at "SYS.DBMS_STATS", line 40183
ORA-06512: at "SYS.DBMS_STATS", line 40732
ORA-06512: at line 1
Elapsed: 00:00:00.09
SQL>
SQL> -- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
PL/SQL procedure successfully completed.
Elapsed: 00:00:05.32
SQL>
So, from 12.2.0.1 it looks like extended statistics on virtual columns are supported to some degree, when following a certain sequence of operation - first creating extended statistics on each of the expressions used, which creates corresponding virtual columns under the cover, and afterwards creating extended statistics using those expressions as part of the column group expression.
Strange enough, when explicitly creating virtual columns for those expressions creating a column group explicitly referencing those virtual columns doesn't work - but using the expressions covered by the virtual columns instead works even with those virtual columns created explicitly.