Surprisingly, when following a certain sequence of operation, this starts working (to some degree) from 12.2.0.1 on.
The official documentation up to and including 19c still mentions this as a restriction, and since it doesn't work when explicitly referencing virtual columns (see the test case what I exactly mean by this) I assume this is more like a side effect / unintended feature.
Nevertheless, the optimizer happily picks up this additional information and comes up with improved estimates when having a combination of skew and correlation on expressions, for example.
The following test case shows the change in behaviour from 12.2.0.1 on:
set echo on linesize 200 trimspool on trimout on tab off pagesize 999 timing on alter session set nls_language = american; drop table t1; purge table t1; -- Initialize the random generator for "reproducible" pseudo-randomness exec dbms_random.seed(0) -- ATTR1 and ATTR2 are both skewed and correlated create table t1 as select rownum as id , trunc(dbms_random.value(1, 1000000000000)) as fk , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1 , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2 , rpad('x', 100) as filler from dual connect by level <= 1000000 ; -- Histograms on ATTR1 and ATTR2 for representing skew properly exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2') -- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)') -- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))') -- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions -- Officially this is isn't allowed and errors out up to and including 12.1.0.2 -- ORA-20001: Error when processing extension - missing right parenthesis -- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c -- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column -- Remove the previous step and it will error out: -- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))') -- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly -- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate -- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct explain plan for select count(*) from t1 a where trunc(attr1) = 1 and trunc(attr2) = 1; select * from table(dbms_xplan.display(format => 'TYPICAL')); -- But: Explicitly referencing a virtual column doesn't work -- This will error out: -- ORA-20001: Error when processing extension - virtual column is referenced in a column expression exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))') exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))') -- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call) exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))') alter table t1 add (trunc_attr1 as (trunc(attr1))); alter table t1 add (trunc_attr2 as (trunc(attr2))); -- This works and is supported in all versions supporting virtual columns exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2') -- This errors out: ORA-20001: Error when processing extension - virtual column is referenced in a column expression -- even in 12.2.0.1 and later -- But: Works in 12.2.0.1 and later if the call to -- dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))') -- above is removed, because the extension then already exists (!) exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)') -- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
And here is the output I get when using 12.1.0.2:
SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 10 12:39:00 2020 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options USERNAME INST_NAME HOST_NAME SID SERIAL# VERSION STARTED SPID OPID CPID SADDR PADDR -------------------- ------------ ------------------------- ----- -------- ---------- -------- --------------- ----- --------------- ---------------- ---------------- CBO_TEST orcl121 DELLXPS13 368 46472 12.1.0.2.0 20200110 6908 59 15536:4996 00007FFA110E9B88 00007FFA12B6F1E8 SQL> SQL> alter session set nls_language = american; Session altered. Elapsed: 00:00:00.00 SQL> SQL> drop table t1; Table dropped. Elapsed: 00:00:00.02 SQL> SQL> purge table t1; Table purged. Elapsed: 00:00:00.06 SQL> SQL> -- Initialize the random generator for "reproducible" pseudo-randomness SQL> exec dbms_random.seed(0) PL/SQL procedure successfully completed. Elapsed: 00:00:00.00 SQL> SQL> -- ATTR1 and ATTR2 are both skewed and correlated SQL> create table t1 2 as 3 select 4 rownum as id 5 , trunc(dbms_random.value(1, 1000000000000)) as fk 6 , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1 7 , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2 8 , rpad('x', 100) as filler 9 from 10 dual 11 connect by 12 level <= 1000000 13 ; Table created. Elapsed: 00:00:13.07 SQL> SQL> -- Histograms on ATTR1 and ATTR2 for representing skew properly SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2') PL/SQL procedure successfully completed. Elapsed: 00:00:01.43 SQL> SQL> -- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)') PL/SQL procedure successfully completed. Elapsed: 00:00:02.08 SQL> SQL> -- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))') PL/SQL procedure successfully completed. Elapsed: 00:00:02.51 SQL> SQL> -- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions SQL> -- Officially this is isn't allowed and errors out up to and including 12.1.0.2 SQL> -- ORA-20001: Error when processing extension - missing right parenthesis SQL> -- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c SQL> -- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column SQL> -- Remove the previous step and it will error out: SQL> -- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))') BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))'); END; * ERROR at line 1: ORA-20001: Error when processing extension - missing right parenthesis Elapsed: 00:00:00.04 SQL> SQL> -- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly SQL> -- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate SQL> -- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct SQL> explain plan for 2 select 3 count(*) 4 from 5 t1 a 6 where 7 trunc(attr1) = 1 8 and trunc(attr2) = 1; Explained. Elapsed: 00:00:00.02 SQL> SQL> select * from table(dbms_xplan.display(format => 'TYPICAL')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 3724264953 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 8 | 4777 (1)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 8 | | | |* 2 | TABLE ACCESS FULL| T1 | 807K| 6305K| 4777 (1)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(TRUNC("ATTR1")=1 AND TRUNC("ATTR2")=1) 14 rows selected. Elapsed: 00:00:00.11 SQL> SQL> -- But: Explicitly referencing a virtual column doesn't work SQL> -- This will error out: SQL> -- ORA-20001: Error when processing extension - virtual column is referenced in a column expression SQL> SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))') PL/SQL procedure successfully completed. Elapsed: 00:00:00.13 SQL> SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))') PL/SQL procedure successfully completed. Elapsed: 00:00:00.09 SQL> SQL> -- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call) SQL> exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))') BEGIN dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))'); END; * ERROR at line 1: ORA-20001: Error when processing extension - missing right parenthesis Elapsed: 00:00:00.02 SQL> SQL> SQL> alter table t1 add (trunc_attr1 as (trunc(attr1))); Table altered. Elapsed: 00:00:00.01 SQL> SQL> alter table t1 add (trunc_attr2 as (trunc(attr2))); Table altered. Elapsed: 00:00:00.01 SQL> SQL> -- This works and is supported in all versions supporting virtual columns SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2') PL/SQL procedure successfully completed. Elapsed: 00:00:02.57 SQL> SQL> -- This errors out: ORA-20001: Error when processing extension - virtual column is referenced in a column expression SQL> -- even in 12.2.0.1 and later SQL> -- But: Works in 12.2.0.1 and later if the call to SQL> -- dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))') SQL> -- above is removed, because the extension then already exists (!) SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)') BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)'); END; * ERROR at line 1: ORA-20001: Error when processing extension - virtual column is referenced in a column expression ORA-06512: at "SYS.DBMS_STATS", line 34634 ORA-06512: at line 1 Elapsed: 00:00:00.06 SQL> SQL> -- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))') BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))'); END; * ERROR at line 1: ORA-20001: Error when processing extension - missing right parenthesis Elapsed: 00:00:00.04 SQL>
And that is what I get from 12.2.0.1 on, here using 19.3:
SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 10 12:35:18 2020 Copyright (c) 1982, 2010, Oracle. All rights reserved. Verbunden mit: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production USERNAME INST_NAME HOST_NAME SID SERIAL# VERSION STARTED SPID OPID CPID SADDR PADDR -------------------- ------------ ------------------------- ----- -------- ---------- -------- --------------- ----- --------------- ---------------- ---------------- CBO_TEST orcl19c DELLXPS13 146 48961 19.0.0.0.0 20200110 5648 53 7260:13644 00007FF91687B3D8 00007FF91656B858 SQL> SQL> alter session set nls_language = american; Session altered. Elapsed: 00:00:00.00 SQL> SQL> drop table t1; Table dropped. Elapsed: 00:00:00.02 SQL> SQL> purge table t1; Table purged. Elapsed: 00:00:00.10 SQL> SQL> -- Initialize the random generator for "reproducible" pseudo-randomness SQL> exec dbms_random.seed(0) PL/SQL procedure successfully completed. Elapsed: 00:00:00.00 SQL> SQL> -- ATTR1 and ATTR2 are both skewed and correlated SQL> create table t1 2 as 3 select 4 rownum as id 5 , trunc(dbms_random.value(1, 1000000000000)) as fk 6 , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1 7 , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2 8 , rpad('x', 100) as filler 9 from 10 dual 11 connect by 12 level <= 1000000 13 ; Table created. Elapsed: 00:00:11.89 SQL> SQL> -- Histograms on ATTR1 and ATTR2 for representing skew properly SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2') PL/SQL procedure successfully completed. Elapsed: 00:00:01.68 SQL> SQL> -- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)') PL/SQL procedure successfully completed. Elapsed: 00:00:02.75 SQL> SQL> -- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))') PL/SQL procedure successfully completed. Elapsed: 00:00:03.43 SQL> SQL> -- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions SQL> -- Officially this is isn't allowed and errors out up to and including 12.1.0.2 SQL> -- ORA-20001: Error when processing extension - missing right parenthesis SQL> -- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c SQL> -- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column SQL> -- Remove the previous step and it will error out: SQL> -- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))') PL/SQL procedure successfully completed. Elapsed: 00:00:05.24 SQL> SQL> -- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly SQL> -- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate SQL> -- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct SQL> explain plan for 2 select 3 count(*) 4 from 5 t1 a 6 where 7 trunc(attr1) = 1 8 and trunc(attr2) = 1; Explained. Elapsed: 00:00:00.01 SQL> SQL> select * from table(dbms_xplan.display(format => 'TYPICAL')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 3724264953 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 8 | 4797 (1)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 8 | | | |* 2 | TABLE ACCESS FULL| T1 | 893K| 6977K| 4797 (1)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(TRUNC("ATTR1")=1 AND TRUNC("ATTR2")=1) 14 rows selected. Elapsed: 00:00:00.10 SQL> SQL> -- But: Explicitly referencing a virtual column doesn't work SQL> -- This will error out: SQL> -- ORA-20001: Error when processing extension - virtual column is referenced in a column expression SQL> SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))') PL/SQL procedure successfully completed. Elapsed: 00:00:00.33 SQL> SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))') PL/SQL procedure successfully completed. Elapsed: 00:00:00.33 SQL> SQL> -- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call) SQL> exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))') PL/SQL procedure successfully completed. Elapsed: 00:00:00.11 SQL> SQL> SQL> alter table t1 add (trunc_attr1 as (trunc(attr1))); Table altered. Elapsed: 00:00:00.01 SQL> SQL> alter table t1 add (trunc_attr2 as (trunc(attr2))); Table altered. Elapsed: 00:00:00.01 SQL> SQL> -- This works and is supported in all versions supporting virtual columns SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2') PL/SQL procedure successfully completed. Elapsed: 00:00:03.49 SQL> SQL> -- This errors out: ORA-20001: Error when processing extension - virtual column is referenced in a column expression SQL> -- even in 12.2.0.1 and later SQL> -- But: Works in 12.2.0.1 and later if the call to SQL> -- dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))') SQL> -- above is removed, because the extension then already exists (!) SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)') BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)'); END; * ERROR at line 1: ORA-20001: Error when processing extension - virtual column is referenced in a column expression ORA-06512: at "SYS.DBMS_STATS", line 40751 ORA-06512: at "SYS.DBMS_STATS", line 40035 ORA-06512: at "SYS.DBMS_STATS", line 38912 ORA-06512: at "SYS.DBMS_STATS", line 15726 ORA-06512: at "SYS.DBMS_STATS", line 22064 ORA-06512: at "SYS.DBMS_STATS", line 22162 ORA-06512: at "SYS.DBMS_STATS", line 22232 ORA-06512: at "SYS.DBMS_STATS", line 22864 ORA-06512: at "SYS.DBMS_STATS", line 38313 ORA-06512: at "SYS.DBMS_STATS", line 39738 ORA-06512: at "SYS.DBMS_STATS", line 40183 ORA-06512: at "SYS.DBMS_STATS", line 40732 ORA-06512: at line 1 Elapsed: 00:00:00.09 SQL> SQL> -- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))') PL/SQL procedure successfully completed. Elapsed: 00:00:05.32 SQL>
So, from 12.2.0.1 it looks like extended statistics on virtual columns are supported to some degree, when following a certain sequence of operation - first creating extended statistics on each of the expressions used, which creates corresponding virtual columns under the cover, and afterwards creating extended statistics using those expressions as part of the column group expression.
Strange enough, when explicitly creating virtual columns for those expressions creating a column group explicitly referencing those virtual columns doesn't work - but using the expressions covered by the virtual columns instead works even with those virtual columns created explicitly.
Hi Randolf
ReplyDeleteNotice that the current 19c documentation that you reference contains the following sentence "The extension can contain an expression only if a corresponding virtual column has been created."
So, they tried to fix it but they did it incorrectly ;-)
Cheers,
Chris