So you have that application that cannot be changed but makes use of some weird
expressions that screw up the
cardinality estimates of the optimizer.
Consider this simple example:
create table t
as
select
rownum as id
, case
when mod(rownum, 100000) = 0
then 'y'
else 'n'
end as flag
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000000
;
exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1 for columns flag size 2', estimate_percent => 30)
select flag, count(*) from t group by flag;
explain plan for
select count(*) from t where flag = 'y'
;
set linesize 200 tab off pagesize 0
select * from table(dbms_xplan.display);
We have a table with a very
skewed "FLAG" column with just a few occurrences of 'y'. The
histogram on the column allows the optimizer to recognize this.
Notice that I haven't used the
AUTO_SAMPLE_SIZE default, due to the potential problem of 11g in particular when using
low sample sizes for a histogram on a column having rare values
leading to inconsistent column and histogram statistics - but for larger tables the AUTO_SAMPLE_SIZE in 11g gives you much better basic column statistics, so I don't advise in general to not use AUTO_SAMPLE_SIZE. It might be worth to use a
separate GATHER_TABLE_STATS call only for the columns with histograms using an explicit sample size, but using AUTO_SAMPLE_SIZE for the remaining columns.
From running this script on 11.2.0.1 I get this output:
SQL> create table t
2 as
3 select
4 rownum as id
5 , case
6 when mod(rownum, 100000) = 0
7 then 'y'
8 else 'n'
9 end as flag
10 , rpad('x', 100) as filler
11 from
12 dual
13 connect by
14 level <= 1000000
15 ;
Table created.
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1 for columns flag size 2', estimate_percent => 30)
PL/SQL procedure successfully completed.
SQL>
SQL> select flag, count(*) from t group by flag;
F COUNT(*)
- ----------
y 10
n 999990
SQL>
SQL> explain plan for
2 select count(*) from t where flag = 'y'
3 ;
Explained.
SQL>
SQL> set linesize 200 tab off pagesize 0
SQL>
SQL> select * from table(dbms_xplan.display);
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 | 4292 (2)| 00:00:52 |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
|* 2 | TABLE ACCESS FULL| T | 8 | 16 | 4292 (2)| 00:00:52 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("FLAG"='y')
14 rows selected.
So the cardinality estimate for the FLAG column is in the
right ballpark as I've prevented problems with the histogram. But now we have this expression, so let's see what happens then to the estimates:
explain plan for
select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
;
select * from table(dbms_xplan.display);
which gives me this output from my 11.2.0.1 session:
SQL> explain plan for
2 select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
3 ;
Explained.
SQL>
SQL> select * from table(dbms_xplan.display);
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 | 4308 (2)| 00:00:52 |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
|* 2 | TABLE ACCESS FULL| T | 9992 | 19984 | 4308 (2)| 00:00:52 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL(UPPER("FLAG"),'Y')='Y')
14 rows selected.
SQL>
As expected, the optimizer doesn't have a clue and falls back to a
hardcoded one percent default estimate.
Since this is 11g, let's generate extended statistics for the expression with a histogram and re-check the estimates:
exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for columns (nvl(upper(flag), ''Y'')) size 2', estimate_percent => 30)
explain plan for
select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
;
select * from table(dbms_xplan.display);
which gives me
SQL> exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1 for columns flag size 2 for columns (nvl(upper(flag), ''Y'')) size 2', estimate_percent => 30)
PL/SQL procedure successfully completed.
SQL>
SQL> explain plan for
2 select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
3 ;
Explained.
SQL>
SQL> select * from table(dbms_xplan.display);
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 4308 (2)| 00:00:52 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T | 8 | 32 | 4308 (2)| 00:00:52 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL(UPPER("FLAG"),'Y')='Y')
14 rows selected.
So that looks good again. Now let's combine this with
CURSOR_SHARING = FORCE:
alter session set cursor_sharing = force;
explain plan for
select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
;
select * from table(dbms_xplan.display);
select count(*) from t where nvl(upper(flag), 'Y') = 'Y';
select * from table(dbms_xplan.display_cursor);
Notice in particular the
EXPLAIN PLAN output and the
actual execution plan used at runtime:
SQL> alter session set cursor_sharing = force;
Session altered.
SQL>
SQL> explain plan for
2 select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
3 ;
Explained.
SQL>
SQL> select * from table(dbms_xplan.display);
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 4308 (2)| 00:00:52 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T | 8 | 32 | 4308 (2)| 00:00:52 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL(UPPER("FLAG"),'Y')='Y')
14 rows selected.
SQL>
SQL> select count(*) from t where nvl(upper(flag), 'Y') = 'Y';
10
SQL>
SQL> select * from table(dbms_xplan.display_cursor);
SQL_ID 8jzw0jwwuk83f, child number 0
-------------------------------------
select count(*) from t where nvl(upper(flag), :"SYS_B_0") = :"SYS_B_1"
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4308 (100)| |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
|* 2 | TABLE ACCESS FULL| T | 9998 | 19996 | 4308 (2)| 00:00:52 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL(UPPER("FLAG"),:SYS_B_0)=:SYS_B_1)
19 rows selected.
So there are two important takeaways:
1. EXPLAIN PLAN
doesn't care about CURSOR_SHARING=FORCE. You have to be very careful when CURSOR_SHARING=FORCE is active regarding the execution plans you see from EXPLAIN PLAN and the
actual ones used at runtime. This also means that the
AUTOTRACE feature of SQL*Plus might lie to you, too.
2. The
literal replacement performed by CURSOR_SHARING=FORCE prevents the optimizer from making use of the Virtual Column / Extended Statistics, because the expression has changed, hence we are back to the bad cardinality estimate and the additional information is
useless to the optimizer in this scenario
Footnote
If you really need to use CURSOR_SHARING=FORCE due to some badly written application, I always advise to
minimize the scope. If feasible, don't run the whole instance in that mode, but limit the setting to either the corresponding sessions (for example, via LOGON triggers) or even to certain processing parts of the application (via corresponding ALTER SESSION SET CURSOR_SHARING = FORCE/EXACT calls).
Note that all this applies to
CURSOR_SHARING=SIMILAR, too, but since its use is
deprecated anyway (see MOS document 1169017.1) I haven't mentioned it here before.
Finally, related technologies like
Function Based Indexes and
Virtual Columns are very likely affected in a similar way regarding the resulting cardinality estimates.