Oracle related stuff: July 2012

In a couple of days, on Wednesday, 1st of August, I'll be presenting another free webinar hosted at AllThingsOracle.com.

Although it is called "Oracle Cost-Based Optimizer Advanced Session", don't be mislead by the title.

It is not a truly "advanced" session, but rather I'll try to delve into various topics that I could only mention briefly or had to omit completely during the first webinar on the Cost-Based Optimizer.

In principle it's going to be a selection of the most recurring issues that I come across during my consultancy work:

- I'm going to spend some time on statistics and histograms in particular and what I believe are the most important aspects to understand regarding them

- Why the clustering factor of indexes gathered by the statistics collection might be wrong (this was only briefly mentioned in the first webinar)

- Using the wrong datatype and how this impacts the estimates of the optimizer is going to be another potential topic

- Implicit datatype conversions are also an issue I come across quite frequently (Tom Kyte covered this already to some extent in one of the previous AllThingsOracle.com webinars I believe)

- Last but not least Dynamic Sampling, Virtual Columns and Extended Statistics are also worth to talk about if time permits it

This time I'm going to use only a couple of slides and mainly will run through some test cases that hopefully allow a live demonstration of what I'm talking about.

You can find more details and register here.

So you have that application that cannot be changed but makes use of some weird expressions that screw up the cardinality estimates of the optimizer.

Consider this simple example:

create table t
as
select
        rownum         as id
      , case
          when mod(rownum, 100000) = 0
          then 'y'
          else 'n'
        end            as flag
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1000000
;

exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1 for columns flag size 2', estimate_percent => 30)

select flag, count(*) from t group by flag;

explain plan for
select count(*) from t where flag = 'y'
;

set linesize 200 tab off pagesize 0

select * from table(dbms_xplan.display);

We have a table with a very skewed "FLAG" column with just a few occurrences of 'y'. The histogram on the column allows the optimizer to recognize this.

Notice that I haven't used the AUTO_SAMPLE_SIZE default, due to the potential problem of 11g in particular when using low sample sizes for a histogram on a column having rare values leading to inconsistent column and histogram statistics - but for larger tables the AUTO_SAMPLE_SIZE in 11g gives you much better basic column statistics, so I don't advise in general to not use AUTO_SAMPLE_SIZE. It might be worth to use a separate GATHER_TABLE_STATS call only for the columns with histograms using an explicit sample size, but using AUTO_SAMPLE_SIZE for the remaining columns.

From running this script on 11.2.0.1 I get this output:

SQL> create table t
  2  as
  3  select
  4          rownum         as id
  5        , case
  6            when mod(rownum, 100000) = 0
  7            then 'y'
  8            else 'n'
  9          end            as flag
 10        , rpad('x', 100) as filler
 11  from
 12          dual
 13  connect by
 14          level <= 1000000
 15  ;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1 for columns flag size 2', estimate_percent => 30)

PL/SQL procedure successfully completed.

SQL>
SQL> select flag, count(*) from t group by flag;

F   COUNT(*)
- ----------
y         10
n     999990

SQL>
SQL> explain plan for
  2  select count(*) from t where flag = 'y'
  3  ;

Explained.

SQL>
SQL> set linesize 200 tab off pagesize 0
SQL>
SQL> select * from table(dbms_xplan.display);
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     2 |  4292   (2)| 00:00:52 |
|   1 |  SORT AGGREGATE    |      |     1 |     2 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |     8 |    16 |  4292   (2)| 00:00:52 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("FLAG"='y')

14 rows selected.

So the cardinality estimate for the FLAG column is in the right ballpark as I've prevented problems with the histogram. But now we have this expression, so let's see what happens then to the estimates:

explain plan for
select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
;

select * from table(dbms_xplan.display);

which gives me this output from my 11.2.0.1 session:

SQL> explain plan for
  2  select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
  3  ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     2 |  4308   (2)| 00:00:52 |
|   1 |  SORT AGGREGATE    |      |     1 |     2 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  9992 | 19984 |  4308   (2)| 00:00:52 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(NVL(UPPER("FLAG"),'Y')='Y')

14 rows selected.

SQL>

As expected, the optimizer doesn't have a clue and falls back to a hardcoded one percent default estimate.

Since this is 11g, let's generate extended statistics for the expression with a histogram and re-check the estimates:

exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for columns (nvl(upper(flag), ''Y'')) size 2', estimate_percent => 30)

explain plan for
select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
;

select * from table(dbms_xplan.display);

which gives me

SQL> exec dbms_stats.gather_table_stats(null, 't', method_opt => 'for all columns size 1 for columns flag size 2 for columns (nvl(upper(flag), ''Y'')) size 2', estimate_percent => 30)

PL/SQL procedure successfully completed.

SQL>
SQL> explain plan for
  2  select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
  3  ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |  4308   (2)| 00:00:52 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |     8 |    32 |  4308   (2)| 00:00:52 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(NVL(UPPER("FLAG"),'Y')='Y')

14 rows selected.

So that looks good again. Now let's combine this with CURSOR_SHARING = FORCE:

alter session set cursor_sharing = force;

explain plan for
select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
;

select * from table(dbms_xplan.display);

select count(*) from t where nvl(upper(flag), 'Y') = 'Y';

select * from table(dbms_xplan.display_cursor);

Notice in particular the EXPLAIN PLAN output and the actual execution plan used at runtime:

SQL> alter session set cursor_sharing = force;

Session altered.

SQL>
SQL> explain plan for
  2  select count(*) from t where nvl(upper(flag), 'Y') = 'Y'
  3  ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |  4308   (2)| 00:00:52 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |     8 |    32 |  4308   (2)| 00:00:52 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(NVL(UPPER("FLAG"),'Y')='Y')

14 rows selected.

SQL>
SQL> select count(*) from t where nvl(upper(flag), 'Y') = 'Y';
        10

SQL>
SQL> select * from table(dbms_xplan.display_cursor);
SQL_ID  8jzw0jwwuk83f, child number 0
-------------------------------------
select count(*) from t where nvl(upper(flag), :"SYS_B_0") = :"SYS_B_1"

Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  4308 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     2 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  9998 | 19996 |  4308   (2)| 00:00:52 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(NVL(UPPER("FLAG"),:SYS_B_0)=:SYS_B_1)


19 rows selected.

So there are two important takeaways:

1. EXPLAIN PLAN doesn't care about CURSOR_SHARING=FORCE. You have to be very careful when CURSOR_SHARING=FORCE is active regarding the execution plans you see from EXPLAIN PLAN and the actual ones used at runtime. This also means that the AUTOTRACE feature of SQL*Plus might lie to you, too.

2. The literal replacement performed by CURSOR_SHARING=FORCE prevents the optimizer from making use of the Virtual Column / Extended Statistics, because the expression has changed, hence we are back to the bad cardinality estimate and the additional information is useless to the optimizer in this scenario

Footnote

If you really need to use CURSOR_SHARING=FORCE due to some badly written application, I always advise to minimize the scope. If feasible, don't run the whole instance in that mode, but limit the setting to either the corresponding sessions (for example, via LOGON triggers) or even to certain processing parts of the application (via corresponding ALTER SESSION SET CURSOR_SHARING = FORCE/EXACT calls).

Note that all this applies to CURSOR_SHARING=SIMILAR, too, but since its use is deprecated anyway (see MOS document 1169017.1) I haven't mentioned it here before.

Finally, related technologies like Function Based Indexes and Virtual Columns are very likely affected in a similar way regarding the resulting cardinality estimates.

Oracle related stuff

Pages

Saturday, July 28, 2012

Interesting blog notes

Sunday, July 22, 2012

Free Webinar

Wednesday, July 11, 2012

AllThingsOracle.com - Dynamic Sampling (III) Part II

Sunday, July 8, 2012

Forced Cursor Sharing And Virtual Columns

Blog Archive

Homepage

YouTube Channel

About Me

Contact Details

OakTable Book "Expert Oracle Practices"

Blogroll