Wednesday, January 28, 2009

Common tuning issues - avoiding the "fetch first rows quickly" trap

It seems to be a commonly encountered issue that the analysis of a potentially poorly performing SQL statement is misled by the fact that running the query in a GUI database front-end like TOAD, SQL Developer or similar tools shows an instant reply.

Most of these tools by default fetch only the first n rows of the result set conveniently into a grid-like GUI component.

This might lead to the conclusion that the statement performs well and therefore doesn't require any further investigation or tuning effort.

The potential problem with this approach is that, due to several possible causes, a statement might return the first rows very quickly, but depending on how the statement is actually used this doesn't necessarily mean that all rows will be processed in an efficient manner.
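To make this concrete: expressed in plain SQL against the test table created below, what such a front-end tool effectively does corresponds to something like the following sketch, fetching just the first 50 rows no matter how expensive the complete result set would be:


select *
from (
select * from t_fetch_first_rows
where type = 'VIEW'
order by id
)
where rownum <= 50;
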

Usually the following circumstances can lead to such side-effects:

- Grossly underestimated cardinalities by the optimizer
- An efficient access path available to the optimizer based on that wrong cardinality estimate, usually an index

There are several scenarios that can be the root cause of the underestimated cardinalities, among them are:

- Wrong or stale statistics on the tables or indexes involved
- Sharing of child cursors (execution plans) via bind variables across executions with very different cardinalities; this can also be forced indirectly by CURSOR_SHARING=FORCE (see the sketch after this list)
- Incorrect estimation of join cardinalities: there are simply scenarios where the optimizer is not able to come up with a good estimate of a join, even with (or possibly because of) the help of histograms
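
To illustrate the second point, here is a minimal sketch of the bind variable effect, based on the test table that gets created below (the plans actually chosen depend on your environment, so treat this as an illustration only):


variable obj_type varchar2(30)

REM first execution: the optimizer peeks at 'SCHEDULE' (only 503 rows)
REM and might settle for an index-driven plan
exec :obj_type := 'SCHEDULE'

select sum(measure) from t_fetch_first_rows where type = :obj_type;

REM second execution: the cached child cursor gets re-used for 'VIEW',
REM which matches almost all of the 1,500,000 rows
exec :obj_type := 'VIEW'

select sum(measure) from t_fetch_first_rows where type = :obj_type;
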

Although from 10g on there is a default statistics collection job that should help to prevent the first point on the list from happening, it's actually this default job, in conjunction with a particular pattern of database usage, that can lead to exactly this result.
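
One obvious remedy, sketched here (assuming the load process knows which tables it populates), is to make the statistics refresh part of the bulk load itself instead of relying on the nightly job; this is the same call used later in the test case:


REM refresh table and index statistics right after the bulk load
begin
dbms_stats.gather_table_stats(
ownname => null,
tabname => 't_fetch_first_rows',
cascade => true,
no_invalidate => false);
end;
/
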

The following test case run on Oracle 10g XE should simulate this pattern to demonstrate one of the possible scenarios.

It creates a table that represents the "staging" table of a bulk load process. This staging table happens to be empty during the night, when the default statistics collection job is executed. Afterwards the table gets loaded with 1,500,000 rows, and a query is executed without refreshing the statistics after the load is complete.

You'll see that based on the wrong statistics the optimizer comes to a wrong conclusion.

The interesting side effect of this is that the query performs very well in a front-end tool that fetches only the first n rows, since the plan happens to correspond to a very efficient "first_rows(n)" plan, although the optimizer_mode is ALL_ROWS by default in 10g.

I've carefully crafted this example to show that even the deliberate use of an "ORDER BY", in an attempt to force the database to process the whole result set for sorting before returning the first rows, can be misleading, since such a query can be processed without a sort operation if a suitable index exists.

Only by actually forcing the database to process all rows do you get the real picture, provided you know that all rows from your statement really need to be processed. But it's not that easy to force this: the cost based optimizer is smart, and most of the time simply wrapping the statement in a "SELECT COUNT(*) FROM (...)" is again misleading, because the optimizer is often able to avoid the random row access to the table, which is where most of the time is usually spent under such circumstances.
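
The difference can be seen in the two wrappers used throughout the test case below: the first query can be satisfied from the index alone and is therefore misleading, whereas the second one adds an aggregate on a column that is not part of the index and thus forces the random table access:


REM can be answered from the index alone - misleading
select count(*) from (
select * from t_fetch_first_rows
where type = 'VIEW'
order by id
);

REM sums a column not contained in the index - forces the table visits
select sum(measure), count(*) from (
select * from t_fetch_first_rows
where type = 'VIEW'
order by id
);
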

Let's start with setting up the environment:


SQL>
SQL> show parameter optimizer_mode

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_mode                       string      ALL_ROWS
SQL>
SQL> drop table t_fetch_first_rows purge;

Table dropped.

Elapsed: 00:00:00.84
SQL>
SQL> create table t_fetch_first_rows (
2 id number not null,
3 name varchar2(30) not null,
4 type varchar2(30) not null,
5 measure number
6 );

Table created.

Elapsed: 00:00:00.09
SQL>
SQL> create index idx_fetch_first_rows on t_fetch_first_rows (type, id);

Index created.

Elapsed: 00:00:00.03
SQL>
SQL> -- Simulate an "empty" staging table analyzed by the nightly default
SQL> -- statistics collection job
SQL> exec dbms_stats.gather_table_stats(null, 't_fetch_first_rows')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.29
SQL>
SQL> select num_rows, blocks
2 from user_tab_statistics
3 where table_name = 'T_FETCH_FIRST_ROWS';

  NUM_ROWS     BLOCKS
---------- ----------
         0          0

Elapsed: 00:00:00.01
SQL>
SQL> select blevel, leaf_blocks, distinct_keys, num_rows, clustering_factor
2 from user_ind_statistics
3 where index_name = 'IDX_FETCH_FIRST_ROWS';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR
---------- ----------- ------------- ---------- -----------------
         0           0             0          0                 0

Elapsed: 00:00:00.03
SQL>
SQL> -- Load the data, the object type is deliberately skewed
SQL> -- Most of the values are "VIEW"
SQL> insert /*+ append */ into t_fetch_first_rows (
2 id,
3 name,
4 type,
5 measure)
6 select object_id, object_name, object_type, object_id as measure
7 from all_objects, (select level as id from dual connect by level <= 1000) dup
8 where object_type in ('VIEW', 'SCHEDULE')
9 and rownum <= 1500000;

1500000 rows created.

Elapsed: 00:00:39.36
SQL>
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
SQL>
SQL> column segment_name format a20
SQL>
SQL> select segment_name, bytes, blocks, extents
2 from user_segments
3 where segment_name = 'T_FETCH_FIRST_ROWS'
4 or segment_name = 'IDX_FETCH_FIRST_ROWS';

SEGMENT_NAME              BYTES     BLOCKS    EXTENTS
-------------------- ---------- ---------- ----------
T_FETCH_FIRST_ROWS     62914560       7680         75
IDX_FETCH_FIRST_ROWS   31457280       3840         45

Elapsed: 00:00:00.31
SQL>
SQL> select type, count(*) from t_fetch_first_rows
2 group by type
3 order by 2 desc;

TYPE                             COUNT(*)
------------------------------ ----------
VIEW                              1499497
SCHEDULE                              503

Elapsed: 00:00:01.37
SQL>


The "empty" staging table has been analyzed and afterwards 1,500,000 rows have been loaded which represent approx. 60MB of data, supported by a 30MB index. The table consists of 7,680 blocks (8KB default block size), the index has 3,840 blocks.

Note that the data is skewed: most of the entries have an object type of "VIEW", so filtering for this value will return almost all of the rows from the table.
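
If you want to quantify the skew, a simple sketch is to relate the number of matching rows to the total row count; in this setup the filter on type = 'VIEW' matches about 99.97 percent of the table:


select round(100 *
(select count(*) from t_fetch_first_rows where type = 'VIEW') /
(select count(*) from t_fetch_first_rows), 2) as pct_view
from dual;
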


SQL> explain plan for
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id;

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3754482025

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 60 | 0 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T_FETCH_FIRST_ROWS | 1 | 60 | 0 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_FETCH_FIRST_ROWS | 1 | | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TYPE"='VIEW')
filter("TYPE"='VIEW')

15 rows selected.

Elapsed: 00:00:00.03
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:18.95
SQL>
SQL> set autotrace traceonly
SQL>
SQL> -- Force to fetch all rows
SQL> -- Note that COUNT(*) itself is not sufficient
SQL> -- because the optimizer is smart and doesn't visit the table
SQL> -- in this case which means that the test case would be no longer
SQL> -- representative
SQL> select count(*) from (
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

Elapsed: 00:00:05.92

Execution Plan
----------------------------------------------------------
Plan hash value: 3912594726

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 0 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
|* 2 | INDEX RANGE SCAN| IDX_FETCH_FIRST_ROWS | 1 | 17 | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TYPE"='VIEW')


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3758 consistent gets
3758 physical reads
0 redo size
414 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> set autotrace off
SQL>


Running the above query, including the filter predicate and the ORDER BY clause, returns almost immediately when only the first rows are fetched, suggesting good response time. Note that the execution plan does not contain a SORT ORDER BY operation: the index on (TYPE, ID) delivers the rows already ordered by ID for the constant TYPE predicate, which is exactly why the ORDER BY does not force the whole result set to be processed before the first rows can be returned.

And by simply wrapping the query in a "SELECT COUNT(*) FROM (...)" block, the result seems to confirm that the query performs well: only 3,758 blocks read in less than six seconds (by the way, if you check the execution plan, it's an index-only execution that doesn't visit the table at all; the blocks read basically represent all blocks of the index). So obviously there is nothing wrong...


SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.01
SQL> -- alter session set tracefile_identifier = 'fetch_first_rows1';
SQL>
SQL> -- exec dbms_monitor.session_trace_enable
SQL>
SQL> set autotrace traceonly
SQL>
SQL> -- Force to fetch all rows
SQL> -- Note that COUNT(*) itself is not sufficient
SQL> -- because the optimizer is smart and doesn't visit the table
SQL> -- in this case which means that the test case would be no longer
SQL> -- representative
SQL> select sum(measure), count(*) from (
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

Elapsed: 00:01:04.31

Execution Plan
----------------------------------------------------------
Plan hash value: 1903859112

-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 30 | 0 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 30 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T_FETCH_FIRST_ROWS | 1 | 30 | 0 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IDX_FETCH_FIRST_ROWS | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TYPE"='VIEW')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1503256 consistent gets
11236 physical reads
72 redo size
483 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> set autotrace off
SQL>
SQL> -- exec dbms_monitor.session_trace_disable
SQL>


But look what happens if I force the database to process all rows in a realistic manner: it takes roughly a minute and, more significantly, approx. 1,500,000 consistent block gets to process a table that consists of just 7,680 blocks! At an 8KB block size, that's a logical I/O volume of roughly 1,500,000 * 8KB = 12GB.

And if you check the AUTOTRACE statistics, you'll notice that I was actually lucky: I only read 11,236 blocks physically; the vast majority of the blocks came from the buffer cache after all blocks of the table had been read into it. In a real production system, where the segments might be much larger and several such statements might compete for the buffer cache simultaneously, many more of those buffer gets could turn into real physical I/O, making things even worse.

The impressive amount of logical I/O is caused by the fact that, using this execution plan, the database scans the index leaf blocks for the rows matching the filter criteria and accesses the corresponding table rows in a random fashion.

But the smallest entity that the database knows in terms of I/O is a block, so in order to read a single random row, the whole block the row resides in needs to be read. Due to the bad clustering factor of the index, which means that the table rows are ordered differently from the index, a different table block had to be read for virtually every row accessed. So on average each block of the table was accessed almost 200 times during the process (1,500,000 row visits against 7,680 table blocks is roughly 195 visits per block)!

If the table and the index were ordered in the same way (represented by a low clustering factor of the index, close to the number of blocks of the table, whereas a bad clustering factor is close to the number of rows of the table), then the amount of logical I/O required could be significantly lower, since rows accessed consecutively would reside in the same block, and therefore the same table block could be re-used to process the next row.
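
A quick way to check where an index sits between these two extremes is to compare its clustering factor to the number of blocks and rows of the underlying table, a simple sketch based on the standard dictionary views:


select i.index_name, i.clustering_factor, t.blocks, t.num_rows
from user_indexes i, user_tables t
where i.table_name = t.table_name
and i.index_name = 'IDX_FETCH_FIRST_ROWS';
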

Now I'm going to show that having representative index statistics is as important as having the tables properly analyzed:


SQL> -- Relevance of index stats: Gather proper table statistics but without indexes
SQL> exec dbms_stats.gather_table_stats(null, 't_fetch_first_rows', cascade => false, no_invalidate=>false)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.40
SQL>
SQL> select num_rows, blocks
2 from user_tab_statistics
3 where table_name = 'T_FETCH_FIRST_ROWS';

  NUM_ROWS     BLOCKS
---------- ----------
   1479394       7606

Elapsed: 00:00:00.03
SQL>
SQL> select blevel, leaf_blocks, distinct_keys, num_rows, clustering_factor
2 from user_ind_statistics
3 where index_name = 'IDX_FETCH_FIRST_ROWS';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR
---------- ----------- ------------- ---------- -----------------
         0           0             0          0                 0

Elapsed: 00:00:00.04
SQL>
SQL> explain plan for
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3754482025

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1478K| 42M| 0 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T_FETCH_FIRST_ROWS | 1478K| 42M| 0 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_FETCH_FIRST_ROWS | 1478K| | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TYPE"='VIEW')
filter("TYPE"='VIEW')

15 rows selected.

Elapsed: 00:00:00.03
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.18
SQL>
SQL> -- alter session set tracefile_identifier = 'fetch_first_rows2';
SQL>
SQL> -- exec dbms_monitor.session_trace_enable
SQL>
SQL> set autotrace traceonly
SQL>
SQL> -- Force to fetch all rows
SQL> -- Note that COUNT(*) itself is not sufficient
SQL> -- because the optimizer is smart and doesn't visit the table
SQL> -- in this case which means that the test case would be no longer
SQL> -- representative
SQL> select sum(measure), count(*) from (
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

Elapsed: 00:01:03.90

Execution Plan
----------------------------------------------------------
Plan hash value: 1903859112

-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 0 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T_FETCH_FIRST_ROWS | 1478K| 15M| 0 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IDX_FETCH_FIRST_ROWS | 1478K| | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TYPE"='VIEW')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1503255 consistent gets
11236 physical reads
0 redo size
483 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> set autotrace off
SQL>
SQL> -- exec dbms_monitor.session_trace_disable
SQL>


Although I now have proper table statistics, the optimizer is still fooled by the incorrect index statistics and again chooses the inefficient index access.

Gathering proper statistics on both table and index leads to a better decision of the optimizer:


SQL> -- Now gather proper statistics
SQL> exec dbms_stats.gather_table_stats(null, 't_fetch_first_rows', cascade => true, no_invalidate=>false)

PL/SQL procedure successfully completed.

Elapsed: 00:00:13.40
SQL>
SQL> select num_rows, blocks
2 from user_tab_statistics
3 where table_name = 'T_FETCH_FIRST_ROWS';

  NUM_ROWS     BLOCKS
---------- ----------
   1504130       7606

Elapsed: 00:00:00.01
SQL>
SQL> select blevel, leaf_blocks, distinct_keys, num_rows, clustering_factor
2 from user_ind_statistics
3 where index_name = 'IDX_FETCH_FIRST_ROWS';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR
---------- ----------- ------------- ---------- -----------------
         2        3757          2984    1500000           1500000

Elapsed: 00:00:00.03
SQL>
SQL> explain plan for
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id;

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2749563054

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1503K| 43M| | 14713 (4)| 00:02:57 |
| 1 | SORT ORDER BY | | 1503K| 43M| 127M| 14713 (4)| 00:02:57 |
|* 2 | TABLE ACCESS FULL| T_FETCH_FIRST_ROWS | 1503K| 43M| | 2159 (5)| 00:00:26 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("TYPE"='VIEW')

14 rows selected.

Elapsed: 00:00:00.01
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.26
SQL>
SQL> -- alter session set tracefile_identifier = 'fetch_first_rows3';
SQL>
SQL> -- exec dbms_monitor.session_trace_enable
SQL>
SQL> set autotrace traceonly
SQL>
SQL> -- Force to fetch all rows
SQL> -- Note that COUNT(*) itself is not sufficient
SQL> -- because the optimizer is smart and doesn't visit the table
SQL> -- in this case which means that the test case would be no longer
SQL> -- representative
SQL> select sum(measure), count(*) from (
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

Elapsed: 00:00:02.98

Execution Plan
----------------------------------------------------------
Plan hash value: 2125410158

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 2159 (5)| 00:00:26 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| T_FETCH_FIRST_ROWS | 1503K| 15M| 2159 (5)| 00:00:26 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("TYPE"='VIEW')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
7488 consistent gets
7479 physical reads
0 redo size
483 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> set autotrace off
SQL>
SQL> -- exec dbms_monitor.session_trace_disable
SQL>


A simple full table scan took only three seconds, and processed 7,488 blocks, representing simply the net size of the table. What a vast difference compared to the 12GB processed via the index.

Note that when I repeat the statement using the index access without clearing the buffer cache, its actual run time drops in my test environment from one minute to less than 10 seconds; but that's not really good news, since it's usually quite unrealistic to have all the blocks in the buffer cache.

To underline that this inefficient statement represents a real scaling inhibitor, I'll use Tom Kyte's RUNSTATS package to demonstrate the different resource consumption of the two statements:


SQL> exec runstats_pkg.rs_start

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.03
SQL>
SQL> -- Show the different resource consumption
SQL> select sum(measure), count(*) from (
2 select /*+ index(t_fetch_first_rows) */ * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

SUM(MEASURE)   COUNT(*)
------------ ----------
  6876814594    1499497

Elapsed: 00:01:04.14
SQL>
SQL> exec runstats_pkg.rs_middle

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.06
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.04
SQL>
SQL> select sum(measure), count(*) from (
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

SUM(MEASURE)   COUNT(*)
------------ ----------
  6876814594    1499497

Elapsed: 00:00:02.43
SQL>
SQL> -- Show only values that differ by at least 150
SQL> exec runstats_pkg.rs_stop(150)
Run1 ran in 6417 hsecs
Run2 ran in 250 hsecs
run 1 ran in 2566,8% of the time

Name                                  Run1        Run2        Diff
STAT...redo size                     2,580       2,856         276
LATCH.channel operations paren         310          22        -288
LATCH.checkpoint queue latch           414          27        -387
LATCH.messages                         442          41        -401
STAT...bytes sent via SQL*Net        1,960       1,501        -459
LATCH.JS queue state obj latch         468           0        -468
STAT...CPU used by this sessio         836          61        -775
STAT...CPU used when call star         836          61        -775
LATCH.enqueues                       1,047           8      -1,039
LATCH.enqueue hash chains            1,059           8      -1,051
LATCH.SQL memory manager worka       1,480          73      -1,407
STAT...physical reads               11,240       7,486      -3,754
STAT...physical reads cache         11,240       7,486      -3,754
STAT...free buffer requested        11,245       7,491      -3,754
LATCH.cache buffers lru chain       11,329       7,540      -3,789
STAT...user I/O wait time            5,612         196      -5,416
STAT...DB time                       6,418         254      -6,164
STAT...Elapsed Time                  6,422         255      -6,167
STAT...physical reads cache pr           0       7,403       7,403
STAT...table scan blocks gotte           0       7,478       7,478
STAT...physical read IO reques      11,240          83     -11,157
STAT...physical read total IO       11,240          83     -11,157
LATCH.simulator lru latch           93,302         932     -92,370
LATCH.simulator hash latch          94,001       1,398     -92,603
STAT...session uga memory max      130,928           0    -130,928
STAT...session pga memory         -213,636           0     213,636
STAT...Cached Commit SCN refer   1,499,291       7,477  -1,491,814
STAT...consistent gets from ca   1,503,273       7,504  -1,495,769
STAT...consistent gets           1,503,273       7,504  -1,495,769
STAT...no work - consistent re   1,503,252       7,478  -1,495,774
STAT...session logical reads     1,503,310       7,533  -1,495,777
STAT...buffer is pinned count    1,499,496           0  -1,499,496
STAT...table fetch by rowid      1,499,501           3  -1,499,498
STAT...buffer is not pinned co   1,499,506           6  -1,499,500
STAT...table scan rows gotten            0   1,500,000   1,500,000
LATCH.cache buffers chains       3,019,149      23,679  -2,995,470
STAT...physical read bytes      92,078,080  61,325,312 -30,752,768
STAT...physical read total byt  92,078,080  61,325,312 -30,752,768

Run1 latches total versus runs -- difference and pct
        Run1        Run2        Diff       Pct
   3,243,130      53,136  -3,189,994  6,103.45%

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.07
SQL>


The most obvious threat can be seen from the LATCH statistics: latches are Oracle-internal serialization devices that are used to protect resources from being modified by multiple processes at the same time. Latches therefore serialize access to certain memory structures, e.g. those used to maintain the buffers in the buffer cache. The more latch activity, the more CPU is used and the less scalable the system becomes, since the concurrent processes have to wait for each other at these serialization points.
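
If you want to watch this effect yourself, a simple sketch is to sample the system-wide latch statistics from V$LATCH before and after each statement and compare the deltas, e.g. for the latches protecting the buffer cache:


select name, gets, misses, sleeps
from v$latch
where name in ('cache buffers chains', 'cache buffers lru chain')
order by name;
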

So the inefficient statement reported more than 3,000,000 latches, whereas the simple table scan required only approx. 50,000. So even if you're not too concerned about the different run times in this single-user test case, running such statements simultaneously in a real multi-user system will significantly influence the performance of the whole system.

Saturday, January 24, 2009

Correlation, nocorrelation and extended stats - amendments

Riyaj Shamsudeen recently published a very interesting blog post about the 11g extended statistics feature and correlated column values.

Because I think that some of his findings were significantly influenced by the fact that he generated frequency histograms on all columns, I've repeated his test case with some more variations, in particular with and without histograms, and in the case of histograms, with and without the new feature "Bug 5483301: Cardinality of 1 when predicate value non-existent in frequency histogram" which was introduced in 10.2.0.4 and 11.1.0.6. More details about this change can be found in Metalink Doc ID 5483301.8.

The following results were obtained using 11.1.0.7 Enterprise Edition on Windows 32bit.
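
As a side note, whether this fix is currently active can be checked via the fix control views, a quick sketch using V$SYSTEM_FIX_CONTROL (a VALUE of 1 means the fix is enabled):


select bugno, value, description
from v$system_fix_control
where bugno = 5483301;
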

I'll start with his case 1:


create table y1 (a number, b number, c number);

begin
for i in 1..1000 loop
for j in 1..10 loop
insert into y1 values (j,mod(j,5), mod(j,2) );
end loop;
end loop;
end;
/

commit;

REM Distribution of these column values given below.
select a, b, count(*) from y1 group by a,b order by a,b
/
         A          B   COUNT(*)
---------- ---------- ----------
         1          1       1000
         2          2       1000
         3          3       1000
         4          4       1000
         5          0       1000
         6          1       1000
         7          2       1000
         8          3       1000
         9          4       1000
        10          0       1000

10 rows selected.

REM Let's also add an index to this table
create index y1_i1 on y1 (a, b);

REM The number of distinct keys is used
REM to determine selectivity if
REM a) an all-equal operation on the entire index is used
REM and obviously
REM b) NO histogram is present
select distinct_keys from user_indexes where index_name = 'Y1_I1';

DISTINCT_KEYS
-------------
10

begin
dbms_stats.gather_table_stats (
ownname =>null,
tabname=>'y1',
estimate_percent=>null,
cascade=>true,
method_opt =>'for all columns size 254');
end;
/

REM this shows that although the index is in place
REM the DISTINCT_KEYS information is ignored
REM but the histogram is used instead
REM and we fall back to the default selectivity formula
REM see at the end of the script for a demonstration
REM without index which comes to the same result
alter session set tracefile_identifier = 'correlated1';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=1 and b=1;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt shows:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
Column (#1):
NewDensity:0.050000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
Column (#2):
NewDensity:0.100000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:5, NDV:5
ColGroup (#1, Index) Y1_I1
Col#: 1 2 CorStregth: 5.00
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 200 Computed: 200.00 Non Adjusted: 200.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2396429
Resp_io: 12.00 Resp_cpu: 2396429
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
Access Path: index (AllEqRange)
Index: Y1_I1
resc_io: 5.00 resc_cpu: 114847
ix_sel: 0.020000 ix_sel_with_filters: 0.020000
Cost: 5.05 Resp: 5.05 Degree: 1
Best:: AccessPath: IndexRange
Index: Y1_I1
Cost: 5.05 Degree: 1 Resp: 5.05 Card: 200.00 Bytes: 0


In contrast to Riyaj's interpretation, I would say that this shows that the index information is not used to derive the selectivity ("Matches Full:" is empty). Rather, the presence of the frequency histograms causes a fall back to the default formula used for non-correlated columns: selectivity(A) * selectivity(B) = 1/10 * 1/5 = 0.02, and hence a cardinality of 10,000 * 0.02 = 200.

Now repeating the same without any histograms:


begin
dbms_stats.delete_table_stats (
ownname =>null,
tabname=>'y1',
cascade_indexes=>true);
end;
/

begin
dbms_stats.gather_table_stats (
ownname =>null,
tabname=>'y1',
estimate_percent=>null,
cascade=>true,
method_opt =>'for all columns size 1');
end;
/

REM without histograms
REM the all-equal on entire index rule is used
REM hence the selectivity is 0.1 => 1/DISTINCT_KEYS of index
alter session set tracefile_identifier = 'correlated2';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=1 and b=1;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
ColGroup (#1, Index) Y1_I1
Col#: 1 2 CorStregth: 5.00
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1000 Computed: 1000.00 Non Adjusted: 1000.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2412429
Resp_io: 12.00 Resp_cpu: 2412429
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
Access Path: index (AllEqRange)
Index: Y1_I1
resc_io: 22.00 resc_cpu: 547522
ix_sel: 0.100000 ix_sel_with_filters: 0.100000
Cost: 22.22 Resp: 22.22 Degree: 1
Best:: AccessPath: TableScan
Cost: 12.96 Degree: 1 Resp: 12.96 Card: 1000.00 Bytes: 0


Now it can be seen that the index information actually gets used ("Matches Full: #1 Partial: Sel: 0.1000"), and we arrive at the correct cardinality estimate: 1/DISTINCT_KEYS = 1/10 = 0.1, i.e. 1,000 rows.

Unfortunately this approach has some shortcomings, as can be seen when using "out-of-range" predicates:


REM unfortunately this approach
REM doesn't cope very well with "out-of-range" predicates
REM although the 10053 trace shows a detection of the "out-of-range"
REM attribute, the final index selectivity is left unchanged
REM => Room for improvement?
alter session set tracefile_identifier = 'correlated3';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=6 and b=6;

alter session set events '10053 trace name context off';

REM switching off the "new frequency histogram unpopular value"
REM bug fix doesn't change anything here obviously
REM as we don't have a histogram in place
alter session set "_fix_control"='5483301:off';

alter session set tracefile_identifier = 'correlated4';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=6 and b=6;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
ColGroup (#1, Index) Y1_I1
Col#: 1 2 CorStregth: 5.00
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
Using prorated density: 0.100000 of col #2 as selectvity of out-of-range/non-existent value pred
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1000 Computed: 1000.00 Non Adjusted: 1000.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2412429
Resp_io: 12.00 Resp_cpu: 2412429
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
Using prorated density: 0.100000 of col #2 as selectvity of out-of-range/non-existent value pred
Access Path: index (AllEqRange)
Index: Y1_I1
resc_io: 22.00 resc_cpu: 547522
ix_sel: 0.100000 ix_sel_with_filters: 0.100000
Cost: 22.22 Resp: 22.22 Degree: 1
Best:: AccessPath: TableScan
Cost: 12.96 Degree: 1 Resp: 12.96 Card: 1000.00 Bytes: 0


Although there is an "out-of-range" detection, the final selectivity is left unchanged, which looks a bit odd, and the default formula that adjusts the selectivity based on the distance from the known low/high values doesn't apply either. Switching off the above mentioned "bug fix" doesn't change this behaviour, which seems to be consistent since we don't have a histogram in place.

Now again adding histograms and checking "out-of-range" predicates:


begin
dbms_stats.delete_table_stats (
ownname =>null,
tabname=>'y1',
cascade_indexes=>true);
end;
/

begin
dbms_stats.gather_table_stats (
ownname =>null,
tabname=>'y1',
estimate_percent=>null,
cascade=>true,
method_opt =>'for all columns size 254');
end;
/

REM this corresponds to having histogram
REM but no index in place, since the histograms
REM take obviously precedence
REM not sure how we arrive at the cardinality of 50
REM it looks like the NewDensity is 0.5 / NUM_DISTINCT
REM and gets divided by 2 according to the formula
REM for out-of-range predicates (see case 14 below)
alter session set "_fix_control"='5483301:on';

alter session set tracefile_identifier = 'correlated5';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=6 and b=6;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
Column (#1):
NewDensity:0.050000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
Column (#2):
NewDensity:0.100000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:5, NDV:5
ColGroup (#1, Index) Y1_I1
Col#: 1 2 CorStregth: 5.00
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
Using prorated density: 0.050000 of col #2 as selectvity of out-of-range/non-existent value pred
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 50 Computed: 50.00 Non Adjusted: 50.00
Access Path: TableScan
Cost: 12.95 Resp: 12.95 Degree: 0
Cost_io: 12.00 Cost_cpu: 2368429
Resp_io: 12.00 Resp_cpu: 2368429
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
Using prorated density: 0.050000 of col #2 as selectvity of out-of-range/non-existent value pred
Access Path: index (AllEqRange)
Index: Y1_I1
resc_io: 2.00 resc_cpu: 34983
ix_sel: 0.005000 ix_sel_with_filters: 0.005000
Cost: 2.01 Resp: 2.01 Degree: 1
Best:: AccessPath: IndexRange
Index: Y1_I1
Cost: 2.01 Degree: 1 Resp: 2.01 Card: 50.00 Bytes: 0


I'm not sure how the optimizer arrives at the selectivity of 0.005 and the corresponding cardinality of 50 in this particular case.

Switching off the new behaviour regarding non-existing values in frequency histograms outlined above shows that we get a cardinality of 1 in this case:


REM switch to the old behaviour
REM interestingly this influences the "NewDensity"
REM calculation and sets it effectively to the "OldDensity"
REM Now we get a cardinality of 1
REM when querying for a non-existent value
alter session set "_fix_control"='5483301:off';

alter session set tracefile_identifier = 'correlated6';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=6 and b=6;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
Column (#1):
NewDensity:0.000050, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
Column (#2):
NewDensity:0.000050, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:5, NDV:5
ColGroup (#1, Index) Y1_I1
Col#: 1 2 CorStregth: 5.00
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1 Computed: 0.05 Non Adjusted: 0.05
Access Path: TableScan
Cost: 12.94 Resp: 12.94 Degree: 0
Cost_io: 12.00 Cost_cpu: 2342474
Resp_io: 12.00 Resp_cpu: 2342474
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
Access Path: index (AllEqRange)
Index: Y1_I1
resc_io: 2.00 resc_cpu: 15483
ix_sel: 0.000005 ix_sel_with_filters: 0.000005
Cost: 2.01 Resp: 2.01 Degree: 1
Best:: AccessPath: IndexRange
Index: Y1_I1
Cost: 2.01 Degree: 1 Resp: 2.01 Card: 0.05 Bytes: 0


Now we are going to introduce the extended statistics available in 11g and generate a frequency histogram on the new virtual column (which is actually unnecessary, since we have a uniform distribution).


REM Adding extended statistics to educate CBO about correlation between columns a and b.

SELECT
dbms_stats.create_extended_stats(
ownname=>user,
tabname => 'Y1',
extension => '(a, b)' ) AS a_b_correlation
FROM dual;
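
REM As a side note, the extension and the name of the generated
REM virtual column can be verified via USER_STAT_EXTENSIONS
select extension_name, extension
from user_stat_extensions
where table_name = 'Y1';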

REM Extended stats, as depicted above, add a virtual column to the table based on an internal function call.
REM Collecting stats on the table with histograms.

begin
dbms_stats.delete_table_stats (
ownname =>null,
tabname=>'y1',
cascade_indexes=>true);
end;
/

begin
dbms_stats.gather_table_stats (
ownname =>null,
tabname=>'y1',
estimate_percent=>null,
cascade=>true,
method_opt =>'for all columns size 254');
end;
/

REM using the histogram on the newly
REM generated virtual column
REM gets us the correct cardinality
alter session set "_fix_control"='5483301:on';

alter session set tracefile_identifier = 'correlated7';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=1 and b=1;

alter session set events '10053 trace name context off';

REM no change with old "out-of-range" behaviour
alter session set "_fix_control"='5483301:off';

alter session set tracefile_identifier = 'correlated8';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=1 and b=1;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
Column (#1):
NewDensity:0.050000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
Column (#2):
NewDensity:0.100000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:5, NDV:5
Column (#4):
NewDensity:0.050000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
ColGroup (#1, VC) SYS_STUNA$6DVXJXTP05EH56DTIR0X
Col#: 1 2 CorStregth: 5.00
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1000 Computed: 1000.00 Non Adjusted: 1000.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2412429
Resp_io: 12.00 Resp_cpu: 2412429
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
Access Path: index (AllEqRange)
Index: Y1_I1
resc_io: 22.00 resc_cpu: 547522
ix_sel: 0.100000 ix_sel_with_filters: 0.100000
Cost: 22.22 Resp: 22.22 Degree: 1
Best:: AccessPath: TableScan
Cost: 12.96 Degree: 1 Resp: 12.96 Card: 1000.00 Bytes: 0


The column group gets used and the correct cardinality is estimated.

The "bug fix" switch for the non-existent values in frequency histograms doesn't change the result - which is consistent since we're looking for a existent value.

But since we have a frequency histogram in place, once we start to query for non-existent values, the bug fix becomes relevant:


REM new rule introduced with 10.2.0.4
REM and 11.1.0.6, see bug 5483301 resp.
REM Metalink Note 5483301.8:
REM Instead of reporting a cardinality of 1
REM for a non-existent value in a frequency
REM histogram, the cardinality will be
REM 0.5 times the cardinality of the least
REM popular value in the histogram
REM => hence 1,000 * 0.5 in our case => 500

alter session set "_fix_control"='5483301:on';

alter session set tracefile_identifier = 'correlated9';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=6 and b=6;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
Column (#1):
NewDensity:0.050000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
Column (#2):
NewDensity:0.100000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:5, NDV:5
Column (#4):
NewDensity:0.050000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
ColGroup (#1, VC) SYS_STUNA$6DVXJXTP05EH56DTIR0X
Col#: 1 2 CorStregth: 5.00
ColGroup Usage:: PredCnt: 2 Matches Full: Using prorated density: 0.049909 of col #4 as selectvity of out-of-range/non-existent value pred
#1 Partial: Sel: 0.0499
Using prorated density: 0.050000 of col #2 as selectvity of out-of-range/non-existent value pred
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 499 Computed: 499.09 Non Adjusted: 499.09
Access Path: TableScan
Cost: 12.95 Resp: 12.95 Degree: 0
Cost_io: 12.00 Cost_cpu: 2377409
Resp_io: 12.00 Resp_cpu: 2377409
ColGroup Usage:: PredCnt: 2 Matches Full: Using prorated density: 0.049909 of col #4 as selectvity of out-of-range/non-existent value pred
#1 Partial: Sel: 0.0499
ColGroup Usage:: PredCnt: 2 Matches Full: Using prorated density: 0.049909 of col #4 as selectvity of out-of-range/non-existent value pred
#1 Partial: Sel: 0.0499
Using prorated density: 0.050000 of col #2 as selectvity of out-of-range/non-existent value pred
Access Path: index (AllEqRange)
Index: Y1_I1
resc_io: 12.00 resc_cpu: 281307
ix_sel: 0.049909 ix_sel_with_filters: 0.049909
Cost: 12.11 Resp: 12.11 Degree: 1
Best:: AccessPath: IndexRange
Index: Y1_I1
Cost: 12.11 Degree: 1 Resp: 12.11 Card: 499.09 Bytes: 0


The new rule applies, which returns 0.5 times the cardinality of the least popular value in the histogram. Since we have a uniform distribution, every popular value has a cardinality of 1,000, hence the adjusted cardinality according to the new formula is 500 (or rather close to 500: the prorated density of 0.049909 times 10,000 rows gives the rounded cardinality of 499 shown in the trace).

Switching off the bug fix returns us to the old behaviour, which seems reasonable: if the value cannot be found in the frequency histogram, a cardinality of 1 should be estimated:


REM switch to the old behaviour
REM interestingly this influences the "NewDensity"
REM calculation and sets it effectively to the "OldDensity"
REM Now we get a cardinality of 1
REM when querying for a non-existent value
alter session set "_fix_control"='5483301:off';

alter session set tracefile_identifier = 'correlated10';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=6 and b=6;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
Column (#1):
NewDensity:0.000050, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
Column (#2):
NewDensity:0.000050, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:5, NDV:5
Column (#4):
NewDensity:0.000050, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
ColGroup (#1, VC) SYS_STUNA$6DVXJXTP05EH56DTIR0X
Col#: 1 2 CorStregth: 5.00
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.0001
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1 Computed: 0.50 Non Adjusted: 0.50
Access Path: TableScan
Cost: 12.94 Resp: 12.94 Degree: 0
Cost_io: 12.00 Cost_cpu: 2342474
Resp_io: 12.00 Resp_cpu: 2342474
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.0001
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.0001
Access Path: index (AllEqRange)
Index: Y1_I1
resc_io: 2.00 resc_cpu: 15483
ix_sel: 0.000050 ix_sel_with_filters: 0.000050
Cost: 2.01 Resp: 2.01 Degree: 1
Best:: AccessPath: IndexRange
Index: Y1_I1
Cost: 2.01 Degree: 1 Resp: 2.01 Card: 0.50 Bytes: 0


As we can see, in both cases the column group is used, but different selectivities get applied; with the bug fix switched off we arrive at a cardinality of 1.

If we use the extended statistic without a histogram, we get the same behaviour as we got when we were using the composite index without a histogram:


begin
dbms_stats.delete_table_stats (
ownname =>null,
tabname=>'y1',
cascade_indexes=>true);
end;
/

begin
dbms_stats.gather_table_stats (
ownname =>null,
tabname=>'y1',
estimate_percent=>null,
cascade=>true,
method_opt =>'for all columns size 1');
end;
/

REM extended statistics
REM without histogram
REM shows correct cardinality
REM since this is a uniform distribution
REM no histogram required
alter session set "_fix_control"='5483301:on';

alter session set tracefile_identifier = 'correlated11';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=1 and b=1;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
ColGroup (#1, VC) SYS_STUNA$6DVXJXTP05EH56DTIR0X
Col#: 1 2 CorStregth: 5.00
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1000 Computed: 1000.00 Non Adjusted: 1000.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2412429
Resp_io: 12.00 Resp_cpu: 2412429
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
Access Path: index (AllEqRange)
Index: Y1_I1
resc_io: 22.00 resc_cpu: 547522
ix_sel: 0.100000 ix_sel_with_filters: 0.100000
Cost: 22.22 Resp: 22.22 Degree: 1
Best:: AccessPath: TableScan
Cost: 12.96 Degree: 1 Resp: 12.96 Card: 1000.00 Bytes: 0


So the cardinality estimate is correct, but see what happens again when using "out-of-range" predicates:


REM But same behaviour as case 3 (index without histogram):
REM Unfortunately this approach
REM doesn't cope very well with "out-of-range" predicates
REM although the 10053 trace shows a detection of the "out-of-range"
REM attribute, the final index selectivity is left unchanged
REM => Room for improvement?
alter session set tracefile_identifier = 'correlated12';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=6 and b=6;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
ColGroup (#1, VC) SYS_STUNA$6DVXJXTP05EH56DTIR0X
Col#: 1 2 CorStregth: 5.00
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
Using prorated density: 0.100000 of col #2 as selectvity of out-of-range/non-existent value pred
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1000 Computed: 1000.00 Non Adjusted: 1000.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2412429
Resp_io: 12.00 Resp_cpu: 2412429
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.1000
Using prorated density: 0.100000 of col #2 as selectvity of out-of-range/non-existent value pred
Access Path: index (AllEqRange)
Index: Y1_I1
resc_io: 22.00 resc_cpu: 547522
ix_sel: 0.100000 ix_sel_with_filters: 0.100000
Cost: 22.22 Resp: 22.22 Degree: 1
Best:: AccessPath: TableScan
Cost: 12.96 Degree: 1 Resp: 12.96 Card: 1000.00 Bytes: 0


Again the adjustment is without any consequence, similar to and consistent with the "index without histogram" case above.

Finally let's see what happens without index and without extended statistics:


begin dbms_stats.drop_extended_stats(
ownname=>user,
tabname => 'Y1',
extension => '(a, b)' );
end;
/

drop index y1_i1;

begin
dbms_stats.delete_table_stats (
ownname =>null,
tabname=>'y1',
cascade_indexes=>true);
end;
/

begin
dbms_stats.gather_table_stats (
ownname =>null,
tabname=>'y1',
estimate_percent=>null,
cascade=>true,
method_opt =>'for all columns size 1');
end;
/

REM without extended stats
REM and without index
REM and without histogram
REM the standard formula for non-correlated data
REM applies, which gives us a cardinality of 200
alter session set tracefile_identifier = 'correlated13';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=1 and b=1;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]

Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 200 Computed: 200.00 Non Adjusted: 200.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2396429
Resp_io: 12.00 Resp_cpu: 2396429
Best:: AccessPath: TableScan
Cost: 12.96 Degree: 1 Resp: 12.96 Card: 200.00 Bytes: 0


The standard formula for uncorrelated column values applies, which gives us a (wrong) cardinality of 200: 1/NDV(A) * 1/NDV(B) = 1/10 * 1/5 = 0.02, times 10,000 rows.

If we use "out-of-range" predicates, the default formula applies:


REM out-of-range predicates correspond to the standard formula
REM outside range / high-low value, for column b this means:
REM (6 - 4) / (4 - 0) => 0.5,
REM adjust the cardinality by 50% => 200 - (200 * 0.5) => 100
alter session set tracefile_identifier = 'correlated14';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=6 and b=6;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]

Using prorated density: 0.100000 of col #2 as selectvity of out-of-range/non-existent value pred
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 100 Computed: 100.00 Non Adjusted: 100.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2394429
Resp_io: 12.00 Resp_cpu: 2394429
Best:: AccessPath: TableScan
Cost: 12.96 Degree: 1 Resp: 12.96 Card: 100.00 Bytes: 0


If we are outside the known value range by more than the range itself (high value - low value), then we get a computed cardinality of 0, which is rounded up to 1:


REM We'll get a computed cardinality of zero if we are outside the
REM low/high range by at least the range itself
alter session set tracefile_identifier = 'correlated15';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=100 and b=100;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]

Using prorated density: 0.000050 of col #1 as selectvity of out-of-range/non-existent value pred
Using prorated density: 0.000050 of col #2 as selectvity of out-of-range/non-existent value pred
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1 Computed: 0.00 Non Adjusted: 0.00
Access Path: TableScan
Cost: 12.94 Resp: 12.94 Degree: 0
Cost_io: 12.00 Cost_cpu: 2342474
Resp_io: 12.00 Resp_cpu: 2342474
Best:: AccessPath: TableScan
Cost: 12.94 Degree: 1 Resp: 12.94 Card: 0.00 Bytes: 0


Finally we'll explore what happens if we have a histogram in place:


begin
dbms_stats.delete_table_stats (
ownname =>null,
tabname=>'y1',
cascade_indexes=>true);
end;
/

begin
dbms_stats.gather_table_stats (
ownname =>null,
tabname=>'y1',
estimate_percent=>null,
cascade=>true,
method_opt =>'for all columns size 254');
end;
/

REM finally generating histograms
REM but still without extended statistics
REM and without index
REM compare this to the case "index" + "histogram"
REM which was case 1
alter session set tracefile_identifier = 'correlated16';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=1 and b=1;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
Column (#1):
NewDensity:0.050000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
Column (#2):
NewDensity:0.100000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:5, NDV:5

Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 200 Computed: 200.00 Non Adjusted: 200.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2396429
Resp_io: 12.00 Resp_cpu: 2396429
Best:: AccessPath: TableScan
Cost: 12.96 Degree: 1 Resp: 12.96 Card: 200.00 Bytes: 0


Note that this case corresponds to the case "index with histogram" and shows that having the histogram in place effectively makes the index useless for selectivity/cardinality estimates. The same applies when querying for non-existent values:


REM corresponds to case 2
alter session set tracefile_identifier = 'correlated17';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=6 and b=6;

alter session set events '10053 trace name context off';

REM corresponds to case 15
alter session set tracefile_identifier = 'correlated18';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=100 and b=100;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
Column (#1):
NewDensity:0.050000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
Column (#2):
NewDensity:0.100000, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:5, NDV:5

Using prorated density: 0.050000 of col #2 as selectvity of out-of-range/non-existent value pred
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 50 Computed: 50.00 Non Adjusted: 50.00
Access Path: TableScan
Cost: 12.95 Resp: 12.95 Degree: 0
Cost_io: 12.00 Cost_cpu: 2368429
Resp_io: 12.00 Resp_cpu: 2368429
Best:: AccessPath: TableScan
Cost: 12.95 Degree: 1 Resp: 12.95 Card: 50.00 Bytes: 0


Finally we can see that switching off the "bug fix" for non-existent values in frequency histograms shows us a cardinality of 1 if we query for a non-existent value:


REM with old behaviour for non-existent values
REM in frequency histograms we get a cardinality of 1
REM instead of 50
alter session set "_fix_control"='5483301:off';

alter session set tracefile_identifier = 'correlated19';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=6 and b=6;

alter session set events '10053 trace name context off';

alter session set "_fix_control"='5483301:on';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
Column (#1):
NewDensity:0.000050, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:10, NDV:10
Column (#2):
NewDensity:0.000050, OldDensity:0.000050 BktCnt:10000, PopBktCnt:10000, PopValCnt:5, NDV:5

Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1 Computed: 0.05 Non Adjusted: 0.05
Access Path: TableScan
Cost: 12.94 Resp: 12.94 Degree: 0
Cost_io: 12.00 Cost_cpu: 2342474
Resp_io: 12.00 Resp_cpu: 2342474
Best:: AccessPath: TableScan
Cost: 12.94 Degree: 1 Resp: 12.94 Card: 0.05 Bytes: 0


I'm now going to simulate what happens if one decides to merge indexes (e.g. with two indexes on (a,b) and (a,b,c), the one on (a,b) is redundant), or decides to add column c to the existing index because some queries might benefit from this additional column:


create index y1_i2 on y1 (a, b, c);

REM Distribution of these column values given below.
select a, b, c, count(*) from y1 group by a,b,c order by a,b,c
/
A B C COUNT(*)
---------- ---------- ---------- ----------
1 1 1 1000
2 2 0 1000
3 3 1 1000
4 4 0 1000
5 0 1 1000
6 1 0 1000
7 2 1 1000
8 3 0 1000
9 4 1 1000
10 0 0 1000

10 rows selected.

begin
dbms_stats.delete_table_stats (
ownname =>null,
tabname=>'y1',
cascade_indexes=>true);
end;
/

begin
dbms_stats.gather_table_stats (
ownname =>null,
tabname=>'y1',
estimate_percent=>null,
cascade=>true,
method_opt =>'for all columns size 1');
end;
/

alter session set tracefile_identifier = 'correlated21';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=1 and b=1;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
ColGroup (#1, Index) Y1_I2
Col#: 1 2 3 CorStregth: 10.00
ColGroup Usage:: PredCnt: 2 Matches Full: Partial: #1 (1 2 ) Sel: 0.1000
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1000 Computed: 1000.00 Non Adjusted: 1000.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2412429
Resp_io: 12.00 Resp_cpu: 2412429
Access Path: index (index (FFS))
Index: Y1_I2
resc_io: 23.00 resc_cpu: 2056222
ix_sel: 0.000000 ix_sel_with_filters: 1.000000
Access Path: index (FFS)
Cost: 23.82 Resp: 23.82 Degree: 1
Cost_io: 23.00 Cost_cpu: 2056222
Resp_io: 23.00 Resp_cpu: 2056222
ColGroup Usage:: PredCnt: 2 Matches Full: Partial: #1 (1 2 ) Sel: 0.1000
ColGroup Usage:: PredCnt: 2 Matches Full: Partial: #1 (1 2 ) Sel: 0.1000
Access Path: index (IndexOnly)
Index: Y1_I2
resc_io: 6.00 resc_cpu: 242729
ix_sel: 0.100000 ix_sel_with_filters: 0.100000
Cost: 6.10 Resp: 6.10 Degree: 1
Best:: AccessPath: IndexRange
Index: Y1_I2
Cost: 6.10 Degree: 1 Resp: 6.10 Card: 1000.00 Bytes: 0


It's interesting to note that the optimizer now arrives at a higher "CorStregth" value of 10 and, based on the partial match ("Partial: #1 (1 2 )"), decides to use the information from the index, although the predicates don't cover the entire index.
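
As an aside, the "CorStregth" value appears to be the product of the individual column NDVs divided by the number of distinct keys of the index: 10 * 5 * 2 / 10 = 10 for this data set. That is my assumption based on the trace output, not documented behaviour, but it can be cross-checked against the dictionary:


REM assumption: CorStregth = (NDV of A * NDV of B * NDV of C) / DISTINCT_KEYS
REM exp(sum(ln(...))) is simply used as a product aggregate;
REM the round() guards against the minor imprecision of exp/ln
select round(
         (select exp(sum(ln(num_distinct)))
            from user_tab_col_statistics
           where table_name = 'Y1'
             and column_name in ('A', 'B', 'C'))
         / i.distinct_keys
       , 2) as corstregth
  from user_indexes i
 where i.index_name = 'Y1_I2';


This returns 10 here; running the same query against the second data set used further below returns 5, consistent with the trace excerpts.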

For the remaining test cases with out-of-range predicates we get consistent behaviour:


alter session set tracefile_identifier = 'correlated22';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=6 and b=6;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
ColGroup (#1, Index) Y1_I2
Col#: 1 2 3 CorStregth: 10.00
ColGroup Usage:: PredCnt: 2 Matches Full: Partial: #1 (1 2 ) Sel: 0.1000
Using prorated density: 0.100000 of col #2 as selectvity of out-of-range/non-existent value pred
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1000 Computed: 1000.00 Non Adjusted: 1000.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2412429
Resp_io: 12.00 Resp_cpu: 2412429
Access Path: index (index (FFS))
Index: Y1_I2
resc_io: 23.00 resc_cpu: 2056222
ix_sel: 0.000000 ix_sel_with_filters: 1.000000
Access Path: index (FFS)
Cost: 23.82 Resp: 23.82 Degree: 1
Cost_io: 23.00 Cost_cpu: 2056222
Resp_io: 23.00 Resp_cpu: 2056222
ColGroup Usage:: PredCnt: 2 Matches Full: Partial: #1 (1 2 ) Sel: 0.1000
ColGroup Usage:: PredCnt: 2 Matches Full: Partial: #1 (1 2 ) Sel: 0.1000
Using prorated density: 0.100000 of col #2 as selectvity of out-of-range/non-existent value pred
Access Path: index (IndexOnly)
Index: Y1_I2
resc_io: 6.00 resc_cpu: 242729
ix_sel: 0.100000 ix_sel_with_filters: 0.100000
Cost: 6.10 Resp: 6.10 Degree: 1
Best:: AccessPath: IndexRange
Index: Y1_I2
Cost: 6.10 Degree: 1 Resp: 6.10 Card: 1000.00 Bytes: 0


And completely "out-of-range":


alter session set tracefile_identifier = 'correlated23';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=100 and b=100;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
ColGroup (#1, Index) Y1_I2
Col#: 1 2 3 CorStregth: 10.00
ColGroup Usage:: PredCnt: 2 Matches Full: Partial: #1 (1 2 ) Sel: 0.1000
Using prorated density: 0.100000 of col #2 as selectvity of out-of-range/non-existent value pred
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 1000 Computed: 1000.00 Non Adjusted: 1000.00
Access Path: TableScan
Cost: 12.96 Resp: 12.96 Degree: 0
Cost_io: 12.00 Cost_cpu: 2412429
Resp_io: 12.00 Resp_cpu: 2412429
Access Path: index (index (FFS))
Index: Y1_I2
resc_io: 23.00 resc_cpu: 2056222
ix_sel: 0.000000 ix_sel_with_filters: 1.000000
Access Path: index (FFS)
Cost: 23.82 Resp: 23.82 Degree: 1
Cost_io: 23.00 Cost_cpu: 2056222
Resp_io: 23.00 Resp_cpu: 2056222
ColGroup Usage:: PredCnt: 2 Matches Full: Partial: #1 (1 2 ) Sel: 0.1000
ColGroup Usage:: PredCnt: 2 Matches Full: Partial: #1 (1 2 ) Sel: 0.1000
Using prorated density: 0.100000 of col #2 as selectvity of out-of-range/non-existent value pred
Access Path: index (IndexOnly)
Index: Y1_I2
resc_io: 6.00 resc_cpu: 242729
ix_sel: 0.100000 ix_sel_with_filters: 0.100000
Cost: 6.10 Resp: 6.10 Degree: 1
Best:: AccessPath: IndexRange
Index: Y1_I2
Cost: 6.10 Degree: 1 Resp: 6.10 Card: 1000.00 Bytes: 0


Again the prorated density is left unchanged, and therefore the final selectivity/cardinality is still the same.

Now let's see what happens if the index on all three columns shows a different pattern:


truncate table y1;

begin
for i in 1..1000 loop
for j in 1..10 loop
insert into y1 values (j,mod(j,5), i);
end loop;
end loop;
end;
/

commit;

select count(*) from (
select distinct a, b, c from y1
);

COUNT(*)
----------
10000

begin
dbms_stats.delete_table_stats (
ownname =>null,
tabname=>'y1',
cascade_indexes=>true);
end;
/

begin
dbms_stats.gather_table_stats (
ownname =>null,
tabname=>'y1',
estimate_percent=>null,
cascade=>true,
method_opt =>'for all columns size 1');
end;
/

alter session set tracefile_identifier = 'correlated24';

alter session set events '10053 trace name context forever, level 1';

explain plan for select c from y1 where a=1 and b=1;

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for Y1[Y1]
ColGroup (#1, Index) Y1_I2
Col#: 1 2 3 CorStregth: 5.00
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
Table: Y1 Alias: Y1
Card: Original: 10000.000000 Rounded: 200 Computed: 200.00 Non Adjusted: 200.00
Access Path: TableScan
Cost: 14.97 Resp: 14.97 Degree: 0
Cost_io: 14.00 Cost_cpu: 2432036
Resp_io: 14.00 Resp_cpu: 2432036
Access Path: index (index (FFS))
Index: Y1_I2
resc_io: 25.00 resc_cpu: 2077586
ix_sel: 0.000000 ix_sel_with_filters: 1.000000
Access Path: index (FFS)
Cost: 25.83 Resp: 25.83 Degree: 1
Cost_io: 25.00 Cost_cpu: 2077586
Resp_io: 25.00 Resp_cpu: 2077586
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
Access Path: index (IndexOnly)
Index: Y1_I2
resc_io: 2.00 resc_cpu: 54443
ix_sel: 0.020000 ix_sel_with_filters: 0.020000
Cost: 2.02 Resp: 2.02 Degree: 1
Best:: AccessPath: IndexRange
Index: Y1_I2
Cost: 2.02 Degree: 1 Resp: 2.02 Card: 200.00 Bytes: 0


The "CorStregth" value reverts to 5 and the index information is ignored. The selectivity/cardinality corresponds to the default formula used for uncorrelated column values, so in this case we lose the additional information about the correlation and fall back to default statistics.

Finally the "ZODIAC" example which actually represents correlated and skewed data is influenced by the bug fix for the non-existent values in frequency histograms as well.

Repeating the test case with the fix enabled:


select min(cnt)
from (
select a.*,
endpoint_number - lag(endpoint_number, 1, 0) over (order by endpoint_number) as cnt
from user_tab_histograms a
where table_name = 'CALENDAR' and column_name like 'SYS%'
);

MIN(CNT)
----------
7

alter session set "_fix_control"='5483301:on';

alter session set tracefile_identifier = 'zodiac1';

alter session set events '10053 trace name context forever, level 1';

explain plan for select count(*) from calendar c where zodiac = 'TAURUS' and month = 'JAN';

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for CALENDAR[C]
Column (#3):
NewDensity:0.039726, OldDensity:0.001370 BktCnt:365, PopBktCnt:365, PopValCnt:12, NDV:12
Column (#2):
NewDensity:0.038356, OldDensity:0.001370 BktCnt:365, PopBktCnt:365, PopValCnt:12, NDV:12
Column (#4):
NewDensity:0.009589, OldDensity:0.001370 BktCnt:365, PopBktCnt:365, PopValCnt:24, NDV:24
ColGroup (#1, VC) SYS_STUWHPY_ZSVI_W3#C$I3EUUYB4
Col#: 2 3 CorStregth: 6.00
ColGroup Usage:: PredCnt: 2 Matches Full: Using density: 0.009589 of col #4 as selectivity of unpopular value pred
#1 Partial: Sel: 0.0096
Using density: 0.038356 of col #2 as selectivity of unpopular value pred
Table: CALENDAR Alias: C
Card: Original: 365.000000 Rounded: 4 Computed: 3.50 Non Adjusted: 3.50
Access Path: TableScan
Cost: 3.04 Resp: 3.04 Degree: 0
Cost_io: 3.00 Cost_cpu: 102543
Resp_io: 3.00 Resp_cpu: 102543
Best:: AccessPath: TableScan
Cost: 3.04 Degree: 1 Resp: 3.04 Card: 3.50 Bytes: 0


We can see that we get an estimated cardinality of 3.5 due to an "unpopular" value, which is again consistent with the rule introduced by the bug fix: 0.5 times the cardinality of the least popular value in the frequency histogram (the query above returns 7, and 7 * 0.5 = 3.5). But if you think about it, there is no such thing as an "unpopular" value in a frequency histogram; the value should basically be treated as "non-existent", which is exactly what the bug fix prevents.
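
For reference, the arithmetic behind this estimate (the 7 being the minimum bucket frequency returned by the query above):


REM half the frequency of the least popular value, and the resulting
REM density (computed cardinality / num_rows)
select 0.5 * 7 as computed_cardinality,
       round(0.5 * 7 / 365, 6) as new_density
from dual;


This returns 3.5 and 0.009589, matching the "Computed" cardinality and the NewDensity reported for column #4 (ZODIAC) in the trace.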

Enabling previous behaviour shows the following:


alter session set "_fix_control"='5483301:off';

alter session set tracefile_identifier = 'zodiac2';

alter session set events '10053 trace name context forever, level 1';

explain plan for select count(*) from calendar c where zodiac = 'TAURUS' and month = 'JAN';

alter session set events '10053 trace name context off';


The corresponding 10053 trace file excerpt:


SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for CALENDAR[C]
Column (#3):
NewDensity:0.001370, OldDensity:0.001370 BktCnt:365, PopBktCnt:365, PopValCnt:12, NDV:12
Column (#2):
NewDensity:0.001370, OldDensity:0.001370 BktCnt:365, PopBktCnt:365, PopValCnt:12, NDV:12
Column (#4):
NewDensity:0.001370, OldDensity:0.001370 BktCnt:365, PopBktCnt:365, PopValCnt:24, NDV:24
ColGroup (#1, VC) SYS_STUWHPY_ZSVI_W3#C$I3EUUYB4
Col#: 2 3 CorStregth: 6.00
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.0014
Table: CALENDAR Alias: C
Card: Original: 365.000000 Rounded: 1 Computed: 0.50 Non Adjusted: 0.50
Access Path: TableScan
Cost: 3.04 Resp: 3.04 Degree: 0
Cost_io: 3.00 Cost_cpu: 101868
Resp_io: 3.00 Resp_cpu: 101868
Best:: AccessPath: TableScan
Cost: 3.04 Degree: 1 Resp: 3.04 Card: 0.50 Bytes: 0


In this case we get the expected computed cardinality of 0.5, rounded up to 1.
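
The density applied here is the traditional frequency histogram density of 1 / (2 * num_rows); a quick cross-check, assuming that formula:


REM old frequency histogram density and the resulting computed cardinality
select round(1 / (2 * 365), 6) as density,
       365 * 1 / (2 * 365) as computed_cardinality
from dual;


This returns 0.00137 and 0.5, consistent with the NewDensity values and the computed cardinality shown in the trace.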

So in summary this is again a very interesting test case showing that generating histograms can have a significant impact on the optimizer's estimates, even in the particular case where the column or expression was distributed uniformly.

Friday, January 9, 2009

Limitations of histograms - rounding issues

As outlined in e.g. Jonathan Lewis' "Cost-Based Oracle Fundamentals" on page 117 et seqq., histograms are based on only a subset of the column data: for character columns only the first 32 bytes (not characters, in the case of multi-byte character sets) are examined, and while converting the resulting substring to a number further truncation takes place. Only the first 15 bytes are used and padded with trailing zero bytes; the resulting hex number is converted to decimal and rounded again to 15 significant figures.

Since this means that for single-byte character strings only the first six to seven characters are relevant for the resulting number representation (in the case of multi-byte character sets it's potentially even fewer characters), Oracle populates ENDPOINT_ACTUAL_VALUE with the first 32 bytes to be able to distinguish strings that are identical in the first six to seven bytes.
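
To illustrate the mechanics, here's a hypothetical reconstruction of the numeric representation following the description above (my own sketch, not the actual internal routine): treat the first 15 bytes of the string as base-256 digits and round the result to 15 significant figures.


REM sketch: first 15 bytes as base-256 digits, zero-padded,
REM rounded to 15 significant figures
REM (round(..., -21) keeps 15 significant digits for this magnitude)
select round(
         sum(ascii(substr(rpad('AAAAA1', 15, chr(0)), level, 1))
             * power(256, 15 - level))
       , -21) as approx_endpoint_value
  from dual
connect by level <= 15;


For 'AAAAA1' this yields 338822822454902000000000000000000000, which matches the ENDPOINT_VALUE reported for THE_STRING1 in the test output further below.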

Recently there was an interesting thread on the OTN forum where it was pointed out that similar issues can arise when you attempt to generate a histogram on numeric data with a very high precision. It turned out that Oracle actually performs a similar rounding on number data, although from a technical point of view it's not obvious why any rounding would be required: no conversion is necessary for a column of NUMBER data type, and ENDPOINT_VALUE in the data dictionary seems to be of NUMBER type, too.

The behaviour differs a bit between strings and numbers, though: whereas strings that do not differ within the first 32 bytes simply generate a one-bucket histogram (they collapse into a single value), a number that is the same for the first 15 significant digits still generates the appropriate number of histogram buckets, but with the same ENDPOINT_VALUE for all entries. Note that it doesn't matter whether the digits are to the left or right of the decimal point; any number with a precision greater than 15 gets rounded.
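
The effective 15-significant-digit rounding of numbers can be emulated with plain ROUND arithmetic (again a sketch of the apparent behaviour, ignoring edge cases like zero or negative values):


REM decimal places to keep = 15 - digits left of the decimal point
with t as (
  select 1.000000000000002 as val from dual union all
  select 10.00000000000001 from dual union all
  select 100.00000000000001 from dual
)
select val,
       round(val, 15 - (trunc(log(10, abs(val))) + 1)) as rounded_to_15_digits
  from t;


The results (1, 10 and 100 respectively) correspond to the ENDPOINT_VALUE entries for THE_NUMBER2, THE_NUMBER3 and THE_NUMBER4 in the output below.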

The following test case (run on Windows (32bit) 10.2.0.4, but the results on 9.2.0.8 and 11.1.0.7 were the same) demonstrates the issues described. It populates a table with data crafted to expose the rounding.


SQL>
SQL> alter session set nls_numeric_characters = ',.';

Session altered.

SQL>
SQL> drop table test_histogram_rounding purge;

Table dropped.

SQL>
SQL> create table test_histogram_rounding as
2 select rownum as id,
3 1.00000000000001 + (trunc(sqrt(rownum)) / 100000000000000) as the_number1,
4 1.000000000000001 + (trunc(sqrt(rownum)) / 1000000000000000) as the_number2,
5 10.00000000000001 + (trunc(sqrt(rownum)) / 100000000000000) as the_number3,
6 100.00000000000001 + (trunc(sqrt(rownum)) / 100000000000000) as the_number4,
7 to_char(100.00000000000001 + (trunc(sqrt(rownum)) / 100000000000000), 'TM') as the_number4_as_string,
8 10000000000000001 + trunc(sqrt(rownum)) as the_number5,
9 to_char(1000000000000000000000000000000001 + trunc(sqrt(rownum)), 'TM') as the_number6_as_string,
10 'AAAAA' || trunc(sqrt(rownum)) as the_string1,
11 'AAAAAAAAAA' || trunc(sqrt(rownum)) as the_string2,
12 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' || trunc(sqrt(rownum)) as the_string3
13 -- Use this in AL32UTF8 character set to see the effects of multi-byte character sets
14 --'ÄÄÄÄÄ' || trunc(sqrt(rownum)) as the_string1,
15 --'ÄÄÄÄÄÄÄÄÄÄ' || trunc(sqrt(rownum)) as the_string2,
16 --'ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ' || trunc(sqrt(rownum)) as the_string3
17 from dual
18 connect by level <= 1000;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'test_histogram_rounding', method_opt=>'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 THE_NUMBER1,THE_NUMBER2,THE_NUMBER3,THE_NUMBER4,THE_NUMBER4_AS_STRING,THE_NUMBER5,THE_NUMBER6_AS_STRING,THE_STRING1,THE_STRING2,THE_STRING3')

PL/SQL procedure successfully completed.

SQL>
SQL> column column_name format a22
SQL> column endpoint_number format 9999
SQL> column endpoint_actual_value format a32
SQL>
SQL> select column_name, endpoint_number, endpoint_value, endpoint_actual_value
2 from user_tab_histograms where table_name = 'TEST_HISTOGRAM_ROUNDING';

COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
---------------------- --------------- -------------------------------------- --------------------------------
THE_NUMBER1 3 1,00000000000002
THE_NUMBER1 8 1,00000000000003
THE_NUMBER1 15 1,00000000000004
THE_NUMBER1 24 1,00000000000005
THE_NUMBER1 35 1,00000000000006
THE_NUMBER1 48 1,00000000000007
THE_NUMBER1 63 1,00000000000008
THE_NUMBER1 80 1,00000000000009
THE_NUMBER1 99 1,0000000000001
THE_NUMBER1 120 1,00000000000011
THE_NUMBER1 143 1,00000000000012
THE_NUMBER1 168 1,00000000000013
THE_NUMBER1 195 1,00000000000014
THE_NUMBER1 224 1,00000000000015
THE_NUMBER1 255 1,00000000000016
THE_NUMBER1 288 1,00000000000017
THE_NUMBER1 323 1,00000000000018
THE_NUMBER1 360 1,00000000000019
THE_NUMBER1 399 1,0000000000002
THE_NUMBER1 440 1,00000000000021
THE_NUMBER1 483 1,00000000000022
THE_NUMBER1 528 1,00000000000023
THE_NUMBER1 575 1,00000000000024
THE_NUMBER1 624 1,00000000000025
THE_NUMBER1 675 1,00000000000026
THE_NUMBER1 728 1,00000000000027
THE_NUMBER1 783 1,00000000000028
THE_NUMBER1 840 1,00000000000029
THE_NUMBER1 899 1,0000000000003
THE_NUMBER1 960 1,00000000000031
THE_NUMBER1 1000 1,00000000000032
THE_NUMBER2 3 1
THE_NUMBER2 8 1
THE_NUMBER2 15 1
THE_NUMBER2 24 1,00000000000001
THE_NUMBER2 35 1,00000000000001
THE_NUMBER2 48 1,00000000000001
THE_NUMBER2 63 1,00000000000001
THE_NUMBER2 80 1,00000000000001
THE_NUMBER2 99 1,00000000000001
THE_NUMBER2 120 1,00000000000001
THE_NUMBER2 143 1,00000000000001
THE_NUMBER2 168 1,00000000000001
THE_NUMBER2 195 1,00000000000001
THE_NUMBER2 224 1,00000000000002
THE_NUMBER2 255 1,00000000000002
THE_NUMBER2 288 1,00000000000002
THE_NUMBER2 323 1,00000000000002
THE_NUMBER2 360 1,00000000000002
THE_NUMBER2 399 1,00000000000002
THE_NUMBER2 440 1,00000000000002
THE_NUMBER2 483 1,00000000000002
THE_NUMBER2 528 1,00000000000002
THE_NUMBER2 575 1,00000000000002
THE_NUMBER2 624 1,00000000000003
THE_NUMBER2 675 1,00000000000003
THE_NUMBER2 728 1,00000000000003
THE_NUMBER2 783 1,00000000000003
THE_NUMBER2 840 1,00000000000003
THE_NUMBER2 899 1,00000000000003
THE_NUMBER2 960 1,00000000000003
THE_NUMBER2 1000 1,00000000000003
THE_NUMBER3 3 10
THE_NUMBER3 8 10
THE_NUMBER3 15 10
THE_NUMBER3 24 10,0000000000001
THE_NUMBER3 35 10,0000000000001
THE_NUMBER3 48 10,0000000000001
THE_NUMBER3 63 10,0000000000001
THE_NUMBER3 80 10,0000000000001
THE_NUMBER3 99 10,0000000000001
THE_NUMBER3 120 10,0000000000001
THE_NUMBER3 143 10,0000000000001
THE_NUMBER3 168 10,0000000000001
THE_NUMBER3 195 10,0000000000001
THE_NUMBER3 224 10,0000000000002
THE_NUMBER3 255 10,0000000000002
THE_NUMBER3 288 10,0000000000002
THE_NUMBER3 323 10,0000000000002
THE_NUMBER3 360 10,0000000000002
THE_NUMBER3 399 10,0000000000002
THE_NUMBER3 440 10,0000000000002
THE_NUMBER3 483 10,0000000000002
THE_NUMBER3 528 10,0000000000002
THE_NUMBER3 575 10,0000000000002
THE_NUMBER3 624 10,0000000000003
THE_NUMBER3 675 10,0000000000003
THE_NUMBER3 728 10,0000000000003
THE_NUMBER3 783 10,0000000000003
THE_NUMBER3 840 10,0000000000003
THE_NUMBER3 899 10,0000000000003
THE_NUMBER3 960 10,0000000000003
THE_NUMBER3 1000 10,0000000000003
THE_NUMBER4 3 100
THE_NUMBER4 8 100
THE_NUMBER4 15 100
THE_NUMBER4 24 100
THE_NUMBER4 35 100
THE_NUMBER4 48 100
THE_NUMBER4 63 100
THE_NUMBER4 80 100
THE_NUMBER4 99 100
THE_NUMBER4 120 100
THE_NUMBER4 143 100
THE_NUMBER4 168 100
THE_NUMBER4 195 100
THE_NUMBER4 224 100
THE_NUMBER4 255 100
THE_NUMBER4 288 100
THE_NUMBER4 323 100
THE_NUMBER4 360 100
THE_NUMBER4 399 100
THE_NUMBER4 440 100
THE_NUMBER4 483 100
THE_NUMBER4 528 100
THE_NUMBER4 575 100
THE_NUMBER4 624 100
THE_NUMBER4 675 100
THE_NUMBER4 728 100
THE_NUMBER4 783 100
THE_NUMBER4 840 100
THE_NUMBER4 899 100
THE_NUMBER4 960 100
THE_NUMBER4 1000 100
THE_NUMBER4_AS_STRING 3 255399918356579000000000000000000000 100,00000000000002
THE_NUMBER4_AS_STRING 8 255399918356579000000000000000000000 100,00000000000003
THE_NUMBER4_AS_STRING 15 255399918356579000000000000000000000 100,00000000000004
THE_NUMBER4_AS_STRING 24 255399918356579000000000000000000000 100,00000000000005
THE_NUMBER4_AS_STRING 35 255399918356579000000000000000000000 100,00000000000006
THE_NUMBER4_AS_STRING 48 255399918356579000000000000000000000 100,00000000000007
THE_NUMBER4_AS_STRING 63 255399918356579000000000000000000000 100,00000000000008
THE_NUMBER4_AS_STRING 80 255399918356579000000000000000000000 100,00000000000009
THE_NUMBER4_AS_STRING 99 255399918356579000000000000000000000 100,0000000000001
THE_NUMBER4_AS_STRING 120 255399918356579000000000000000000000 100,00000000000011
THE_NUMBER4_AS_STRING 143 255399918356579000000000000000000000 100,00000000000012
THE_NUMBER4_AS_STRING 168 255399918356579000000000000000000000 100,00000000000013
THE_NUMBER4_AS_STRING 195 255399918356579000000000000000000000 100,00000000000014
THE_NUMBER4_AS_STRING 224 255399918356579000000000000000000000 100,00000000000015
THE_NUMBER4_AS_STRING 255 255399918356579000000000000000000000 100,00000000000016
THE_NUMBER4_AS_STRING 288 255399918356579000000000000000000000 100,00000000000017
THE_NUMBER4_AS_STRING 323 255399918356579000000000000000000000 100,00000000000018
THE_NUMBER4_AS_STRING 360 255399918356579000000000000000000000 100,00000000000019
THE_NUMBER4_AS_STRING 399 255399918356579000000000000000000000 100,0000000000002
THE_NUMBER4_AS_STRING 440 255399918356579000000000000000000000 100,00000000000021
THE_NUMBER4_AS_STRING 483 255399918356579000000000000000000000 100,00000000000022
THE_NUMBER4_AS_STRING 528 255399918356579000000000000000000000 100,00000000000023
THE_NUMBER4_AS_STRING 575 255399918356579000000000000000000000 100,00000000000024
THE_NUMBER4_AS_STRING 624 255399918356579000000000000000000000 100,00000000000025
THE_NUMBER4_AS_STRING 675 255399918356579000000000000000000000 100,00000000000026
THE_NUMBER4_AS_STRING 728 255399918356579000000000000000000000 100,00000000000027
THE_NUMBER4_AS_STRING 783 255399918356579000000000000000000000 100,00000000000028
THE_NUMBER4_AS_STRING 840 255399918356579000000000000000000000 100,00000000000029
THE_NUMBER4_AS_STRING 899 255399918356579000000000000000000000 100,0000000000003
THE_NUMBER4_AS_STRING 960 255399918356579000000000000000000000 100,00000000000031
THE_NUMBER4_AS_STRING 1000 255399918356579000000000000000000000 100,00000000000032
THE_NUMBER5 3 10000000000000000
THE_NUMBER5 8 10000000000000000
THE_NUMBER5 15 10000000000000000
THE_NUMBER5 24 10000000000000000
THE_NUMBER5 35 10000000000000000
THE_NUMBER5 48 10000000000000000
THE_NUMBER5 63 10000000000000000
THE_NUMBER5 80 10000000000000000
THE_NUMBER5 99 10000000000000000
THE_NUMBER5 120 10000000000000000
THE_NUMBER5 143 10000000000000000
THE_NUMBER5 168 10000000000000000
THE_NUMBER5 195 10000000000000000
THE_NUMBER5 224 10000000000000000
THE_NUMBER5 255 10000000000000000
THE_NUMBER5 288 10000000000000000
THE_NUMBER5 323 10000000000000000
THE_NUMBER5 360 10000000000000000
THE_NUMBER5 399 10000000000000000
THE_NUMBER5 440 10000000000000000
THE_NUMBER5 483 10000000000000000
THE_NUMBER5 528 10000000000000000
THE_NUMBER5 575 10000000000000000
THE_NUMBER5 624 10000000000000000
THE_NUMBER5 675 10000000000000000
THE_NUMBER5 728 10000000000000000
THE_NUMBER5 783 10000000000000000
THE_NUMBER5 840 10000000000000000
THE_NUMBER5 899 10000000000000000
THE_NUMBER5 960 10000000000000000
THE_NUMBER5 1000 10000000000000000
THE_NUMBER6_AS_STRING 1000 255399919594519000000000000000000000 10000000000000000000000000000000
THE_STRING1 3 338822822454902000000000000000000000 AAAAA1
THE_STRING1 24 338822822454903000000000000000000000 AAAAA10
THE_STRING1 47 338822822454903000000000000000000000 AAAAA11
THE_STRING1 72 338822822454903000000000000000000000 AAAAA12
THE_STRING1 99 338822822454903000000000000000000000 AAAAA13
THE_STRING1 128 338822822454903000000000000000000000 AAAAA14
THE_STRING1 159 338822822454903000000000000000000000 AAAAA15
THE_STRING1 192 338822822454903000000000000000000000 AAAAA16
THE_STRING1 227 338822822454903000000000000000000000 AAAAA17
THE_STRING1 264 338822822454903000000000000000000000 AAAAA18
THE_STRING1 303 338822822454903000000000000000000000 AAAAA19
THE_STRING1 308 338822822454907000000000000000000000 AAAAA2
THE_STRING1 349 338822822454907000000000000000000000 AAAAA20
THE_STRING1 392 338822822454907000000000000000000000 AAAAA21
THE_STRING1 437 338822822454907000000000000000000000 AAAAA22
THE_STRING1 484 338822822454907000000000000000000000 AAAAA23
THE_STRING1 533 338822822454907000000000000000000000 AAAAA24
THE_STRING1 584 338822822454907000000000000000000000 AAAAA25
THE_STRING1 637 338822822454908000000000000000000000 AAAAA26
THE_STRING1 692 338822822454908000000000000000000000 AAAAA27
THE_STRING1 749 338822822454908000000000000000000000 AAAAA28
THE_STRING1 808 338822822454908000000000000000000000 AAAAA29
THE_STRING1 815 338822822454911000000000000000000000 AAAAA3
THE_STRING1 876 338822822454912000000000000000000000 AAAAA30
THE_STRING1 916 338822822454912000000000000000000000 AAAAA31
THE_STRING1 925 338822822454916000000000000000000000 AAAAA4
THE_STRING1 936 338822822454921000000000000000000000 AAAAA5
THE_STRING1 949 338822822454925000000000000000000000 AAAAA6
THE_STRING1 964 338822822454930000000000000000000000 AAAAA7
THE_STRING1 981 338822822454935000000000000000000000 AAAAA8
THE_STRING1 1000 338822822454940000000000000000000000 AAAAA9
THE_STRING2 3 338822822454979000000000000000000000 AAAAAAAAAA1
THE_STRING2 24 338822822454979000000000000000000000 AAAAAAAAAA10
THE_STRING2 47 338822822454979000000000000000000000 AAAAAAAAAA11
THE_STRING2 72 338822822454979000000000000000000000 AAAAAAAAAA12
THE_STRING2 99 338822822454979000000000000000000000 AAAAAAAAAA13
THE_STRING2 128 338822822454979000000000000000000000 AAAAAAAAAA14
THE_STRING2 159 338822822454979000000000000000000000 AAAAAAAAAA15
THE_STRING2 192 338822822454979000000000000000000000 AAAAAAAAAA16
THE_STRING2 227 338822822454979000000000000000000000 AAAAAAAAAA17
THE_STRING2 264 338822822454979000000000000000000000 AAAAAAAAAA18
THE_STRING2 303 338822822454979000000000000000000000 AAAAAAAAAA19
THE_STRING2 308 338822822454979000000000000000000000 AAAAAAAAAA2
THE_STRING2 349 338822822454979000000000000000000000 AAAAAAAAAA20
THE_STRING2 392 338822822454979000000000000000000000 AAAAAAAAAA21
THE_STRING2 437 338822822454979000000000000000000000 AAAAAAAAAA22
THE_STRING2 484 338822822454979000000000000000000000 AAAAAAAAAA23
THE_STRING2 533 338822822454979000000000000000000000 AAAAAAAAAA24
THE_STRING2 584 338822822454979000000000000000000000 AAAAAAAAAA25
THE_STRING2 637 338822822454979000000000000000000000 AAAAAAAAAA26
THE_STRING2 692 338822822454979000000000000000000000 AAAAAAAAAA27
THE_STRING2 749 338822822454979000000000000000000000 AAAAAAAAAA28
THE_STRING2 808 338822822454979000000000000000000000 AAAAAAAAAA29
THE_STRING2 815 338822822454979000000000000000000000 AAAAAAAAAA3
THE_STRING2 876 338822822454979000000000000000000000 AAAAAAAAAA30
THE_STRING2 916 338822822454979000000000000000000000 AAAAAAAAAA31
THE_STRING2 925 338822822454979000000000000000000000 AAAAAAAAAA4
THE_STRING2 936 338822822454979000000000000000000000 AAAAAAAAAA5
THE_STRING2 949 338822822454979000000000000000000000 AAAAAAAAAA6
THE_STRING2 964 338822822454979000000000000000000000 AAAAAAAAAA7
THE_STRING2 981 338822822454979000000000000000000000 AAAAAAAAAA8
THE_STRING2 1000 338822822454979000000000000000000000 AAAAAAAAAA9
THE_STRING3 1000 338822822454979000000000000000000000
ID 0 1
ID 1 1000

252 rows selected.

SQL>
SQL> set autotrace on explain
SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number1 = 1.00000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 3 | 30 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER1"=1.00000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number1 = 1.00000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 61 | 610 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER1"=1.00000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number2 = 1.000000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 5 | 55 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER2"=1.000000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number2 = 1.000000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 55 | 605 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER2"=1.000000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number3 = 10.00000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 5 | 50 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER3"=10.00000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number3 = 10.00000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 55 | 550 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER3"=10.00000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number4 = 100.00000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 33 | 363 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER4"=100.00000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number4 = 100.00000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 33 | 363 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER4"=100.00000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number4_as_string = 100.00000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 19 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 3 | 57 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_NUMBER("THE_NUMBER4_AS_STRING")=100.00000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number4_as_string = 100.00000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 19 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 61 | 1159 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_NUMBER("THE_NUMBER4_AS_STRING")=100.00000000000031)

SQL>
SQL> alter session set nls_numeric_characters = '.,';

Session altered.

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number4_as_string = 100.00000000000002;
where the_number4_as_string = 100.00000000000002
*
ERROR at line 3:
ORA-01722: invalid number


SQL>
SQL> alter session set nls_numeric_characters = ',.';

Session altered.

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number5 = 10000000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 33 | 363 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER5"=10000000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number5 = 10000000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 33 | 363 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER5"=10000000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number6_as_string = 1000000000000000000000000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 35 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 1000 | 35000 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_NUMBER("THE_NUMBER6_AS_STRING")=1000000000000000000000000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number6_as_string = 1000000000000000000000000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 35 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 1000 | 35000 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_NUMBER("THE_NUMBER6_AS_STRING")=1000000000000000000000000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string1 = 'AAAAA1';

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 3 | 24 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING1"='AAAAA1')

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string1 = 'AAAAA30';

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 61 | 488 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING1"='AAAAA30')

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string2 = 'AAAAAAAAAA1';

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 3 | 39 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING2"='AAAAAAAAAA1')

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string2 = 'AAAAAAAAAA30';

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 61 | 793 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING2"='AAAAAAAAAA30')

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string3 = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1';

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 35 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 1000 | 35000 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING3"='AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1')

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string3 = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA30';

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 35 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 1000 | 35000 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING3"='AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA30')

SQL>
SQL> spool off


As you can see from the histogram dictionary information, the numeric data gets rounded, but the correct number of histogram buckets gets generated. The strings behave differently, and you can see the different handling of ENDPOINT_VALUE and ENDPOINT_ACTUAL_VALUE.

When the rounding issues kick in for the numbers you start to get (slightly) incorrect estimates based on the histogram information, and once the precision gets greater than 15, the estimates will be significantly incorrect.

As a side note, as you can see from the test case, it's a bad idea to generate a histogram on strings that are identical in the first 32 bytes, since you'll get a single bucket and the estimates will be way off.
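
A simple pre-check (my suggestion, not part of the original test) reveals such columns before a histogram is gathered on them:


REM number of distinct 32-byte prefixes: if this returns 1, a histogram
REM on this column will collapse into a single bucket
select count(distinct substrb(the_string3, 1, 32)) as distinct_32byte_prefixes
  from test_histogram_rounding;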

A potential workaround for the numeric issue is to store the number as a string (as long as the precision is less than 32 digits, due to the limitation of the string handling). According to "Cost-Based Oracle Fundamentals" by Jonathan Lewis, from 9.2.0.6 and 10.1.0.4 on Oracle uses the histogram information properly even with the implicit number conversion in place, but I don't currently have the older versions available for testing, so I can't tell the exact outcome on older releases.

Of course this workaround introduces various issues of its own, the most obvious threat being that the implicit conversion relies on the client NLS settings, as demonstrated by the ORA-01722 in the script above.
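
If that workaround is used, one way to reduce the NLS exposure (my suggestion, not part of the original script) is to avoid the implicit conversion altogether and compare string to string, generating the comparison value with an explicit format mask and explicit NLS parameters:


REM the third argument to TO_CHAR pins the decimal separator,
REM independent of the client/session NLS settings
select count(*)
  from test_histogram_rounding
 where the_number4_as_string =
       to_char(100.00000000000002, 'TM9',
               'nls_numeric_characters = '',.''');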