Wednesday, January 28, 2009

Common tuning issues - avoiding the "fetch first rows quickly" trap

It seems to be a commonly encountered issue that as part of the analysis of a potentially poor performing SQL query statement the investigation is mislead by the fact that running the query in a GUI database front-end tool like TOAD, SQLDeveloper or similar shows an instant reply.

Most of these tools by default fetch only the first n rows of the result set conveniently into a grid-like GUI component.

This might lead to the conclusion that the statement seems to perform well and therefore doesn't require further investigations respectively tuning efforts.

The potential problem with this approach is that due to several possible causes a statement might return the first rows very quickly, but depending on how the statement is actually used this doesn't necessarily mean that all rows will be processed in a efficient manner.

Usually the following circumstances can lead to such side-effects:

- Grossly underestimated cardinalities by the optimizer
- An efficient access path available to the optimizer based on that wrong cardinality estimate, usually an index

There are several scenarios that can be the root cause of the underestimated cardinalities, among them are:

- Wrong statistics of involved tables or indexes
- Sharing of child cursors (execution plans) when using bind variables that cause different cardinalities, can also be forced indirectly by CURSOR_SHARING=FORCE
- Incorrect estimation of join cardinalities: There are simply scenarios where the optimizer is not able to come up with a good estimate of a join, even (or possibly because of) with the help of histograms

Although starting from 10g on there is a default statistics collection job which should help to prevent the first point on the list from happening, it's actually this default job in conjunction with a particular pattern of database usage that can lead exactly to this result.

The following test case run on Oracle 10g XE should simulate this pattern to demonstrate one of the possible scenarios.

It creates table which represents a "staging" table of a bulk load process. This "staging" table is accidentally empty during the night when the default statistics collection job is executed. Afterwards the table gets loaded with 1,500,000 rows and a query is executed without refreshing the statistics after the load is complete.

You'll see that based on the wrong statistics the optimizer comes to a wrong conclusion.

The interesting side effect of this is that the query performs very well in a front-end tool that fetches only the first n rows, since the plan accidentally corresponds to a very efficient "first_rows(n)" plan, although the optimizer_mode is ALL_ROWS by default in 10g.

I've carefully crafted this example to show that even the deliberate usage of an "ORDER BY" in an attempt to force the database to process the whole result set for sorting before returning the first rows can be misleading, since it's possible to process such a query without a sort operation in case a suitable index exists.

Only by definitely forcing the database to process all rows you get the real picture if you know that all rows from your statement actually need to be processed. But it's not that easy to force this; the cost based optimizer is smart and most of the time simply wrapping the statement in a "COUNT(*) from (...)" is again misleading, because the optimizer often is able to avoid the random row access to the table, which is where most of the time is usually spent under such circumstances.

Let's start with setting up the environment:


SQL>
SQL> show parameter optimizer_mode

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_mode string ALL_ROWS
SQL>
SQL> drop table t_fetch_first_rows purge;

Table dropped.

Elapsed: 00:00:00.84
SQL>
SQL> create table t_fetch_first_rows (
2 id number not null,
3 name varchar2(30) not null,
4 type varchar2(30) not null,
5 measure number
6 );

Table created.

Elapsed: 00:00:00.09
SQL>
SQL> create index idx_fetch_first_rows on t_fetch_first_rows (type, id);

Index created.

Elapsed: 00:00:00.03
SQL>
SQL> -- Simulate an "empty" staging table analyzed by the nightly default
SQL> -- statistics collection job
SQL> exec dbms_stats.gather_table_stats(null, 't_fetch_first_rows')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.29
SQL>
SQL> select num_rows, blocks
2 from user_tab_statistics
3 where table_name = 'T_FETCH_FIRST_ROWS';

NUM_ROWS BLOCKS
---------- ----------
0 0

Elapsed: 00:00:00.01
SQL>
SQL> select blevel, leaf_blocks, distinct_keys, num_rows, clustering_factor
2 from user_ind_statistics
3 where index_name = 'IDX_FETCH_FIRST_ROWS';

BLEVEL LEAF_BLOCKS DISTINCT_KEYS NUM_ROWS CLUSTERING_FACTOR
---------- ----------- ------------- ---------- -----------------
0 0 0 0 0

Elapsed: 00:00:00.03
SQL>
SQL> -- Load the data, the object type is deliberately skewed
SQL> -- Most of the values are "VIEW"
SQL> insert /*+ append */ into t_fetch_first_rows (
2 id,
3 name,
4 type,
5 measure)
6 select object_id, object_name, object_type, object_id as measure
7 from all_objects, (select level as id from dual connect by level <= 1000) dup
8 where object_type in ('VIEW', 'SCHEDULE')
9 and rownum <= 1500000;

1500000 rows created.

Elapsed: 00:00:39.36
SQL>
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
SQL>
SQL> column segment_name format a20
SQL>
SQL> select segment_name, bytes, blocks, extents
2 from user_segments
3 where segment_name = 'T_FETCH_FIRST_ROWS'
4 or segment_name = 'IDX_FETCH_FIRST_ROWS';

SEGMENT_NAME BYTES BLOCKS EXTENTS
-------------------- ---------- ---------- ----------
T_FETCH_FIRST_ROWS 62914560 7680 75
IDX_FETCH_FIRST_ROWS 31457280 3840 45

Elapsed: 00:00:00.31
SQL>
SQL> select type, count(*) from t_fetch_first_rows
2 group by type
3 order by 2 desc;

TYPE COUNT(*)
------------------------------ ----------
VIEW 1499497
SCHEDULE 503

Elapsed: 00:00:01.37
SQL>


The "empty" staging table has been analyzed and afterwards 1,500,000 rows have been loaded which represent approx. 60MB of data, supported by a 30MB index. The table consists of 7,680 blocks (8KB default block size), the index has 3,840 blocks.

Note that the data is skewed, most of entries have a object type of "VIEW", so filtering for this value will return almost all of the rows from the table.


SQL> explain plan for
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id;

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3754482025

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 60 | 0 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T_FETCH_FIRST_ROWS | 1 | 60 | 0 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_FETCH_FIRST_ROWS | 1 | | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TYPE"='VIEW')
filter("TYPE"='VIEW')

15 rows selected.

Elapsed: 00:00:00.03
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:18.95
SQL>
SQL> set autotrace traceonly
SQL>
SQL> -- Force to fetch all rows
SQL> -- Note that COUNT(*) itself is not sufficient
SQL> -- because the optimizer is smart and doesn't visit the table
SQL> -- in this case which means that the test case would be no longer
SQL> -- representative
SQL> select count(*) from (
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

Elapsed: 00:00:05.92

Execution Plan
----------------------------------------------------------
Plan hash value: 3912594726

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 0 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
|* 2 | INDEX RANGE SCAN| IDX_FETCH_FIRST_ROWS | 1 | 17 | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TYPE"='VIEW')


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3758 consistent gets
3758 physical reads
0 redo size
414 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> set autotrace off
SQL>


Running above query including a filter predicate and an ORDER BY clause will return almost immediately, suggesting good response time.

And by simply wrapping the query in a "SELECT COUNT(*) FROM (...)" block it seems the result confirms that the query performs good: Only 3,758 blocks read (by the way, if you check the execution plan, it's an index only execution and not visiting the table at all - the blocks read basically represent all blocks from the index) in five seconds. So obviously there is nothing wrong...


SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.01
SQL> -- alter session set tracefile_identifier = 'fetch_first_rows1';
SQL>
SQL> -- exec dbms_monitor.session_trace_enable
SQL>
SQL> set autotrace traceonly
SQL>
SQL> -- Force to fetch all rows
SQL> -- Note that COUNT(*) itself is not sufficient
SQL> -- because the optimizer is smart and doesn't visit the table
SQL> -- in this case which means that the test case would be no longer
SQL> -- representative
SQL> select sum(measure), count(*) from (
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

Elapsed: 00:01:04.31

Execution Plan
----------------------------------------------------------
Plan hash value: 1903859112

-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 30 | 0 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 30 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T_FETCH_FIRST_ROWS | 1 | 30 | 0 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IDX_FETCH_FIRST_ROWS | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TYPE"='VIEW')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1503256 consistent gets
11236 physical reads
72 redo size
483 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> set autotrace off
SQL>
SQL> -- exec dbms_monitor.session_trace_disable
SQL>


But look what happens if I force to process all rows in a realistic manner: It takes roughly a minute and more significantly approx. 1,500,000 consistent block gets to process a table that consists of 7,680 blocks ! That's a logical I/O amount representing roughly 12GB.

And if you check the AUTOTRACE statistics you'll notice that I was actually lucky: I only read 11,236 blocks physically, the vast majority of the blocks came from the buffer cache after all blocks of the table have been read into it. In a real production system where the segments might be much larger and potentially multiple of such statements might compete for the buffer cache simultaneously a lot more buffer gets might lead to real physical I/O making things even worse.

The impressive amount of logical I/O is caused by the fact that using this execution plan the database scanned the index leaf blocks for the rows according to the filter criteria and accessed the corresponding table in a random fashion.

But the smallest entity that the database knows in terms of I/O is a block, so in order to read a single random row the whole block where the row resides in needs to be read. Due to the bad clustering factor of the index, which means that the table rows are differently ordered than the index, for each row randomly accessed a different block of the table had to be read. So on average each block of the table was accessed almost 200 times during the process!

If the table and the index were ordered in the same way (represented by a low clustering factor of the index close to number of blocks of the table, whereas a bad clustering factor is close to the number of rows of the table) then the amount of logical I/O required could be significantly less, since the rows accessed randomly reside in the same block and therefore the same table block can be re-used to process the row.

Now I'm going to show that having representative index statistics is as important as having the tables properly analyzed:


SQL> -- Relevance of index stats: Gather proper table statistics but without indexes
SQL> exec dbms_stats.gather_table_stats(null, 't_fetch_first_rows', cascade => false, no_invalidate=>false)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.40
SQL>
SQL> select num_rows, blocks
2 from user_tab_statistics
3 where table_name = 'T_FETCH_FIRST_ROWS';

NUM_ROWS BLOCKS
---------- ----------
1479394 7606

Elapsed: 00:00:00.03
SQL>
SQL> select blevel, leaf_blocks, distinct_keys, num_rows, clustering_factor
2 from user_ind_statistics
3 where index_name = 'IDX_FETCH_FIRST_ROWS';

BLEVEL LEAF_BLOCKS DISTINCT_KEYS NUM_ROWS CLUSTERING_FACTOR
---------- ----------- ------------- ---------- -----------------
0 0 0 0 0

Elapsed: 00:00:00.04
SQL>
SQL> explain plan for
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3754482025

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1478K| 42M| 0 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T_FETCH_FIRST_ROWS | 1478K| 42M| 0 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_FETCH_FIRST_ROWS | 1478K| | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TYPE"='VIEW')
filter("TYPE"='VIEW')

15 rows selected.

Elapsed: 00:00:00.03
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.18
SQL>
SQL> -- alter session set tracefile_identifier = 'fetch_first_rows2';
SQL>
SQL> -- exec dbms_monitor.session_trace_enable
SQL>
SQL> set autotrace traceonly
SQL>
SQL> -- Force to fetch all rows
SQL> -- Note that COUNT(*) itself is not sufficient
SQL> -- because the optimizer is smart and doesn't visit the table
SQL> -- in this case which means that the test case would be no longer
SQL> -- representative
SQL> select sum(measure), count(*) from (
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

Elapsed: 00:01:03.90

Execution Plan
----------------------------------------------------------
Plan hash value: 1903859112

-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 0 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T_FETCH_FIRST_ROWS | 1478K| 15M| 0 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IDX_FETCH_FIRST_ROWS | 1478K| | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TYPE"='VIEW')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1503255 consistent gets
11236 physical reads
0 redo size
483 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> set autotrace off
SQL>
SQL> -- exec dbms_monitor.session_trace_disable
SQL>


Although I got now proper table statistics, the optimizer is still fooled by the incorrect index statistics and again chooses the inefficient index access.

Gathering proper statistics on both table and index leads to a better decision of the optimizer:


SQL> -- Now gather proper statistics
SQL> exec dbms_stats.gather_table_stats(null, 't_fetch_first_rows', cascade => true, no_invalidate=>false)

PL/SQL procedure successfully completed.

Elapsed: 00:00:13.40
SQL>
SQL> select num_rows, blocks
2 from user_tab_statistics
3 where table_name = 'T_FETCH_FIRST_ROWS';

NUM_ROWS BLOCKS
---------- ----------
1504130 7606

Elapsed: 00:00:00.01
SQL>
SQL> select blevel, leaf_blocks, distinct_keys, num_rows, clustering_factor
2 from user_ind_statistics
3 where index_name = 'IDX_FETCH_FIRST_ROWS';

BLEVEL LEAF_BLOCKS DISTINCT_KEYS NUM_ROWS CLUSTERING_FACTOR
---------- ----------- ------------- ---------- -----------------
2 3757 2984 1500000 1500000

Elapsed: 00:00:00.03
SQL>
SQL> explain plan for
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id;

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2749563054

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1503K| 43M| | 14713 (4)| 00:02:57 |
| 1 | SORT ORDER BY | | 1503K| 43M| 127M| 14713 (4)| 00:02:57 |
|* 2 | TABLE ACCESS FULL| T_FETCH_FIRST_ROWS | 1503K| 43M| | 2159 (5)| 00:00:26 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("TYPE"='VIEW')

14 rows selected.

Elapsed: 00:00:00.01
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.26
SQL>
SQL> -- alter session set tracefile_identifier = 'fetch_first_rows3';
SQL>
SQL> -- exec dbms_monitor.session_trace_enable
SQL>
SQL> set autotrace traceonly
SQL>
SQL> -- Force to fetch all rows
SQL> -- Note that COUNT(*) itself is not sufficient
SQL> -- because the optimizer is smart and doesn't visit the table
SQL> -- in this case which means that the test case would be no longer
SQL> -- representative
SQL> select sum(measure), count(*) from (
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

Elapsed: 00:00:02.98

Execution Plan
----------------------------------------------------------
Plan hash value: 2125410158

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 2159 (5)| 00:00:26 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| T_FETCH_FIRST_ROWS | 1503K| 15M| 2159 (5)| 00:00:26 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("TYPE"='VIEW')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
7488 consistent gets
7479 physical reads
0 redo size
483 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> set autotrace off
SQL>
SQL> -- exec dbms_monitor.session_trace_disable
SQL>


A simple full table scan took only three seconds, and processed 7,488 blocks, representing simply the net size of the table. What a vast difference compared to the 12GB processed via the index.

Note that when I repeat the statement using the index access without clearing the buffer cache its actual run time will drop in my test environment from one minute to less than 10 seconds, but that's not really good news, since it's usually quite unrealistic to have all blocks in the buffer cache.

To underline that this inefficient statement represents a real scaling inhibitor, I'll use Tom Kyte's RUNSTATS package to demonstrate the different resource consumption of the two statements:


SQL> exec runstats_pkg.rs_start

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.03
SQL>
SQL> -- Show the different resource consumption
SQL> select sum(measure), count(*) from (
2 select /*+ index(t_fetch_first_rows) */ * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

SUM(MEASURE) COUNT(*)
------------ ----------
6876814594 1499497

Elapsed: 00:01:04.14
SQL>
SQL> exec runstats_pkg.rs_middle

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.06
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.04
SQL>
SQL> select sum(measure), count(*) from (
2 select * from t_fetch_first_rows
3 where type = 'VIEW'
4 order by id
5 );

SUM(MEASURE) COUNT(*)
------------ ----------
6876814594 1499497

Elapsed: 00:00:02.43
SQL>
SQL> -- Show only values different by 150 least
SQL> exec runstats_pkg.rs_stop(150)
Run1 ran in 6417 hsecs
Run2 ran in 250 hsecs
run 1 ran in 2566,8% of the time

Name Run1 Run2 Diff
STAT...redo size 2,580 2,856 276
LATCH.channel operations paren 310 22 -288
LATCH.checkpoint queue latch 414 27 -387
LATCH.messages 442 41 -401
STAT...bytes sent via SQL*Net 1,960 1,501 -459
LATCH.JS queue state obj latch 468 0 -468
STAT...CPU used by this sessio 836 61 -775
STAT...CPU used when call star 836 61 -775
LATCH.enqueues 1,047 8 -1,039
LATCH.enqueue hash chains 1,059 8 -1,051
LATCH.SQL memory manager worka 1,480 73 -1,407
STAT...physical reads 11,240 7,486 -3,754
STAT...physical reads cache 11,240 7,486 -3,754
STAT...free buffer requested 11,245 7,491 -3,754
LATCH.cache buffers lru chain 11,329 7,540 -3,789
STAT...user I/O wait time 5,612 196 -5,416
STAT...DB time 6,418 254 -6,164
STAT...Elapsed Time 6,422 255 -6,167
STAT...physical reads cache pr 0 7,403 7,403
STAT...table scan blocks gotte 0 7,478 7,478
STAT...physical read IO reques 11,240 83 -11,157
STAT...physical read total IO 11,240 83 -11,157
LATCH.simulator lru latch 93,302 932 -92,370
LATCH.simulator hash latch 94,001 1,398 -92,603
STAT...session uga memory max 130,928 0 -130,928
STAT...session pga memory -213,636 0 213,636
STAT...Cached Commit SCN refer 1,499,291 7,477 -1,491,814
STAT...consistent gets from ca 1,503,273 7,504 -1,495,769
STAT...consistent gets 1,503,273 7,504 -1,495,769
STAT...no work - consistent re 1,503,252 7,478 -1,495,774
STAT...session logical reads 1,503,310 7,533 -1,495,777
STAT...buffer is pinned count 1,499,496 0 -1,499,496
STAT...table fetch by rowid 1,499,501 3 -1,499,498
STAT...buffer is not pinned co 1,499,506 6 -1,499,500
STAT...table scan rows gotten 0 1,500,000 1,500,000
LATCH.cache buffers chains 3,019,149 23,679 -2,995,470
STAT...physical read bytes 92,078,080 61,325,312 -30,752,768
STAT...physical read total byt 92,078,080 61,325,312 -30,752,768

Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
3,243,130 53,136 -3,189,994 6,103.45%

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.07
SQL>


The most obvious threat can be seen from the LATCH statistics: Latches are Oracle internal serialization devices that are used to protect resources from being modified by multiple processes at the same time. Latches therefore serialize access to certain memory structures that e.g. are used to maintain the buffers in the buffer cache. The more latch activity, the more CPU is used and the less scalable is a system, since the concurrent processes will have to wait for each other at the serialization points.

So the inefficient statement reported more than 3,000,000 latches, whereas the simple table scan required only approx. 50,000. So even if you not too concerned with the different run times in this single user test case, running such statements simultaneously in a real multi-user system will significantly influence the performance of the whole system.

5 comments:

Anonymous said...

Hi Randolf,

My oltp application is working the best with optimizer mode : FIRST_ROWS_100 .

I followed your example (only changed the name of the table and the index ) but found that the CBO is still choosing INDEX as access path and not TABLE ACCESS FULL .
Bellow all the steps i took.
How come that only when i used ALL_ROWS hint it choose TABLE ACCESS FULL :

=========
STEP 1:
=========
I create table that contain 1.5 million records , as follow :

insert /*+ append */ into YOAV_BULK_LOAD (id, name, type, measure)
select object_id, object_name, object_type, object_id as measure
from all_objects, (select level as id from dual connect by level <= 1000) dup
where object_type in ('VIEW', 'SCHEDULE')
and rownum <= 1500000;

1500000 rows created.

SQL> commit;

I created index on this table :
=========
STEP 2:
=========
create index yoav_idx on yoav_bulk_load (type, id);

select type, count(*)
from YOAV_BULK_LOAD
group by type
order by 2 desc;

TYPE COUNT(*)
------------------- ------------
VIEW 1,499,497
SCHEDULE 503


=========
STEP 3:
=========

-- Drop statistics from the table:
SQL> exec dbms_stats.delete_table_stats(ownname =>'BEZEQ', tabname => 'yoav_bulk_load');

PL/SQL procedure successfully completed.

=========
STEP 4:
=========

-- Gather table stats using default method_opt
SQL> exec dbms_stats.gather_table_stats(ownname =>'BEZEQ', tabname => 'yoav_bulk_load' , method_opt => 'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.


-- Check statistics on the table
SQL> select table_name,num_rows,last_analyzed
2 from user_tab_statistics
3 where table_name = 'YOAV_BULK_LOAD';

TABLE_NAME NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
YOAV_BULK_LOAD 1483038 04-MAR-09

1 row selected.

-- Check index statistics
SQL> select blevel, leaf_blocks, distinct_keys, num_rows, clustering_factor
2 from user_ind_statistics
3 where index_name = 'YOAV_IDX';

BLEVEL LEAF_BLOCKS DISTINCT_KEYS NUM_ROWS CLUSTERING_FACTOR
---------- ----------- ------------- ---------- -----------------
2 8485 3802 1528259 975581

1 row selected.

=========
STEP 5:
=========

===> Oracle is choosing INDEX RANGE SCAN INSTED OF FTS

SQL> explain plan for
2 select * from YOAV_BULK_LOAD
3 where type = 'VIEW'
4 order by id;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 102 | 3366 | 70 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| YOAV_BULK_LOAD | 102 | 3366 | 70 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | YOAV_IDX | | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------

=========
STEP 6:
=========

-- So , i gather histograms on this table :

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname =>'BEZEQ', tabname => 'yoav_bulk_load', method_opt => 'FOR COLUMNS TYPE SIZE 254');

PL/SQL procedure successfully completed.

SQL> select * from dba_tab_histograms where table_name = 'YOAV_BULK_LOAD';


OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
------------------------------ ------------------------------ -------------------- --------------- -------------- --------------------
BEZEQ YOAV_BULK_LOAD TYPE 1 4.3233E+35
BEZEQ YOAV_BULK_LOAD TYPE 5601 4.4802E+35
BEZEQ YOAV_BULK_LOAD ID 0 159
BEZEQ YOAV_BULK_LOAD NAME 0 3.3905E+35
BEZEQ YOAV_BULK_LOAD MEASURE 0 159
BEZEQ YOAV_BULK_LOAD ID 1 10976745
BEZEQ YOAV_BULK_LOAD NAME 1 4.9565E+35
BEZEQ YOAV_BULK_LOAD MEASURE 1 10976745

8 rows selected.

SQL> explain plan for
2 select * from YOAV_BULK_LOAD
3 where type = 'VIEW'
4 order by id;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 101 | 3333 | 68 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| YOAV_BULK_LOAD | 101 | 3333 | 68 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | YOAV_IDX | | | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

=====================================================================
STEP 7: ONLY AFTER USING ALL_ROWS HINT ORACLE IS CHOOCING FTS
=====================================================================

SQL> explain plan for
2 select /*+ all_rows */ * from YOAV_BULK_LOAD
3 where type = 'VIEW'
4 order by id;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1527K| 48M| | 27727 (6)| 00:05:05 |
| 1 | SORT ORDER BY | | 1527K| 48M| 142M| 27727 (6)| 00:05:05 |
|* 2 | TABLE ACCESS FULL| YOAV_BULK_LOAD | 1527K| 48M| | 3675 (10)| 00:00:41 |
---------------------------------------------------------------------------------------------

Randolf said...

Are you sure that your application works best with FIRST_ROWS_100?

If yes, do you understand why?

In your example you tell the optimizer that you're going to fetch 100 rows at most, so therefore it's legitimate/expected that the optimizer uses the index, no matter how many rows are in the table/satisfying the filter predicate, because using the index will be the cheapest way to fetch the first 100 records.

If you fetch all rows but tell the optimizer that you're only fetching the first 100, you can expect that things are not going to perform well.

You need to understand what the FIRST_ROWS_n mode implies.

If your application doesn't behave like only fetching the first rows from a result set, the FIRST_ROWS_n mode is very likely not going to be best for your application.

If you have different performance issues using the ALL_ROWS mode you need to find out what the issue is about in detail, but in general if your application uses to fetch all rows from a result set, ALL_ROWS is the mode to use.

If you have further questions I suggest you post a request on the worldwide OTN forum so that other people can comment, too.

Regards,
Randolf

Randolf said...

Jonathan Lewis has a nice wrap-up summarizing the general thoughts about the different modes:

first_rows_n:http://jonathanlewis.wordpress.com/2008/11/11/first_rows_n/

Anonymous said...

Hi ,

exec dbms_stats.gather_table_stats(null, 't_fetch_first_rows', cascade => true, no_invalidate=>false);

Only after changing no_invalidate
from true to false , the plan changed. why ?

Thanks Again

Randolf said...

> Only after changing no_invalidate
from true to false , the plan changed. why ?

That doesn't explain your previous issue described, since you were running EXPLAIN PLANs which should perform a hard parse every time.

But if you have a statement cached in the shared pool the default behaviour of 10g is to invalidate the statements affected by the updated statistics within a certain period, which is by default 18,000 secs (5 hours), controlled by the internal parameter "_optimizer_invalidation_period".

This can lead to the situation that you've updated the statistics but affected statements don't get reparsed immediately, but in a deferred fashion.

The rationale behind this default behaviour is to prevent high system load due to potentially excessive hard parsing.

If you want to have all affected statement invalidated immediately, you need the option you've described.

But again, this is a different issue: This is about the actual execution of the statement.

Your previous post was referring to EXPLAIN PLAN outputs.

So I'm not sure what you're referring to now.

Regards,
Randolf