Showing posts with label System Statistics. Show all posts
Showing posts with label System Statistics. Show all posts

Wednesday, July 27, 2011

Cost Is Time: Next Generation

It looks like Oracle has introduced with the Oracle 11.2.0.2 patch set a new "cost is time" model for the time estimate of the Cost-Based Optimizer (CBO).

In order to understand the implications let me summarize the evolution of the CBO in terms of cost / time estimate so far:

1. Oracle 7 and 8

The cost estimate generated by the Cost-Based Optimizer (CBO) has always been a time estimate, although expressed in a slightly obscure unit, which is number of single block reads.

The traditional I/O based costing introduced with Oracle 7 in principle counted the number of required single and multi-block reads to arrive at the final cost. A potential drawback of this approach was the missing differentiation of multi- and single-block reads - one multi-block read created the same cost as a one single-block read. Although the model used an "adjusted" multi-block read count to make full table scans more costlier than indicated by larger "db_file_multiblock_read_count" settings (and hence accounted for smaller extents and blocks already cached in the buffer cache making multi-block reads smaller than requested) it still potentially favoured full table scans over index access paths.

From Oracle 8 on one could play with the OPTIMIZER_INDEX_COST_ADJ / OPTIMIZER_INDEX_CACHING parameter to adjust this shortcoming of the costing model in particular for OLTP biased applications.

2. Oracle 9i

Oracle introduced with Oracle 9i the System Statistics along with a more sophisticated cost calculation model.

In short, System Statistics offer the following features:

- Different treatment of single-block and multi-block operations
- Time-based optimization using average timings for single- and multi-block reads
- Cost calculation includes a CPU cost component
- Gather actual hardware capabilities to base the calculations on actual system capabilities and workload pattern

More details can be found for example in my "Understanding System Statistics" blog series.

So with System Statistics the CBO actually calculates an estimated execution time - you can see this in the EXPLAIN PLAN output: With System Statistics enabled it includes a TIME column.

Simplified spoken the time estimate is simply the average time for a single block read (SREADTIM) times the number of single block reads plus the average time for a multi-block read (MREADTIM) times the number of multi-block reads plus the estimated number of cpu operations divided by the cpu operations per second (CPUSPEED / CPUSPEEDNW). So the cost with System Statistics is actually based on a time estimation.

For consistency reasons it has been decided to use the same unit as before, so the estimated time is simply divided by the SREADTIM to arrive at the same cost unit as with traditional I/O based costing which is number of single-block reads (although plans involving full segment scan operations usually arrive at different costs than the traditional costing, so consistency is hardly given anyway).

Right from the beginning in Oracle 9i the System Statistics could be gathered in WORKLOAD mode, which means that Oracle takes two snapshots of certain performance statistics and calculates the System Statistics parameters like SREADTIM, MREADTIM, MBRC etc. from the delta values.

3. Oracle 10g

Starting with Oracle 10g System Statistics were enabled by default with so called default NOWORKLOAD settings. It even allowed to generate an artificial load on the system by gathering the NOWORKLOAD System Statistics simply using a predefined I/O pattern to gather the disk transfer speed (IOTFRSPEED) and disk seek time (IOSEEKTIM) - these values are then used to derive the SREADTIM and MREADTIM values - the two most important ingredients of the enhanced cost/time calculation.

So since Oracle 9i there is a built-in functionality to measure the capabilities of the underlying hardware - from 10g on either based on a particular workload pattern or by submitting an artificial predefined load.

Furthermore Oracle provides a well-defined API as part of the DBMS_STATS package for dealing with System Statistics: They can be gathered, deleted, exported, imported, manually defined and even gathered directly into a separate statistics table to build a history of System Statistics gathered.

4. Oracle 11g

In 11g Oracle introduced the I/O calibration routine as part of the Resource Manager. Note that so far this feature didn't have a direct relationship to the Cost Based Optimizer - it could be used however to control the maximum parallel degree using the PARALLEL_IO_CAP_ENABLED parameter.

The first thing that puzzled me when dealing with that new functionality was the lack of a well-defined API to maintain the gathered information. There is a single call in the resource manager package (DBMS_RESOURCE_MANAGER.CALIBRATE_IO) to run the I/O calibration, but apart from that no additional functionality for maintenance. No way to delete the calibration results, export or import them, or even manually override.

If you want to get an understanding what this means, have a look at the MOS document "Automatic Degree of Parallelism in 11.2.0.2 [ID 1269321.1]" that, besides stating that there can be problems with the actual I/O calibration like gathering unreasonable values or not running to completion, shows you how to manipulate an internal SYS table to override the values gathered which also requires to bounce the instance in order to become effective.

I find it hard to understand why Oracle hasn't address these handling shortcomings in the meantime, particularly given the fact that with Oracle 11.2.0.2 the I/O resource calibration becomes mandatory if you want to make use of the new Auto-DOP feature that has been introduced with 11.2.0.1. Fiddling with a SYS-owned table doesn't sound like a well-designed feature to me, and the calibration functionality is not exactly "brand-new".

5. Oracle 11.2.0.2

So starting with 11.2.0.2 the new "cost is time" calculation comes into the picture. If you have values in the corresponding SYS.RESOURCE_IO_CALIBRATE$ table (that is simply externalized by the DBA_RSRC_IO_CALIBRATE view) then something really odd happens:

The cost that has been calculated according to the System Statistics model - which is already a time estimate based on three different components - time for single-block and multi-block reads as well as the estimated CPU time is now converted into a data volume by simply multiplying the resulting cost with the default block size. Dividing this data volume by the throughput as indicated by the I/O calibration results (it looks like the value MAX_PMBPS is relevant) arrives at a new estimated execution time.

Let's have a look at a working example: With default NOWORKLOAD System Statistics, 8KB default block size and a unset db_file_multiblock_read_count that results in a MultiBlockReadCount (MBRC) of 8 to be used internally for calculation of a full table scan (FTS) the time estimate for a FTS of 10,000 blocks (80MB) will be based on 1,250 multi-block reads, which are estimated to take 26ms each - this gives us a time estimate of 32.5 seconds. The CPU time associated with that full table scan operation will be added on top so that the final result will be something between 32.5 and 33 seconds. Let's stick to the 32.5 seconds - this time estimate corresponds to approx. 2,710 single-block reads by simply dividing the time by 12ms which happens to be the SREADTIM value for default NOWORKLOAD System Statistics with above configuration - this value will be close to the cost shown (minor variations are depending on the CPU speed determined).

Cost / time estimate for a FTS of a 10,000 block segment with 8KB block size, default NOWORKLOAD System Statistics and default MBRC of 8 used for cost calculation (_db_file_optimizer_read_count = 8):

-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2716 (1)| 00:00:33 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| T | 10000 | 2716 (1)| 00:00:33 |
-------------------------------------------------------------------


Now if you happen to have a value of MAX_PMBPS of 4MB/sec as I/O Resource Calibration result (I chose this very conservative value deliberately because it happens to be the same transfer rate that the default NOWORKLOAD System Statistics assumes (4096 bytes per millisec), the following new time calculation will happen instead:

2,710 will be multiplied with the 8KB default block size to arrive at a data volume, in this case approx. 21 MB

This approx. 21 MB is now divided by the 4MB/sec, to arrive at a new time estimate of approx. 5.3 seconds, rounded up to 6 seconds. Note that the original time estimate was 32.5 seconds.

Cost / time estimate for a FTS of a 10,000 block segment with 8KB block size, default NOWORKLOAD System Statistics and default MBRC of 8 used for cost calculation (_db_file_optimizer_read_count = 8) but MAX_PMBPS set to 4MB/sec:

-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2716 (1)| 00:00:06 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| T | 10000 | 2716 (1)| 00:00:06 |
-------------------------------------------------------------------


You can see this also happening in the 10053 CBO trace file:


kkeCostToTime: using io calibrate stats
maxmbps=0(MB/s) maxpmbps=4(MB/s)
block_size=8192 mb_io_count=1 mb_io_size=8192 (bytes)
tot_io_size=21(MB) time=5304(ms)


Now this approach strikes me as odd for several reasons:

- A sophisticated time estimate (remember that it even includes a CPU time component that has nothing to do with an I/O volume) is turned into data volume to arrive at a new time estimate using a rather simplistic approach

- As you can see from above example the "data volume" calculated does not correspond to the actual I/O volume that we know from the System Statistics cost/time calculation - remember that the actual segment size in this case was 80MB, not 20MB. This is of course caused by the underlying calculation of the original time estimate based on multi-block reads. So why we would turn the cost/time into some data volume that has nothing to do with the actual data volume used for the original cost/time calculation is above me

- There is already an I/O calibration routine available as part of the System Statistics functionality that can be used to arrive at more realistic time estimates based on the gathered System Statistics information - why a second one has been introduced? Furthermore this raises the question: If I'm required to run the I/O calibration to enable Auto-DOP - shouldn't I then also "calibrate" my System Statistics to arrive at a "calibrated" cost estimate? After all the new "Cost Is Time" approach uses the cost estimate for the new time estimate.

- As already outlined there is no officially documented way to properly deal with the I/O calibration results - manually poking into SYS-owned tables doesn't really count

Implications

So you probably think, why bother? The cost estimate is left untouched, only the TIME column is affected. So execution plans shouldn't change since they are still chosen based on the lowest cost estimate - and the lower the cost, the lower the new time estimate.

You'll appreciate however that the previous simple correlation between the cost and the time estimate is no longer true with 11.2.0.2 and resource calibration results available: So far you could simply divide the time estimate by the SREADTIM value to arrive at the cost, or the other way around, you could multiply the cost by the SREADTIM value to arrive at the time estimate - or use both values to arrive at the SREADTIM value - since the time divided by the cost should give you the approximate value of SREADTIM.

The point with 11.2.0.2 and the I/O resource calibration is that the new time estimate is obviously used for the Auto-DOP feature to drive two crucial decisions:

- Is the statement a candidate for parallel execution? This is controlled via the parameter PARALLEL_MIN_TIME_THRESHOLD that defaults to 10 seconds in 11.2.0.2

- If it is a candidate for parallel execution what is the optimal DOP? This is of course depending on a lot of different inputs but also seems to be based on the new time estimate - that, as just explained, arrives at a (wrong) data volume estimate in a questionable way

As a side note, Oracle at present recommends to set the value of MAX_PMBPS to 200 for Exadata environments rather than relying on the results of the actual I/O calibration - another indication that the I/O calibration results as of now are questionable.

Summary

Oracle introduced with the 11.2.0.2 patch set a new model for the estimated "Execution Time" if I/O resource calibration results are available. As outlined above the new approach seems to be questionable (at least), but will be used for crucial decisions regarding the new Auto-DOP feature. It will be interesting to see the further development in this area, whether for example the new time algorithm will be changed in upcoming releases or the influence of the I/O calibration on the CBO calculations will be extended.

If you want to make use of the new Auto-DOP feature in 11.2.0.2 you should be aware of these relationship - the MAX_PMBPS parameter drives the new time estimation and the Auto-DOP calculations.

Sunday, May 24, 2009

Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple blocksizes - part 4

Back to part 3

Using objects residing in multiple blocksizes

I've already mentioned it several times on my blog but I would like to take the chance here again to stress the point that the cost based optimizer does a bad job when it comes to calculating costs for full table scans of objects residing in non-default block sizes. It really looks like that this feature has been introduced to support transportable tablespaces but it obviously hasn't been tested very thoroughly when it comes to cost calculation.

Each of the different modes has its deficiencies when dealing with objects in non-default blocksizes. The somehow odd thing is that the traditional I/O costing does the best job, and all system statistics based calculations are utterly wrong.

Traditional I/O based costing

The traditional I/O based costing simply scales the MBRC up or down according to the non-default blocksize to come to the same I/O read request size. So if you e.g. have a MBRC of 8 and a default blocksize of 8KB and now calculate the cost for an object residing in a 2KB tablespace, the MBRC will be multiplied 4, which results in a MBRC of 32. The I/O cost will be different although due to the different adjustment used with the higher MBRC setting. The adjusted MBRC for 32 is 16.39 whereas the adjusted MBRC for 8 is 6.59, so the calculated cost for the full table scan of the object residing in the 2KB tablespace will be higher. Likewise the same happens when using an object in a 16KB non-default tablespace. The MBRC will be reduced accordingly to 4 to get the same I/O read size again. Since adjusted MBRC for MBRC = 4 is 4.17, the cost calculated will actually be less for the object residing the 16KB tablespace.

System statistics

Things get more complicated when using NOWORKLOAD or WORKLOAD system statistics.
To recall the formula to calculate the I/O cost of a full table scan with system statistics is:

Number of blocks / MBRC * MREADTIM / SREADTIM

And in case of NOWORKLOAD system statistics the MREADTIM and SREADTIM are synthesized using the following formula:

SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTRFSPEED

MREADTIM = IOSEEKTIM + DB_BLOCK_SIZE * MBRC / IOTRFSPEED

Now if the object resides in a non-default blocksize tablespace, the following inconsistent adjustments are applied to the formula:

NOWORKLOAD system statistics:

SREADTIM as above, using _DEFAULT_ DB_BLOCK_SIZE

MREADTIM = IOSEEKTIM + _DEFAULT_ DB_BLOCK_SIZE * scaled MBRC / IOTRFSPEED

I/O cost = Number of blocks / scaled MBRC * MREADTIM / SREADTIM

So obviously something is odd in above formulas: The SREADTIM and MREADTIM values are synthesized with a mixture of a scaled MBRC (according to the block size) but a non-adjusted default DB_BLOCK_SIZE, resulting in a large variation in cost, so a full table scan in a small blocksize is much more expensive than in the default blocksize, and likewise a full table scan in a large blocksize is much cheaper. Unfortunately this doesn't reflect at all the runtime behaviour, since Oracle acutally scales the I/O read request size accordingly meaning that the runtime difference usually is negligible, but the cost calculated is dramatically different.

WORKLOAD system statistics:

MBRC as measured/set

SREADTIM as measured/set

MREADTIM as measured/set

I/O cost = Number of blocks / MBRC * MREADTIM / SREADTIM

This is even worse than above NOWORKLOAD result because the I/O cost calculated simply is different by the factor of number of blocks in non-default block size / number of blocks in default block size, e.g. an object residing in a 2KB block size will have an I/O cost four times higher than an object residing in a 8KB default blocksize, and the MBRC is not adjusted at all for the calculation.

Some calculation examples

The following examples attempt to summarize what you get if you're using an object in a non-default blocksize tablespace:

1. I/O Cost of a full table scan of a 10,000 blocks / 80MB segment residing in a default blocksize of 8KB (in case of system statistics excluding the CPU cost) and using an MBRC of 8:

traditional I/O:
round(10,000 blocks / 6.59) = 1,517

(default) NOWORKLOAD system statistics:
SREADTIM = 10 + 8192/4096 = 10 + 2 = 12
MREADTIM = 10 + 8192*8/4096 = 10 + 16 = 26

round((10,000 / 8) * 26 / 12) = 1,250 * 26 / 12 = 2,708

WORKLOAD system statistics:
same as above provided you use the same for MREADTIM and SREADTIM otherwise as measured

2. I/O Cost of a full table scan of a 40,000 blocks / 80MB segment residing in a non-default blocksize of 2KB (in case of system statistics excluding the CPU cost) and using a default blocksize of 8KB and a MBRC of 8:

traditional I/O:
MBRC used for 2KB blocksize = 8 * 8KB / 2KB = 8 * 4 = 32

adjusted MBRC for MBRC = 32: 16.39

round(40,000 blocks / 16.39) = 2,441

(default) NOWORKLOAD system statistics:
SREADTIM = 10 + 8192/4096 = 10 + 2 = 12
(should be 10 + 2048/4096 = 10 + 0.5 = 10.5)

MREADTIM = 10 + 8192*32/4096 = 10 + 64 = 74
(should be 10 + 2048*32/4096 = 10 + 16 = 26)

round((40,000 / 32) * 74 / 12) = 1,250 * 74 / 12 = 7,708
(should be round((40,000 / 32) * 26 / 10.5) = 1,250 * 26 / 10.5 = 3,095

WORKLOAD system statistics (for the sake of comparison using the NOWORKLOAD defaults):
MBRC = 8
SREADTIM = 12
MREADTIM = 26

round((40,000 / 8) * 26 / 12) = 5000 * 26 / 12 = 10,833
(should be round((40,000 / 32) * 26 / 12) = 1,250 * 26 / 12 = 2,708

3. I/O Cost of a full table scan of a 5,000 blocks / 80MB segment residing in a non-default blocksize of 16KB (in case of system statistics excluding the CPU cost) and using a default blocksize of 8KB and a MBRC of 8:

traditional I/O:
MBRC used for 16KB blocksize = 8 * 8KB / 16KB = 8 * 0.5 = 4

adjusted MBRC for MBRC = 4: 4.17

round(5,000 blocks / 4.17) = 1,199

(default) NOWORKLOAD system statistics:
SREADTIM = 10 + 8192/4096 = 10 + 2 = 12
(should be 10 + 16384/4096 = 10 + 4 = 14)

MREADTIM = 10 + 8192*4/4096 = 10 + 8 = 18
(should be 10 + 16384*4/4096 = 10 + 16 = 26)

round((5,000 / 4) * 18 / 12) = 1,250 * 18 / 12 = 1,875
(should be round((5,000 / 4) * 26 / 14) = 1,250 * 26 / 14 = 2,321

WORKLOAD system statistics:
MBRC = 8
SREADTIM = 12
MREADTIM = 26

round((5,000 / 8) * 26 / 12) = 625 * 26 / 12 = 1,354
(should be (5,000 / 4) * 26 / 12) = 1,250 * 26 / 12 = 2,708

Plan change by moving to non-default block size

And here is a final example that shows by simply moving an object to a non-default blocksize tablespace a change in the execution plan can be observed. An index scan is used instead of the full table scan since the cost of the full table scan operation is significantly increased. The SQL trace will reveal that at runtime however there is no significant difference between the full table scan executed in the default tablespace and non-standard blocksize tablespace.


SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>
SQL> create table t1
2 as
3 select
4 trunc(dbms_random.value(0,25)) as col1
5 , trunc(dbms_random.value(0,25)) as col2
6 , rpad('x', 200, 'x') as filler
7 from
8 (
9 select
10 level as id
11 from
12 dual
13 connect by
14 level <= 10000
15 );

Table created.

Elapsed: 00:00:00.49
SQL>
SQL> create index t1_idx on t1 (col1, col2)
2 ;

Index created.

Elapsed: 00:00:00.14
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.55
SQL>
SQL> explain plan for
2 select /*+ full(t1) */
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 83 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 96 | 19776 | 83 (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("COL1"=1 AND ("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR
"COL2"=4 OR "COL2"=5 OR "COL2"=6))

14 rows selected.

Elapsed: 00:00:00.13
SQL>
SQL> explain plan for
2 select /*+ index(t1) */
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 96 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 96 | 19776 | 96 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | T1_IDX | 96 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("COL1"=1)
filter("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR "COL2"=4 OR "COL2"=5 OR
"COL2"=6)

16 rows selected.

Elapsed: 00:00:00.11
SQL>
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 83 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 96 | 19776 | 83 (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("COL1"=1 AND ("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR
"COL2"=4 OR "COL2"=5 OR "COL2"=6))

14 rows selected.

Elapsed: 00:00:00.05
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.02
SQL>
SQL> alter session set tracefile_identifier = 'fts_8k';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set events '10046 trace name context forever, level 8';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set termout off
SQL>
SQL> select /*+ full(t1) */
2 *
3 from
4 t1
5 where
6 col1 = 1
7 and col2 in (1,2,3,4,5,6);

SQL>
SQL> set termout on
SQL>
SQL> alter session set events '10046 trace name context off';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set tracefile_identifier = 'idx_8k';

Session altered.

Elapsed: 00:00:00.02
SQL>
SQL> alter session set events '10046 trace name context forever, level 8';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set termout off
SQL>
SQL> select /*+ index(t1) */
2 *
3 from
4 t1
5 where
6 col1 = 1
7 and col2 in (1,2,3,4,5,6);

SQL>
SQL> set termout on
SQL>
SQL> alter session set events '10046 trace name context off';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.70
SQL>
SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>
SQL> create table t1
2 tablespace test_2k
3 as
4 select
5 trunc(dbms_random.value(0,25)) as col1
6 , trunc(dbms_random.value(0,25)) as col2
7 , rpad('x', 200, 'x') as filler
8 from
9 (
10 select
11 level as id
12 from
13 dual
14 connect by
15 level <= 10000
16 );

Table created.

Elapsed: 00:00:00.46
SQL>
SQL> create index t1_idx on t1 (col1, col2)
2 tablespace test_2k;

Index created.

Elapsed: 00:00:00.13
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.69
SQL>
SQL> explain plan for
2 select /*+ full(t1) */
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.07
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 245 (1)| 00:00:03 |
|* 1 | TABLE ACCESS FULL| T1 | 96 | 19776 | 245 (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("COL1"=1 AND ("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR
"COL2"=4 OR "COL2"=5 OR "COL2"=6))

14 rows selected.

Elapsed: 00:00:00.11
SQL>
SQL> explain plan for
2 select /*+ index(t1) */
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.06
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 102 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 96 | 19776 | 102 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | T1_IDX | 96 | | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("COL1"=1)
filter("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR "COL2"=4 OR "COL2"=5 OR
"COL2"=6)

16 rows selected.

Elapsed: 00:00:00.10
SQL>
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 102 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 96 | 19776 | 102 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | T1_IDX | 96 | | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("COL1"=1)
filter("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR "COL2"=4 OR "COL2"=5 OR
"COL2"=6)

16 rows selected.

Elapsed: 00:00:00.10
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.03
SQL>
SQL> alter session set tracefile_identifier = 'fts_2k';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set events '10046 trace name context forever, level 8';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> set termout off
SQL>
SQL> select /*+ full(t1) */
2 *
3 from
4 t1
5 where
6 col1 = 1
7 and col2 in (1,2,3,4,5,6);

SQL> set termout on
SQL>
SQL> alter session set events '10046 trace name context off';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set tracefile_identifier = 'idx_2k';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set events '10046 trace name context forever, level 8';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set termout off
SQL>
SQL> select /*+ index(t1) */
2 *
3 from
4 t1
5 where
6 col1 = 1
7 and col2 in (1,2,3,4,5,6);

SQL> set termout on
SQL>
SQL> alter session set events '10046 trace name context off';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> spool off


And here's what we get from the generated trace files.

The 8KB FTS does these multi-block reads:

WAIT #35: nam='db file scattered read' ela= 2296 file#=8 block#=12050 blocks=8 obj#=61857 tim=107988983808
WAIT #35: nam='db file scattered read' ela= 916 file#=8 block#=12058 blocks=8 obj#=61857 tim=107988984970
WAIT #35: nam='db file scattered read' ela= 911 file#=8 block#=12066 blocks=8 obj#=61857 tim=107988986104


And the 2KB FTS does these multi-block reads:

WAIT #37: nam='db file scattered read' ela= 798 file#=6 block#=66 blocks=32 obj#=61859 tim=107991673671
WAIT #37: nam='db file scattered read' ela= 828 file#=6 block#=98 blocks=32 obj#=61859 tim=107991674822
WAIT #37: nam='db file scattered read' ela= 829 file#=6 block#=130 blocks=32 obj#=61859 tim=107991675980


And the TKPROF output confirms that there is not a relevant difference in the wait times.

The 8KB output:

********************************************************************************

select /*+ full(t1) */
*
from
t1
where
col1 = 1
and col2 in (1,2,3,4,5,6)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.01 0.06 296 299 0 93
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.06 296 299 0 93

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61

Rows Row Source Operation
------- ---------------------------------------------------
93 TABLE ACCESS FULL T1 (cr=299 pr=296 pw=0 time=21098 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
db file sequential read 1 0.00 0.00
db file scattered read 37 0.02 0.05
SQL*Net message from client 2 0.00 0.00
********************************************************************************


The 2KB output:


********************************************************************************

select /*+ full(t1) */
*
from
t1
where
col1 = 1
and col2 in (1,2,3,4,5,6)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.02 0.07 1251 1254 0 93
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.03 0.07 1251 1254 0 93

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61

Rows Row Source Operation
------- ---------------------------------------------------
93 TABLE ACCESS FULL T1 (cr=1254 pr=1251 pw=0 time=21314 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
db file sequential read 1 0.00 0.00
db file scattered read 40 0.02 0.05
SQL*Net message from client 2 0.00 0.00
********************************************************************************


Don't be mislead by the different number of consistent gets, this is just due to the different block size. As can be seen from the raw trace files the multi-block reads in the 2KB block size are reading 32 blocks at a time, whereas in the 8KB default block size 8 blocks are read at a time, so the number of I/O requests will be similar given a segment of the same size.

Sunday, May 17, 2009

Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple blocksizes - part 3

Back to part 2 Forward to part 4

System statistics in 9i

In 10g the CPU costing mode is enabled by default and is supported by the default NOWORKLOAD system statistics.

But you can use system statistics already in 9i, although you have to enable them explicitly.

Oracle 9i is out of support, but I guess there are still a lot of systems out there that are using the 9.2 release, therefore I find it worth to mention what you can do in 9i with system statistics. This can be helpful if you consider to test your application already in the 9i environment with system statistics before upgrading to 10g.

In most descriptions about 9i and system statistics only WORKLOAD system statistics are mentioned, but 9i also supports NOWORKLOAD system statistics, although not in the same full flavour as 10g does.

You can activate CPU costing in 9i by running DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD'), but this seems to work differently than in 10g and later.

Whereas in 10g and later this actually measures the IOSEEKTIM and IOTFRSPEED values, this seems to activate in 9i something that is comparable with the default NOWORKLOAD system statistics of 10g.

The SYS.AUX_STATS$ does not show any actual values, but tests revealed that 9i (at least 9.2.0.8) in that case seems to measure the CPU speed (in 10g this is the CPUSPEEDNW value) and uses the same default values for IOSEEKTIM and IOTFRSPEED as 10g does (10ms and 4096 bytes/ms resp.).

Running some tests showed that you arrive at the same I/O cost as you do in 10g with the default NOWORKLOAD system statistics.


exec dbms_stats.delete_system_stats;

exec dbms_stats.gather_system_stats('NOWORKLOAD')

alter session set "_table_scan_cost_plus_one" = false;

explain plan for
select
max(val)
from
t1;


Since you can't modify the NOWORKLOAD system statistics values in 9i (i.e. you can't manipulate CPUSPEEDNW, IOSEEKTIM and IOTFRSPEED using DBMS_STATS.SET_SYSTEM_STATS), see the following excerpt from the 9.2.0.8 10053 trace file for the 10,000 blocks MSSM table with 8kb blocksize when using NOWORKLOAD system statistics:


Join order[1]: T1[T1]#0
Best so far: TABLE#: 0 CST: 2715 CDN: 10000 BYTES: 40000
Final - All Rows Plan:
JOIN ORDER: 1
CST: 2715 CDN: 10000 RSC: 2714 RSP: 2714 BYTES: 40000
IO-RSC: 2709 IO-RSP: 2709 CPU-RSC: 72914400 CPU-RSP: 72914400


The I/O cost calculated seems to correspond to what 10g calculates when using default NOWORKLOAD system statistics.

Other variants also seem to correspond to the 10g default NOWORKLOAD model. E.g. using a DB_FILE_MULTIBLOCK_READ_COUNT = 16 we get the following cost calculation:


Join order[1]: T1[T1]#0
Best so far: TABLE#: 0 CST: 2194 CDN: 10000 BYTES: 40000
Final - All Rows Plan:
JOIN ORDER: 1
CST: 2194 CDN: 10000 RSC: 2193 RSP: 2193 BYTES: 40000
IO-RSC: 2188 IO-RSP: 2188 CPU-RSC: 72914400 CPU-RSP: 72914400


If you gather or set WORKLOAD system statistics 9i calculates the same cost as 10g does, based on the SREADTIM, MREADTIM and MBRC values.

You can manipulate the WORKLOAD system statistics using DBMS_STATS.SET_SYSTEM_STATS like you can do in 10g. So the following code snippet mimics the default NOWORKLOAD SREADTIM and MREADTIM values for a 8kb blocksize with a DB_FILE_MULTIBLOCK_READ_COUNT = 8.


drop table t1;

create table t1
pctfree 99
pctused 1
-- tablespace test_2k
-- tablespace test_4k
tablespace test_8k
-- tablespace test_16k
as
with generator as (
select --+ materialize
rownum id
from all_objects
where rownum <= 3000
)
select
/*+ ordered use_nl(v2) */
rownum id,
trunc(100 * dbms_random.normal) val,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 10000
;


begin
dbms_stats.gather_table_stats(
user,
't1',
cascade => true,
estimate_percent => null,
method_opt => 'for all columns size 1'
);
end;
/

begin
dbms_stats.set_system_stats('MBRC',8);
dbms_stats.set_system_stats('MREADTIM',26.0);
dbms_stats.set_system_stats('SREADTIM',12.0);
dbms_stats.set_system_stats('CPUSPEED',1000000);
end;
/

alter session set "_table_scan_cost_plus_one" = false;

alter session set tracefile_identifier = '9i_workload_system_stats';

alter session set events '10053 trace name context forever, level 1';

explain plan for
select max(val)
from t1;

alter session set events '10053 trace name context off';


Which should arrive at the same cost as the default NOWORKLOAD system statistics. And indeed it gives you this 10053 trace file excerpt:


Join order[1]: T1[T1]#0
Best so far: TABLE#: 0 CST: 2710 CDN: 10000 BYTES: 40000
Final - All Rows Plan:
JOIN ORDER: 1
CST: 2710 CDN: 10000 RSC: 2709 RSP: 2709 BYTES: 40000
IO-RSC: 2709 IO-RSP: 2709 CPU-RSC: 72914400 CPU-RSP: 72914400


If the WORKLOAD system statistics are invalid (e.g. the MREADTIM is not greater than the SREADTIM value or some values are missing) then 9i falls back to traditional I/O cost based costing for the I/O cost but adds obviously CPU costs, which is different from 10g which falls back to NOWORKLOAD system statistics.

If you change in the above code snippet the corresponding code block like this:


begin
dbms_stats.set_system_stats('MBRC',8);
dbms_stats.set_system_stats('MREADTIM',12.0);
dbms_stats.set_system_stats('SREADTIM',12.0);
dbms_stats.set_system_stats('CPUSPEED',1000000);
end;
/


You'll see the following output from the 10053 trace file:


Join order[1]: T1[T1]#0
Best so far: TABLE#: 0 CST: 1519 CDN: 10000 BYTES: 40000
Final - All Rows Plan:
JOIN ORDER: 1
CST: 1519 CDN: 10000 RSC: 1518 RSP: 1518 BYTES: 40000
IO-RSC: 1518 IO-RSP: 1518 CPU-RSC: 72914400 CPU-RSP: 72914400


This an interesting mixture of traditional I/O costing and CPU costing.

Quirks and oddities

While running my tests I came across several quirks and oddities:

- When gathering NOWORKLOAD system statistics with 10.2.0.4 and 11.1.0.7 on my test environment it happened regularly that the IOTFRSPEED was left unchanged at the default of 4096, but sometimes it also changed to a value that looked more like a measured one.

- You obviously can't set the CPUSPEEDNW/IOSEEKTIM/IOTFRSPEED values using DBMS_STATS.SET_SYSTEM_STATS in 9i for the NOWORKLOAD system statistics, so you're bound to the default values used by Oracle.

- If you're on 10g and later and use OPTIMIZER_FEATURES_ENABLE = '9.2.0.8' the NOWORKLOAD system statistics are not used properly. Although the 10053 optimizer trace suggests otherwise by mentioning the NOWORKLOAD values the final cost calculated falls back to traditional I/O based costing (and no CPU costing at all). This is probably caused by the fact that 9i doesn't support the NOWORKLOAD parameters in the SYS.AUX_STATS$ table that are always present in 10g. System statistics work however in 9.2.0.8 compatibility mode if you're using WORKLOAD system statistics with their corresponding values in SYS.AUX_STATS$.

- Altering the SYSTEM value of db_file_multiblock_read_count: This seems to be buggy because the value set is not getting used in the session that altered the system. You need to set other optimizer related values in the same session to make the change of the parameter effective. For more information, see Dion Cho's blog post about the issue.

The next part of the series will cover my "favorite" topic using objects in multiple block sizes and effects on costing, in particular when using system statistics.

Sunday, May 3, 2009

Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple blocksizes - part 2

Back to part 1 Forward to part 3

Before heading on to the remaining modes of system statistics, let's summarize what has been observed in part 1 regarding the default NOWORKLOAD system statistics in 10g and later. The following table shows what the test case from the previous post demonstrated:

Table 1: 8KB MSSM locally managed tablespace 10,000 blocks table segment
default NOWORKLOAD system statistics:


MBRC|SREADTIM|MREADTIM|MREADTIM/SREADTIM|NOCPU cost|adjusted MBRC|CPU cost|CPU/NOCPU cost
----|--------|--------|-----------------|----------|-------------|--------|--------------
8 |12 | 26 | 2.16 |1,518 | 6.59 |2,709 |1.78
16 |12 | 42 | 3.5 | 962 |10.39 |2,188 |2.27
32 |12 | 74 | 6.16 | 610 |16.39 |1,928 |3.16
64 |12 |138 |11.5 | 387 |25.84 |1,798 |4.64
128 |12 |266 |22.16 | 245 |40.82 |1,732 |7.07


If you happen to have a 16KB default blocksize the results would look like the following. Note that the table is now only 5,000 blocks in size, and the SREADTIM is now a bit longer (10+16384/4096=14ms instead of 10+8192/4096=12ms) therefore the 16KB blocksize calculation makes the full table scan look a bit cheaper to the optimizer when using the default NOWORKLOAD system statistics.

Table 2: 16KB MSSM locally managed tablespace 5,000 blocks table segment
default NOWORKLOAD system statistics:


MBRC|SREADTIM|MREADTIM|MREADTIM/SREADTIM|NOCPU cost|adjusted MBRC|CPU cost|CPU/NOCPU cost
----|--------|--------|-----------------|----------|-------------|--------|--------------
4 |14 | 26 | 1.86 |1,119 | 4.17 |2,322 |2.08
8 |14 | 42 | 3.0 | 759 | 6.59 |1,875 |2.47
16 |14 | 74 | 5.3 | 481 |10.39 |1,652 |3.43
32 |14 |138 | 9.86 | 305 |16.39 |1,540 |5.05
64 |14 |266 |19.0 | 194 |25.84 |1,485 |7.65


Gathered NOWORKLOAD system statistics

If you gather NOWORKLOAD system statistics using DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD') the values IOSEEKTIM and IOTFRSPEED will actually get measured and used accordingly.

The remaining calculations how to derive the SREADTIM and MREADTIM values correspond to what has been outlined above.

Gathering NOWORKLOAD statistics:

exec DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD')

This actually gathers the IOTFRSPEED and IOSEEKTIM values in addition to CPUSPEEDNW rather than using the default values of 4096 and 10.

In 10g and later this may take from a couple of seconds to a couple of minutes depending on the size of your database. Note that this puts additional load onto your system while gathering the NOWORKLOAD system statistics since it submits random reads against all data files.

The following test case shows the different I/O cost calculations when using default NOWORKLOAD system statistics and custom gathered NOWORKLOAD system statistics. It creates again the 10,000 blocks table in a 8KB default block size locally managed tablespace using manual segment space management:


SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.03
SQL>
SQL> create table t1
2 pctfree 99
3 pctused 1
4 -- tablespace test_2k
5 -- tablespace test_4k
6 tablespace test_8k
7 -- tablespace test_16k
8 as
9 with generator as (
10 select --+ materialize
11 rownum id
12 from all_objects
13 where rownum <= 3000
14 )
15 select
16 /*+ ordered use_nl(v2) */
17 rownum id,
18 trunc(100 * dbms_random.normal) val,
19 rpad('x',100) padding
20 from
21 generator v1,
22 generator v2
23 where
24 rownum <= 10000
25 ;

Table created.

Elapsed: 00:00:02.22
SQL>
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.29
SQL> -- default NOWORKLOAD system statistics
SQL> -- ignore CPU costs for the moment
SQL> begin
2 dbms_stats.delete_system_stats;
3 dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
4 end;
5 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> column sname format a20
SQL> column pname format a20
SQL> column pval2 format a20
SQL>
SQL> select
2 sname
3 , pname
4 , pval1
5 , pval2
6 from
7 sys.aux_stats$;

SNAME PNAME PVAL1 PVAL2
-------------------- -------------------- ---------- --------------------
SYSSTATS_INFO STATUS COMPLETED
SYSSTATS_INFO DSTART 04-26-2009 14:21
SYSSTATS_INFO DSTOP 04-26-2009 14:21
SYSSTATS_INFO FLAGS 1
SYSSTATS_MAIN CPUSPEEDNW 1000000
SYSSTATS_MAIN IOSEEKTIM 10
SYSSTATS_MAIN IOTFRSPEED 4096
SYSSTATS_MAIN SREADTIM
SYSSTATS_MAIN MREADTIM
SYSSTATS_MAIN CPUSPEED
SYSSTATS_MAIN MBRC
SYSSTATS_MAIN MAXTHR
SYSSTATS_MAIN SLAVETHR

13 rows selected.

Elapsed: 00:00:00.06
SQL>
SQL> alter session set "_table_scan_cost_plus_one" = false;

Session altered.

Elapsed: 00:00:00.02
SQL>
SQL> explain plan for
2 select
3 max(val)
4 from
5 t1;

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2709 (0)| 00:00:33 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 2709 (0)| 00:00:33 |
---------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.08
SQL> -- gather NOWORKLOAD system statistics
SQL> exec dbms_stats.gather_system_stats('NOWORKLOAD')

PL/SQL procedure successfully completed.

Elapsed: 00:00:14.43
SQL> -- ignore CPU costs for the moment
SQL> begin
2 dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
3 end;
4 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.02
SQL> column sname format a20
SQL> column pname format a20
SQL> column pval2 format a20
SQL>
SQL> select
2 sname
3 , pname
4 , pval1
5 , pval2
6 from
7 sys.aux_stats$;

SNAME PNAME PVAL1 PVAL2
-------------------- -------------------- ---------- --------------------
SYSSTATS_INFO STATUS COMPLETED
SYSSTATS_INFO DSTART 04-26-2009 14:21
SYSSTATS_INFO DSTOP 04-26-2009 14:21
SYSSTATS_INFO FLAGS 1
SYSSTATS_MAIN CPUSPEEDNW 1000000
SYSSTATS_MAIN IOSEEKTIM 14.226
SYSSTATS_MAIN IOTFRSPEED 32517.754
SYSSTATS_MAIN SREADTIM
SYSSTATS_MAIN MREADTIM
SYSSTATS_MAIN CPUSPEED
SYSSTATS_MAIN MBRC
SYSSTATS_MAIN MAXTHR
SYSSTATS_MAIN SLAVETHR

13 rows selected.

Elapsed: 00:00:00.06
SQL>
SQL> explain plan for
2 select
3 max(val)
4 from
5 t1;

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1403 (0)| 00:00:21 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 1403 (0)| 00:00:21 |
---------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.04
SQL>
SQL> spool off


Based on the gathered IOSEEKTIM and IOTFRSPEED values the I/O cost calculated is significantly different.

Applying the known formulas we can reproduce the calculated figures:

SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED

SREADTIM = 14.226 + 8192 / 32,517.754 = 14.478

MREADTIM = IOSEEKTIM + MBRC * DB_BLOCK_SIZE / IOTFRSPEED

MREADTIM = 14.226 + 8 * 8192 / 32,517.754 = 16.241

FTS cost = Blocks below HWM / MBRC * MREADTIM / SREADTIM

FTS cost = 10,000 / 8 * 16.241 / 14.478 = 1,403

Gathered WORKLOAD system statistics

Gathering WORKLOAD system statistics measures a different set of values, including the actual MBRC, SREADTIM and MREADTIM values. The cost calculation therefore doesn't use the synthesized SREADTIM and MREADTIM values any longer, nor does it use the "_db_file_optimizer_read_count" parameter in 10g and later, but uses simply the measured values.

Therefore the I/O costs calculated with WORKLOAD system statistics are not dependent on the "db_file_multiblock_read_count" value used, but the important point to keep in mind is that the gathered WORKLOAD system statistics are based on the "db_file_multiblock_read_count" (in 10g and later on the internal parameter "_db_file_exec_read_count") value used at runtime, so the values measured are obviously influenced by this setting ("_db_file_exec_read_count" equals "db_file_multiblock_read_count" if this has been set and the underscore parameter hasn't been modified).

As already mentioned in part 1 Oracle has introduced with Oracle 10.2 that if the "db_file_multiblock_read_count" is left unset different values for cost calculation and at execution time will be used (8 for calculation and the largest possible I/O size at runtime, usually 1MB on most platforms), so that points in general into the right direction since it allows the calibration code to work out the largest MBRC possible at runtime that can be achieved. Note that Christian Antognini doesn't agree to this approach in his book "Troubleshooting Oracle Performance" where he advices to manually work out the "optimal" MBRC setting running suitable I/O tests.

Note that in 10g and later the runtime engine still uses the "_db_file_exec_read_count", regardless of the MBRC used to calculate the cost.

If you run the following code snippet in 10g and later and check the resulting trace files, you'll see this confirmed:


alter session set tracefile_identifier = 'exec_count_16';

alter session set "_db_file_exec_read_count" = 16;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 8';

select max(val)
from t1;

alter session set events '10046 trace name context off';

alter session set tracefile_identifier = 'exec_count_128';

alter session set "_db_file_exec_read_count" = 128;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 8';

select max(val)
from t1;

alter session set events '10046 trace name context off';


The resulting trace files look like the following:

The 16 blocks setting:


.
.
.
WAIT #2: nam='db file scattered read' ela= 1732 file#=8 block#=12058 blocks=16 obj#=62088 tim=69006657688
WAIT #2: nam='db file scattered read' ela= 1725 file#=8 block#=12074 blocks=16 obj#=62088 tim=69006659628
WAIT #2: nam='db file scattered read' ela= 1726 file#=8 block#=12090 blocks=16 obj#=62088 tim=69006661566
.
.
.


The 128 blocks setting:


.
.
.
WAIT #2: nam='db file scattered read' ela= 13842 file#=8 block#=12169 blocks=128 obj#=62088 tim=69008775308
WAIT #2: nam='db file scattered read' ela= 15513 file#=8 block#=12297 blocks=128 obj#=62088 tim=69008793460
WAIT #2: nam='db file scattered read' ela= 26437 file#=8 block#=12425 blocks=128 obj#=62088 tim=69008822434
.
.
.


Gathering WORKLOAD system statistics:

exec DBMS_STATS.GATHER_SYSTEM_STATS('START')
-- some significant (ideally "representative") workload needs to be performed
-- otherwise some or all of the measured values will be missing
exec DBMS_STATS.GATHER_SYSTEM_STATS('STOP')

or

exec DBMS_STATS.GATHER_SYSTEM_STATS('INTERVAL', )

Note that gathering workload system statistics doesn't put additional load onto your system, since the values are derived from the delta in statistics already maintained by Oracle during database activity.

Furthermore if your workload doesn't use "db file scattered read" i.e. multi-block reads that are working with the buffer cache, then you might end up with WORKLOAD system statistics that are missing the MBRC and MREADTIM component (null values). This might happen if you e.g. use only index access paths with table row random lookups by ROWID and/or all your tablescans are all going parallel, or in 11g use serial direct reads that bypass the buffer cache (which can be activated in pre-11g using the hidden parameter "_serial_direct_read").

The same applies to "db file sequential read" i.e. single-block reads, if you only perform multi-block reads in your workload then the SREADTIM information might be missing from the gathered statistics.

Although the official documentation says in 10.2 for that case the following:
"During the gathering process of workload statistics, it is possible that mbrc and mreadtim will not be gathered if no table scans are performed during serial workloads, as is often the case with OLTP systems. On the other hand, FTS occur frequently on DSS systems but may run parallel and bypass the buffer cache. In such cases, sreadtim will still be gathered since index lookup are performed using the buffer cache. If Oracle cannot gather or validate gathered mbrc or mreadtim, but has gathered sreadtim and cpuspeed, then only sreadtim and cpuspeed will be used for costing. FTS cost will be computed using analytical algorithm implemented in previous releases. Another alternative to computing mbrc and mreadtim is to force FTS in serial mode to allow the optimizer to gather the data."

And the 11g documentation says this:
"If Oracle Database cannot gather or validate gathered mbrc or mreadtim values, but has gathered sreadtim and cpuspeed values, then only the sreadtim and cpuspeed values are used for costing. In this case, the optimizer uses the value of the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT to cost a full table scan. However, if DB_FILE_MULTIBLOCK_READ_COUNT is not set or is set to 0 (zero), then the optimizer uses a value of 8 for costing."

But when testing this it looked like that if either MBRC or MREADTIM was missing then the optimizer simply reverted to the available NOWORKLOAD system statistics (Note that this applies to 10g and later; I'll show in the next part of the series what happens in 9i since things are different there).

Note in order to have the optimizer accept the WORKLOAD system statistic the MREADTIM needs to greater than the SREADTIM. If your multi-block read requests are served from a cache or your storage system performs aggresssive read-aheads the measured MREADTIM can be less than the SREADTIM. In this case you might need to adjust the MREADTIM manually using the GET_SYSTEM_STATS/SET_SYSTEM_STATS API, which will be covered below.

One interesting oddity showed up when MBRC was available but MREADTIM was missing or not greater than SREADTIM: In that case it looks like that the NOWORKLOAD statistics use the MBRC set in their calculations for synthesizing the MREADTIM and calculating the full table scan cost. This makes sense but is interesting mixture of NOWORKLOAD and WORKLOAD system statistics.

The following test case shows how to gather WORKLOAD system statistics, and how to correct manually a MREADTIM value gathered too low.


SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:02.44
SQL>
SQL> create table t1
2 pctfree 99
3 pctused 1
4 -- tablespace test_2k
5 -- tablespace test_4k
6 tablespace test_8k
7 -- tablespace test_16k
8 as
9 with generator as (
10 select --+ materialize
11 rownum id
12 from all_objects
13 where rownum <= 3000
14 )
15 select
16 /*+ ordered use_nl(v2) */
17 rownum id,
18 trunc(100 * dbms_random.normal) val,
19 rpad('x',100) padding
20 from
21 generator v1,
22 generator v2
23 where
24 rownum <= 10000
25 ;

Table created.

Elapsed: 00:00:02.27
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.60
SQL>
SQL> begin
2 dbms_stats.delete_system_stats;
3 dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
4 end;
5 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.07
SQL>
SQL> alter session set "_table_scan_cost_plus_one" = false;

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> exec dbms_stats.gather_system_stats('START')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.12
SQL>
SQL> begin
2 for i in 1..10 loop
3 for rec in (
4 select
5 max(val)
6 from
7 t1
8 ) loop
9 execute immediate 'alter system flush buffer_cache';
10 end loop;
11 end loop;
12 for rec in (
13 select /*+ use_nl(a t1) */ max(val) from t1,
14 (
15 select /*+ no_merge no_eliminate_oby */
16 rowid as row_id
17 from
18 t1
19 order by
20 dbms_random.value
21 ) a
22 where a.row_id = t1.rowid
23 ) loop
24 null;
25 end loop;
26 end;
27 /

PL/SQL procedure successfully completed.

Elapsed: 00:01:17.73
SQL>
SQL> exec dbms_stats.gather_system_stats('STOP')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.74
SQL>
SQL> begin
2 dbms_stats.set_system_stats('CPUSPEED',1000000);
3 end;
4 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> declare
2 s_status varchar2(200);
3 dt_dstart date;
4 dt_dstop date;
5 n_pvalue number;
6 begin
7 dbms_stats.get_system_stats (
8 s_status,
9 dt_dstart,
10 dt_dstop,
11 'MREADTIM',
12 n_pvalue);
13 dbms_output.put_line('Status: ' || s_status);
14 dbms_output.put_line('Dstart: ' || to_char(dt_dstart, 'DD.MM.YYYY HH24:MI:SS'));
15 dbms_output.put_line('Dstop : ' || to_char(dt_dstop , 'DD.MM.YYYY HH24:MI:SS'));
16 dbms_output.put_line('Value : ' || to_char(n_pvalue, 'TM'));
17 dbms_stats.set_system_stats('MREADTIM', 10 * n_pvalue);
18 end;
19 /
Status: COMPLETED
Dstart: 03.05.2009 13:24:00
Dstop : 03.05.2009 13:24:00
Value : 1.293

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.11
SQL>
SQL> column sname format a20
SQL> column pname format a20
SQL> column pval2 format a20
SQL>
SQL> select
2 sname
3 , pname
4 , pval1
5 , pval2
6 from
7 sys.aux_stats$;

SNAME PNAME PVAL1 PVAL2
-------------------- -------------------- ---------- --------------------
SYSSTATS_INFO STATUS COMPLETED
SYSSTATS_INFO DSTART 05-03-2009 13:24
SYSSTATS_INFO DSTOP 05-03-2009 13:24
SYSSTATS_INFO FLAGS 1
SYSSTATS_MAIN CPUSPEEDNW 1000000
SYSSTATS_MAIN IOSEEKTIM 10
SYSSTATS_MAIN IOTFRSPEED 4096
SYSSTATS_MAIN SREADTIM 8.021
SYSSTATS_MAIN MREADTIM 12.93
SYSSTATS_MAIN CPUSPEED 1000000
SYSSTATS_MAIN MBRC 8
SYSSTATS_MAIN MAXTHR
SYSSTATS_MAIN SLAVETHR

13 rows selected.

Elapsed: 00:00:00.06
SQL>
SQL> explain plan for
2 select
3 max(val)
4 from
5 t1;

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2016 (0)| 00:00:17 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 2016 (0)| 00:00:17 |
---------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.17
SQL>
SQL> spool off


Using the known formula we can confirm the cost calculation above:

Blocks below HWM / MBRC: Number of multi-block read requests required to scan the segment

Number of multi-block read requests * MREADTIM = time it takes to perform these number of read requests in milliseconds.

Finally this is divided by SREADTIM to arrive at the known unit used for cost representation which is number of single read requests.

10,000 / 8 = 1,250 multi-block read requests

1,250 * 12.93 = 16,162.5 ms execution time (which is shown as 17 seconds in the plan by the way)

16,162.5 / 8.021 = 2,015.02 (which is shown as 2,016 in the plan)

Manually writing and maintaining system statistics

Using the DBMS_STATS.GET_SYSTEM_STATS / SET_SYSTEM_STATS API you can write your own set of system statistics for both NOWORKLOAD and WORKLOAD values.

You can use DBMS_STATS.DELETE_SYSTEM_STATS to remove the system statistics, which will activate the default NOWORKLOAD system statistics in 10g and disable CPU costing in 9i.

You can use DBMS_STATS.EXPORT_SYSTEM_STATS / IMPORT_SYSTEM_STATS to export and import system statistics to a user statistics table created with DBMS_STATS.CREATE_STAT_TABLE.

Note that DBMS_STATS.GATHER_SYSTEM_STATS when used with an user stats table (created with DBMS_STATS.CREATE_STAT_TABLE) behaves differently than e.g. DBMS_STATS.GATHER_TABLE_STATS: Whereas object related statistics always go to the data dictionary and you only have the option to save the current statistics to the user stats table before replacing them with the new values, GATHER_SYSTEM_STATS actually writes the system statistics into the user stats table and doesn't change the actual system statistics if you're supplying a user stats table name.

For further discussion how you could use system statistics see Jonathan Lewis' thoughts about this topic:

Part 1
Part 2
Part 3

The next part of the series will cover the usage of system statistics in 9i, highlight some quirks and oddities observed and finally show what happens if you attempt to use multiple block sizes for "tuning" purposes.

Saturday, April 25, 2009

Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple blocksizes - part 1

Forward to part 2

This is the first part of a series of posts that cover one of the fundamentals of the cost based optimizer in 9i and later. Understanding how the different system statistics modes work is crucial in making the most out of the cost based optimizer, therefore I'll attempt to provide some detailed explanations and samples about the formulas and arithmetics used. Finally I'll show (again) that using multiple block sizes for "tuning" purposes is a bad idea in general, along with detailed examples why I think this is so.

One of the deficiencies of the traditional I/O based costing was that it simply counted the number of I/O requests making no differentation between single-block I/O and multi-block I/O.

System statistics were introduced in Oracle 9i to allow the cost based optimizer to take into account that single-block I/Os and multi-block I/Os should be treated differently in terms of costing and to include a CPU component in the cost calculation.

The system statistics tell the cost based optimizer (CBO) among other things the time it takes to perform a single block read request and a multi-block read request. Given this information the optimizer ought to be able to come to estimates that better fit the particular environment where the database is running on and additionally use an appropriate costing for multi-block read requests that usually take longer than single block read requests. Given the information about the time it takes to perform the read requests the cost calculated can be turned into a time estimate.

The cost calculated with system statistics is still expressed in the same units as with traditional I/O based costing, which is in units of single-block read requests.

Although the mode using system statistics is also known as "CPU costing" despite the name the system statistics have the most significant impact on the I/O costs calculated for full table scans due to the different measure MREADTIM used for multi-block read requests.

Starting with Oracle 10g you have actually the choice of three different modes of system statistics also known as CPU costing:

1. Default NOWORKLOAD system statistics
2. Gathered NOWORKLOAD system statistics
3. Gathered WORKLOAD system statistics

The important point to understand here is that starting with Oracle 10g system statistics are enabled by default (using the default NOWORKLOAD system statistics) and you can only disable them by either downgrading your optimizer (using the OPTIMIZER_FEATURES_ENABLE parameter) or using undocumented parameters or hints ("_optimizer_cost_model" respectively the CPU_COSTING and NOCPU_COSTING hints).

This initial part of the series will focus on the default NOWORKLOAD system statistics introduced with Oracle 10g.

Default NOWORKLOAD system statistics

The default NOWORKLOAD system statistics measure only the CPU speed (CPUSPEEDNW), the two other remaining values used for NOWORKLOAD system statistics IOSEEKTIM (seek time) and IOTFRSPEED (transfer speed) are using default values (10 milliseconds seek time and 4096 bytes per millisecond transfer speed).

Using these default values for the I/O part the SREADTIM (single-block I/O read time) and MREADTIM (multi-block I/O read time) values are synthesized for cost calculation by applying the following formula:

SREADTIM = IOSEEKTIM + db_block_size / IOTFRSPEED

MREADTIM = IOSEEKTIM + mbrc * db_block_size / IOTFRSPEED

where "db_block_size" represents your database standard block size in bytes and "mbrc" is either the value of "db_file_multiblock_read_count" if it has been set explicitly, or a default of 8 if left unset. From 10.2 on this is controlled internally by the undocumented parameter "_db_file_optimizer_read_count". This means that in 10.2 and later the "mbrc" used by the optimizer to calculate the cost can be different from the "mbrc" actually used at runtime when performing multi-block read requests. If you leave the "db_file_multiblock_read_count" unset in 10.2 and later then Oracle uses a default of 8 for cost calculation but uses the largest possible I/O request size depending on the platform, which is usually 1MB (e.g. 128 blocks when using a block size of 8KB). In 10.2 and later this is controlled internally by the undocumented parameter "_db_file_exec_read_count".

Assuming a default block size of 8KB (8192 bytes) and "db_file_multiblock_read_count" left unset, this results in the following calculation:

SREADTIM = 10 + 8192 / 4096 = 10 + 2 = 12ms

MREADTIM = 10 + 8 * 8192 / 4096 = 10 + 16 = 26ms

These values will then be used to calculate the I/O cost of single block and multi-block read requests according to the execution plan (number of single-block reads + number of multi-block reads * MREADTIM / SREADTIM), which means that the I/O cost with system statistics aka. CPU costing is expressed in units of single block reads.

You can derive from above formula that with system statistics the cost of a full table scan operation is going to be more expensive approximately by the factor MREADTIM / SREADTIM compared to the traditional I/O based costing used in pre-10g by default, therefore system statistics usually tend to favor index access a bit more.

Note that above factor MREADTIM / SREADTIM is not entirely correct since the traditional I/O costing introduces a efficiency reduction factor when using higher MBRC settings, presumably to reflect that the larger the number of blocks per I/O request the higher the possibility that it won't be possible to use that large number of blocks per I/O request due to blocks already being in the buffer cache or hitting extent boundaries.

So with a MBRC setting of 8 the adjusted MBRC used for calculation is actually 6.59. Using e.g. a very high setting of 128 for the MBRC will actually use 40.82 for calculation. So the higher the setting the more the MRBC used for calculation will be reduced.

The following test case shall demonstrate the difference between traditional I/O costing, CPU costing and the factor MREADTIM / SREADTIM when using different "db_file_multiblock_read_count" settings. The test case was run against 10.2.0.4 Win32.

Note that the test case removes your current system statistics so you should be cautious if you have non-default system statistics at present in your database.

Furthermore the test case assumes a 8KB database default block size, and a locally managed tablespace with 1MB uniform extent size using manual segment space management (no ASSM).


drop table t1;

-- Create a table consisting of 10,000 blocks / 1 row per block
-- in a 8KB tablespace with manual segment space management (no ASSM)
create table t1
pctfree 99
pctused 1
-- tablespace test_2k
-- tablespace test_4k
tablespace test_8k
-- tablespace test_16k
as
with generator as (
select --+ materialize
rownum id
from all_objects
where rownum <= 3000
)
select
/*+ ordered use_nl(v2) */
rownum id,
trunc(100 * dbms_random.normal) val,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 10000
;

begin
dbms_stats.gather_table_stats(
user,
't1',
cascade => true,
estimate_percent => null,
method_opt => 'for all columns size 1'
);
end;
/

-- Use default NOWORKLOAD system statistics
-- for test but ignore CPU cost component
-- by using an artificially high CPU speed
begin
dbms_stats.delete_system_stats;
dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
end;
/

-- In order to verify the formula against the
-- optimizer calculations
-- don't increase the table scan cost by one
-- which is done by default from 9i on
alter session set "_table_scan_cost_plus_one" = false;

alter session set db_file_multiblock_read_count = 8;

-- Assumption due to formula is that CPU costing
-- increases FTS cost by MREADTIM/SREADTIM, but
-- traditional I/O based costing introduces a
-- efficiency penalty the higher the MBRC is
-- therefore the factor is not MREADTIM/SREADTIM
-- but MREADTIM/SREADTIM/(MBRC/adjusted MBRC)
--
-- NOWORKLOAD synthesized SREADTIM = 12, MREADTIM = 26
-- MREADTIM/SREADTIM = 26/12 = 2.16
-- Factor CPU Costing / traditional I/O costing
-- 2,709/1,518 = 1.78
-- MBRC = 8, adjusted MBRC = 10,000 / 1,518 = 6.59
-- 8/6.59 = 1.21
-- 2.16 / 1.21 = 1.78

select /*+ nocpu_costing */ max(val)
from t1;

-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1518 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 1518 |
-----------------------------------------------------------

select /*+ cpu_costing */ max(val)
from t1;

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2709 (0)| 00:00:33 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 2709 (0)| 00:00:33 |
---------------------------------------------------------------------------

alter session set db_file_multiblock_read_count = 16;

-- Assumption due to formula is that CPU costing
-- increases FTS cost by MREADTIM/SREADTIM, but
-- traditional I/O based costing introduces a
-- efficiency penalty the higher the MBRC is
-- therefore the factor is not MREADTIM/SREADTIM
-- but MREADTIM/SREADTIM/(MBRC/adjusted MBRC)
--
-- NOWORKLOAD synthesized SREADTIM = 12, MREADTIM = 42
-- MREADTIM/SREADTIM = 42/12 = 3.5
-- Factor CPU Costing / traditional I/O costing
-- 2,188/962 = 2.27
-- MBRC = 16, adjusted MBRC = 10,000 / 962 = 10.39
-- 16/10.39 = 1.54
-- 3.5 / 1.54 = 2.27

select /*+ nocpu_costing */ max(val)
from t1;

-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 962 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 962 |
-----------------------------------------------------------

select /*+ cpu_costing */ max(val)
from t1;

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2188 (0)| 00:00:27 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 2188 (0)| 00:00:27 |
---------------------------------------------------------------------------

alter session set db_file_multiblock_read_count = 32;

-- Assumption due to formula is that CPU costing
-- increases FTS cost by MREADTIM/SREADTIM, but
-- traditional I/O based costing introduces a
-- efficiency penalty the higher the MBRC is
-- therefore the factor is not MREADTIM/SREADTIM
-- but MREADTIM/SREADTIM/(MBRC/adjusted MBRC)
--
-- NOWORKLOAD synthesized SREADTIM = 12, MREADTIM = 74
-- MREADTIM/SREADTIM = 74/12 = 6.16
-- Factor CPU Costing / traditional I/O costing
-- 1,928/610 = 3.16
-- MBRC = 32, adjusted MBRC = 10,000 / 610 = 16.39
-- 32/16.39 = 1.95
-- 6.16 / 1.95 = 3.16

select /*+ nocpu_costing */ max(val)
from t1;

-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 610 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 610 |
-----------------------------------------------------------

select /*+ cpu_costing */ max(val)
from t1;

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1928 (0)| 00:00:24 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 1928 (0)| 00:00:24 |
---------------------------------------------------------------------------

alter session set db_file_multiblock_read_count = 64;

-- Assumption due to formula is that CPU costing
-- increases FTS cost by MREADTIM/SREADTIM, but
-- traditional I/O based costing introduces a
-- efficiency penalty the higher the MBRC is
-- therefore the factor is not MREADTIM/SREADTIM
-- but MREADTIM/SREADTIM/(MBRC/adjusted MBRC)
--
-- NOWORKLOAD synthesized SREADTIM = 12, MREADTIM = 138
-- MREADTIM/SREADTIM = 138/12 = 11.5
-- Factor CPU Costing / traditional I/O costing
-- 1,798/387 = 4.64
-- MBRC = 64, adjusted MBRC = 10,000 / 387 = 25.84
-- 64/25.84 = 2.48
-- 11.5 / 2.48 = 4.64

select /*+ nocpu_costing */ max(val)
from t1;

-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 387 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 387 |
-----------------------------------------------------------

select /*+ cpu_costing */ max(val)
from t1;

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1798 (0)| 00:00:22 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 1798 (0)| 00:00:22 |
---------------------------------------------------------------------------

alter session set db_file_multiblock_read_count = 128;

-- Assumption due to formula is that CPU costing
-- increases FTS cost by MREADTIM/SREADTIM, but
-- traditional I/O based costing introduces a
-- efficiency penalty the higher the MBRC is
-- therefore the factor is not MREADTIM/SREADTIM
-- but MREADTIM/SREADTIM/(MBRC/adjusted MBRC)
--
-- NOWORKLOAD synthesized SREADTIM = 12, MREADTIM = 266
-- MREADTIM/SREADTIM = 266/12 = 22.16
-- Factor CPU Costing / traditional I/O costing
-- 1,732/245 = 7.07
-- MBRC = 128, adjusted MBRC = 10,000 / 245 = 40.82
-- 128/40.82 = 3.13
-- 22.16 / 3.13 = 7.07

select /*+ nocpu_costing */ max(val)
from t1;

-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 245 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 245 |
-----------------------------------------------------------

select /*+ cpu_costing */ max(val)
from t1;

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1732 (0)| 00:00:21 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 1732 (0)| 00:00:21 |
---------------------------------------------------------------------------


So as you can see the I/O costs for a full table scan are significantly different when using default NOWORKLOAD system statistics. You can also see that the SREADTIM and MREADTIM values derived are quite different when using different "db_file_multiblock_read_count" settings. Furthermore the difference between traditional I/O based costing and the CPU costing is not the factor MREADTIM / SREADTIM as suggested by the formula, but is reduced by the adjustment applied to the MBRC when using traditional I/O costing.

The next part of the series will cover the remaining available System Statistics mode.