Sunday, May 3, 2009

Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple blocksizes - part 2

Back to part 1 Forward to part 3

Before heading on to the remaining modes of system statistics, let's summarize what has been observed in part 1 regarding the default NOWORKLOAD system statistics in 10g and later. The following table shows what the test case from the previous post demonstrated:

Table 1: 8KB MSSM locally managed tablespace 10,000 blocks table segment
default NOWORKLOAD system statistics:


MBRC|SREADTIM|MREADTIM|MREADTIM/SREADTIM|NOCPU cost|adjusted MBRC|CPU cost|CPU/NOCPU cost
----|--------|--------|-----------------|----------|-------------|--------|--------------
8 |12 | 26 | 2.16 |1,518 | 6.59 |2,709 |1.78
16 |12 | 42 | 3.5 | 962 |10.39 |2,188 |2.27
32 |12 | 74 | 6.16 | 610 |16.39 |1,928 |3.16
64 |12 |138 |11.5 | 387 |25.84 |1,798 |4.64
128 |12 |266 |22.16 | 245 |40.82 |1,732 |7.07


If you happen to have a 16KB default blocksize the results would look like the following. Note that the table is now only 5,000 blocks in size, and the SREADTIM is now a bit longer (10+16384/4096=14ms instead of 10+8192/4096=12ms) therefore the 16KB blocksize calculation makes the full table scan look a bit cheaper to the optimizer when using the default NOWORKLOAD system statistics.

Table 2: 16KB MSSM locally managed tablespace 5,000 blocks table segment
default NOWORKLOAD system statistics:


MBRC|SREADTIM|MREADTIM|MREADTIM/SREADTIM|NOCPU cost|adjusted MBRC|CPU cost|CPU/NOCPU cost
----|--------|--------|-----------------|----------|-------------|--------|--------------
4 |14 | 26 | 1.86 |1,119 | 4.17 |2,322 |2.08
8 |14 | 42 | 3.0 | 759 | 6.59 |1,875 |2.47
16 |14 | 74 | 5.3 | 481 |10.39 |1,652 |3.43
32 |14 |138 | 9.86 | 305 |16.39 |1,540 |5.05
64 |14 |266 |19.0 | 194 |25.84 |1,485 |7.65


Gathered NOWORKLOAD system statistics

If you gather NOWORKLOAD system statistics using DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD') the values IOSEEKTIM and IOTFRSPEED will actually get measured and used accordingly.

The remaining calculations how to derive the SREADTIM and MREADTIM values correspond to what has been outlined above.

Gathering NOWORKLOAD statistics:

exec DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD')

This actually gathers the IOTFRSPEED and IOSEEKTIM values in addition to CPUSPEEDNW rather than using the default values of 4096 and 10.

In 10g and later this may take from a couple of seconds to a couple of minutes depending on the size of your database. Note that this puts additional load onto your system while gathering the NOWORKLOAD system statistics since it submits random reads against all data files.

The following test case shows the different I/O cost calculations when using default NOWORKLOAD system statistics and custom gathered NOWORKLOAD system statistics. It creates again the 10,000 blocks table in a 8KB default block size locally managed tablespace using manual segment space management:


SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.03
SQL>
SQL> create table t1
2 pctfree 99
3 pctused 1
4 -- tablespace test_2k
5 -- tablespace test_4k
6 tablespace test_8k
7 -- tablespace test_16k
8 as
9 with generator as (
10 select --+ materialize
11 rownum id
12 from all_objects
13 where rownum <= 3000
14 )
15 select
16 /*+ ordered use_nl(v2) */
17 rownum id,
18 trunc(100 * dbms_random.normal) val,
19 rpad('x',100) padding
20 from
21 generator v1,
22 generator v2
23 where
24 rownum <= 10000
25 ;

Table created.

Elapsed: 00:00:02.22
SQL>
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.29
SQL> -- default NOWORKLOAD system statistics
SQL> -- ignore CPU costs for the moment
SQL> begin
2 dbms_stats.delete_system_stats;
3 dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
4 end;
5 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> column sname format a20
SQL> column pname format a20
SQL> column pval2 format a20
SQL>
SQL> select
2 sname
3 , pname
4 , pval1
5 , pval2
6 from
7 sys.aux_stats$;

SNAME PNAME PVAL1 PVAL2
-------------------- -------------------- ---------- --------------------
SYSSTATS_INFO STATUS COMPLETED
SYSSTATS_INFO DSTART 04-26-2009 14:21
SYSSTATS_INFO DSTOP 04-26-2009 14:21
SYSSTATS_INFO FLAGS 1
SYSSTATS_MAIN CPUSPEEDNW 1000000
SYSSTATS_MAIN IOSEEKTIM 10
SYSSTATS_MAIN IOTFRSPEED 4096
SYSSTATS_MAIN SREADTIM
SYSSTATS_MAIN MREADTIM
SYSSTATS_MAIN CPUSPEED
SYSSTATS_MAIN MBRC
SYSSTATS_MAIN MAXTHR
SYSSTATS_MAIN SLAVETHR

13 rows selected.

Elapsed: 00:00:00.06
SQL>
SQL> alter session set "_table_scan_cost_plus_one" = false;

Session altered.

Elapsed: 00:00:00.02
SQL>
SQL> explain plan for
2 select
3 max(val)
4 from
5 t1;

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2709 (0)| 00:00:33 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 2709 (0)| 00:00:33 |
---------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.08
SQL> -- gather NOWORKLOAD system statistics
SQL> exec dbms_stats.gather_system_stats('NOWORKLOAD')

PL/SQL procedure successfully completed.

Elapsed: 00:00:14.43
SQL> -- ignore CPU costs for the moment
SQL> begin
2 dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
3 end;
4 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.02
SQL> column sname format a20
SQL> column pname format a20
SQL> column pval2 format a20
SQL>
SQL> select
2 sname
3 , pname
4 , pval1
5 , pval2
6 from
7 sys.aux_stats$;

SNAME PNAME PVAL1 PVAL2
-------------------- -------------------- ---------- --------------------
SYSSTATS_INFO STATUS COMPLETED
SYSSTATS_INFO DSTART 04-26-2009 14:21
SYSSTATS_INFO DSTOP 04-26-2009 14:21
SYSSTATS_INFO FLAGS 1
SYSSTATS_MAIN CPUSPEEDNW 1000000
SYSSTATS_MAIN IOSEEKTIM 14.226
SYSSTATS_MAIN IOTFRSPEED 32517.754
SYSSTATS_MAIN SREADTIM
SYSSTATS_MAIN MREADTIM
SYSSTATS_MAIN CPUSPEED
SYSSTATS_MAIN MBRC
SYSSTATS_MAIN MAXTHR
SYSSTATS_MAIN SLAVETHR

13 rows selected.

Elapsed: 00:00:00.06
SQL>
SQL> explain plan for
2 select
3 max(val)
4 from
5 t1;

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1403 (0)| 00:00:21 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 1403 (0)| 00:00:21 |
---------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.04
SQL>
SQL> spool off


Based on the gathered IOSEEKTIM and IOTFRSPEED values the I/O cost calculated is significantly different.

Applying the known formulas we can reproduce the calculated figures:

SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED

SREADTIM = 14.226 + 8192 / 32,517.754 = 14.478

MREADTIM = IOSEEKTIM + MBRC * DB_BLOCK_SIZE / IOTFRSPEED

MREADTIM = 14.226 + 8 * 8192 / 32,517.754 = 16.241

FTS cost = Blocks below HWM / MBRC * MREADTIM / SREADTIM

FTS cost = 10,000 / 8 * 16.241 / 14.478 = 1,403

Gathered WORKLOAD system statistics

Gathering WORKLOAD system statistics measures a different set of values, including the actual MBRC, SREADTIM and MREADTIM values. The cost calculation therefore doesn't use the synthesized SREADTIM and MREADTIM values any longer, nor does it use the "_db_file_optimizer_read_count" parameter in 10g and later, but uses simply the measured values.

Therefore the I/O costs calculated with WORKLOAD system statistics are not dependent on the "db_file_multiblock_read_count" value used, but the important point to keep in mind is that the gathered WORKLOAD system statistics are based on the "db_file_multiblock_read_count" (in 10g and later on the internal parameter "_db_file_exec_read_count") value used at runtime, so the values measured are obviously influenced by this setting ("_db_file_exec_read_count" equals "db_file_multiblock_read_count" if this has been set and the underscore parameter hasn't been modified).

As already mentioned in part 1 Oracle has introduced with Oracle 10.2 that if the "db_file_multiblock_read_count" is left unset different values for cost calculation and at execution time will be used (8 for calculation and the largest possible I/O size at runtime, usually 1MB on most platforms), so that points in general into the right direction since it allows the calibration code to work out the largest MBRC possible at runtime that can be achieved. Note that Christian Antognini doesn't agree to this approach in his book "Troubleshooting Oracle Performance" where he advices to manually work out the "optimal" MBRC setting running suitable I/O tests.

Note that in 10g and later the runtime engine still uses the "_db_file_exec_read_count", regardless of the MBRC used to calculate the cost.

If you run the following code snippet in 10g and later and check the resulting trace files, you'll see this confirmed:


alter session set tracefile_identifier = 'exec_count_16';

alter session set "_db_file_exec_read_count" = 16;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 8';

select max(val)
from t1;

alter session set events '10046 trace name context off';

alter session set tracefile_identifier = 'exec_count_128';

alter session set "_db_file_exec_read_count" = 128;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 8';

select max(val)
from t1;

alter session set events '10046 trace name context off';


The resulting trace files look like the following:

The 16 blocks setting:


.
.
.
WAIT #2: nam='db file scattered read' ela= 1732 file#=8 block#=12058 blocks=16 obj#=62088 tim=69006657688
WAIT #2: nam='db file scattered read' ela= 1725 file#=8 block#=12074 blocks=16 obj#=62088 tim=69006659628
WAIT #2: nam='db file scattered read' ela= 1726 file#=8 block#=12090 blocks=16 obj#=62088 tim=69006661566
.
.
.


The 128 blocks setting:


.
.
.
WAIT #2: nam='db file scattered read' ela= 13842 file#=8 block#=12169 blocks=128 obj#=62088 tim=69008775308
WAIT #2: nam='db file scattered read' ela= 15513 file#=8 block#=12297 blocks=128 obj#=62088 tim=69008793460
WAIT #2: nam='db file scattered read' ela= 26437 file#=8 block#=12425 blocks=128 obj#=62088 tim=69008822434
.
.
.


Gathering WORKLOAD system statistics:

exec DBMS_STATS.GATHER_SYSTEM_STATS('START')
-- some significant (ideally "representative") workload needs to be performed
-- otherwise some or all of the measured values will be missing
exec DBMS_STATS.GATHER_SYSTEM_STATS('STOP')

or

exec DBMS_STATS.GATHER_SYSTEM_STATS('INTERVAL', )

Note that gathering workload system statistics doesn't put additional load onto your system, since the values are derived from the delta in statistics already maintained by Oracle during database activity.

Furthermore if your workload doesn't use "db file scattered read" i.e. multi-block reads that are working with the buffer cache, then you might end up with WORKLOAD system statistics that are missing the MBRC and MREADTIM component (null values). This might happen if you e.g. use only index access paths with table row random lookups by ROWID and/or all your tablescans are all going parallel, or in 11g use serial direct reads that bypass the buffer cache (which can be activated in pre-11g using the hidden parameter "_serial_direct_read").

The same applies to "db file sequential read" i.e. single-block reads, if you only perform multi-block reads in your workload then the SREADTIM information might be missing from the gathered statistics.

Although the official documentation says in 10.2 for that case the following:
"During the gathering process of workload statistics, it is possible that mbrc and mreadtim will not be gathered if no table scans are performed during serial workloads, as is often the case with OLTP systems. On the other hand, FTS occur frequently on DSS systems but may run parallel and bypass the buffer cache. In such cases, sreadtim will still be gathered since index lookup are performed using the buffer cache. If Oracle cannot gather or validate gathered mbrc or mreadtim, but has gathered sreadtim and cpuspeed, then only sreadtim and cpuspeed will be used for costing. FTS cost will be computed using analytical algorithm implemented in previous releases. Another alternative to computing mbrc and mreadtim is to force FTS in serial mode to allow the optimizer to gather the data."

And the 11g documentation says this:
"If Oracle Database cannot gather or validate gathered mbrc or mreadtim values, but has gathered sreadtim and cpuspeed values, then only the sreadtim and cpuspeed values are used for costing. In this case, the optimizer uses the value of the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT to cost a full table scan. However, if DB_FILE_MULTIBLOCK_READ_COUNT is not set or is set to 0 (zero), then the optimizer uses a value of 8 for costing."

But when testing this it looked like that if either MBRC or MREADTIM was missing then the optimizer simply reverted to the available NOWORKLOAD system statistics (Note that this applies to 10g and later; I'll show in the next part of the series what happens in 9i since things are different there).

Note in order to have the optimizer accept the WORKLOAD system statistic the MREADTIM needs to greater than the SREADTIM. If your multi-block read requests are served from a cache or your storage system performs aggresssive read-aheads the measured MREADTIM can be less than the SREADTIM. In this case you might need to adjust the MREADTIM manually using the GET_SYSTEM_STATS/SET_SYSTEM_STATS API, which will be covered below.

One interesting oddity showed up when MBRC was available but MREADTIM was missing or not greater than SREADTIM: In that case it looks like that the NOWORKLOAD statistics use the MBRC set in their calculations for synthesizing the MREADTIM and calculating the full table scan cost. This makes sense but is interesting mixture of NOWORKLOAD and WORKLOAD system statistics.

The following test case shows how to gather WORKLOAD system statistics, and how to correct manually a MREADTIM value gathered too low.


SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:02.44
SQL>
SQL> create table t1
2 pctfree 99
3 pctused 1
4 -- tablespace test_2k
5 -- tablespace test_4k
6 tablespace test_8k
7 -- tablespace test_16k
8 as
9 with generator as (
10 select --+ materialize
11 rownum id
12 from all_objects
13 where rownum <= 3000
14 )
15 select
16 /*+ ordered use_nl(v2) */
17 rownum id,
18 trunc(100 * dbms_random.normal) val,
19 rpad('x',100) padding
20 from
21 generator v1,
22 generator v2
23 where
24 rownum <= 10000
25 ;

Table created.

Elapsed: 00:00:02.27
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.60
SQL>
SQL> begin
2 dbms_stats.delete_system_stats;
3 dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
4 end;
5 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.07
SQL>
SQL> alter session set "_table_scan_cost_plus_one" = false;

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> exec dbms_stats.gather_system_stats('START')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.12
SQL>
SQL> begin
2 for i in 1..10 loop
3 for rec in (
4 select
5 max(val)
6 from
7 t1
8 ) loop
9 execute immediate 'alter system flush buffer_cache';
10 end loop;
11 end loop;
12 for rec in (
13 select /*+ use_nl(a t1) */ max(val) from t1,
14 (
15 select /*+ no_merge no_eliminate_oby */
16 rowid as row_id
17 from
18 t1
19 order by
20 dbms_random.value
21 ) a
22 where a.row_id = t1.rowid
23 ) loop
24 null;
25 end loop;
26 end;
27 /

PL/SQL procedure successfully completed.

Elapsed: 00:01:17.73
SQL>
SQL> exec dbms_stats.gather_system_stats('STOP')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.74
SQL>
SQL> begin
2 dbms_stats.set_system_stats('CPUSPEED',1000000);
3 end;
4 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> declare
2 s_status varchar2(200);
3 dt_dstart date;
4 dt_dstop date;
5 n_pvalue number;
6 begin
7 dbms_stats.get_system_stats (
8 s_status,
9 dt_dstart,
10 dt_dstop,
11 'MREADTIM',
12 n_pvalue);
13 dbms_output.put_line('Status: ' || s_status);
14 dbms_output.put_line('Dstart: ' || to_char(dt_dstart, 'DD.MM.YYYY HH24:MI:SS'));
15 dbms_output.put_line('Dstop : ' || to_char(dt_dstop , 'DD.MM.YYYY HH24:MI:SS'));
16 dbms_output.put_line('Value : ' || to_char(n_pvalue, 'TM'));
17 dbms_stats.set_system_stats('MREADTIM', 10 * n_pvalue);
18 end;
19 /
Status: COMPLETED
Dstart: 03.05.2009 13:24:00
Dstop : 03.05.2009 13:24:00
Value : 1.293

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.11
SQL>
SQL> column sname format a20
SQL> column pname format a20
SQL> column pval2 format a20
SQL>
SQL> select
2 sname
3 , pname
4 , pval1
5 , pval2
6 from
7 sys.aux_stats$;

SNAME PNAME PVAL1 PVAL2
-------------------- -------------------- ---------- --------------------
SYSSTATS_INFO STATUS COMPLETED
SYSSTATS_INFO DSTART 05-03-2009 13:24
SYSSTATS_INFO DSTOP 05-03-2009 13:24
SYSSTATS_INFO FLAGS 1
SYSSTATS_MAIN CPUSPEEDNW 1000000
SYSSTATS_MAIN IOSEEKTIM 10
SYSSTATS_MAIN IOTFRSPEED 4096
SYSSTATS_MAIN SREADTIM 8.021
SYSSTATS_MAIN MREADTIM 12.93
SYSSTATS_MAIN CPUSPEED 1000000
SYSSTATS_MAIN MBRC 8
SYSSTATS_MAIN MAXTHR
SYSSTATS_MAIN SLAVETHR

13 rows selected.

Elapsed: 00:00:00.06
SQL>
SQL> explain plan for
2 select
3 max(val)
4 from
5 t1;

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2016 (0)| 00:00:17 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 2016 (0)| 00:00:17 |
---------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.17
SQL>
SQL> spool off


Using the known formula we can confirm the cost calculation above:

Blocks below HWM / MBRC: Number of multi-block read requests required to scan the segment

Number of multi-block read requests * MREADTIM = time it takes to perform these number of read requests in milliseconds.

Finally this is divided by SREADTIM to arrive at the known unit used for cost representation which is number of single read requests.

10,000 / 8 = 1,250 multi-block read requests

1,250 * 12.93 = 16,162.5 ms execution time (which is shown as 17 seconds in the plan by the way)

16,162.5 / 8.021 = 2,015.02 (which is shown as 2,016 in the plan)

Manually writing and maintaining system statistics

Using the DBMS_STATS.GET_SYSTEM_STATS / SET_SYSTEM_STATS API you can write your own set of system statistics for both NOWORKLOAD and WORKLOAD values.

You can use DBMS_STATS.DELETE_SYSTEM_STATS to remove the system statistics, which will activate the default NOWORKLOAD system statistics in 10g and disable CPU costing in 9i.

You can use DBMS_STATS.EXPORT_SYSTEM_STATS / IMPORT_SYSTEM_STATS to export and import system statistics to a user statistics table created with DBMS_STATS.CREATE_STAT_TABLE.

Note that DBMS_STATS.GATHER_SYSTEM_STATS when used with an user stats table (created with DBMS_STATS.CREATE_STAT_TABLE) behaves differently than e.g. DBMS_STATS.GATHER_TABLE_STATS: Whereas object related statistics always go to the data dictionary and you only have the option to save the current statistics to the user stats table before replacing them with the new values, GATHER_SYSTEM_STATS actually writes the system statistics into the user stats table and doesn't change the actual system statistics if you're supplying a user stats table name.

For further discussion how you could use system statistics see Jonathan Lewis' thoughts about this topic:

Part 1
Part 2
Part 3

The next part of the series will cover the usage of system statistics in 9i, highlight some quirks and oddities observed and finally show what happens if you attempt to use multiple block sizes for "tuning" purposes.

5 comments:

  1. Randolf,

    As I said in a comment on your first post I think your "adjusted MBRC" formula is wrong. It should be:
    Estimated MBRC = MBRC * SREADTIM / MREADTIM

    In your post you have:
    FTS cost = Blocks below HWM / MBRC * MREADTIM / SREADTIM

    I assume the double divide by SREADTIM means that it ends up being a multiply i.e. when you divide Blocks by (MBRC * MREADTIM / SREADTIM), the SREADTIM ends up on top, multiplying against Blocks.

    Your calculation seems wrong to me. You state:
    FTS cost = 10,000 / 8 * 16.241 / 14.478 = 1,403

    But breaking it into steps I get a completely different result:
    10000 / (8 * 16.241) = 10000 / 129.928 = 76.966
    76.966 * 14.478 = 1114.31

    This is not 1403 as you state.

    Using my formula instead for the Estimated MBRC, then:
    EMBRC = 8 * 14.478 / 16.241 = 7.132
    FTS Cost = 10000 / 7.132 = 1402.13

    Which is pretty close to the reported cost of 1403 by Oracle itself. In fact I did get 1403 when I used a calculator, and did not round to 3 decimal places.

    I hope this helps with your understanding of how the optimiser does some of its costs. I'll try and get around to posting my findings sometime.

    John
    Database Performance Blog

    ReplyDelete
  2. John,

    as already mentioned in the part 1 comments, we're talking here about different things. The "adjusted" MBRC refers to the traditional I/O based costing.

    I can't follow your example why the calculation you've mentioned should be wrong:

    Doing step by step and not inversing the meaning by introducing brackets:

    10,000 / 8 = 1,250

    1,250 * 16.241 = 20,301.25

    20,301.25 / 14.478 = 1,402.2

    which shown as 1,403 in the cost.

    Regarding your "estimated MBRC", can you quote where you have this from Christian's book?

    Best Regards,
    Randolf

    ReplyDelete
  3. Randolf,

    I think we do agree on the formula, and just disagree on how best to write it down. That is why I often use brackets to make such formula more explicitly clear.

    Your calculation of 1402 is the same as mine - so we do agree. Maybe you would consider modifying how your formula is written.

    Your 3 step calculation can be written as:
    1. Blocks / MBRC
    2. (Blocks / MBRC) * MREADTIM
    3. ((Blocks / MBRC) * MREADTIM) / SREADTIM

    I am using the brackets for clarity. Expanding this it comes out to be:

    (Blocks * MREADTIM) / (MBRC * SREADTIM)

    (10000 * 16.241) / (8 * 14.478) = 162410 / 115.824 = 1402.21

    And you can see how my "Estimated MBRC" formula is divided into Blocks, resulting in the expanded formula just shown. So we do agree on the formula and calculation, but maybe not on the best way to write it down.

    In Christian's book on page 177 is the following formula. I am modifying it to include brackets, because in the book it is formatted over two lines i.e. blocks appears over mbrc, and mreadtim appears over sreadtim.

    io_cost ~= (blocks / mbrc) * (mreadtim / sreadtim)

    It is just that when I see mathematical formulae it is sometimes difficult to know if the calculation is performed in a strictly left to right fashion, which is what I think you meant, or whether things group in a certain way i.e. the value of that before the '/' is divided by the value of everything after it.

    Hope that all makes sense.

    John

    ReplyDelete
  4. John,

    thanks for the clarification.

    I didn't use brackets since I was under the impression it is then clear that the evaluation is supposed to be strictly from left to right and I don't see why this shouldn't be the case given the operators of the formula.

    I'm not sure what you attempt to express using your "estimated MBRC" value.

    The meaning of the formula used to calculate the I/O cost of a full table scan with system statistics available is the following as outlined in part 1:

    number of multi-block reads * MREADTIM / SREADTIM

    Using the first part only:

    number of multi-block reads * MREADTIM

    We can already deduce the estimated time in ms it's going to take to complete the full table scan.

    The "number of multi-block reads" is simply "blocks below HWM" divided by the MBRC (And by the way this is where in traditional I/O costing the "adjusted MBRC" comes into the picture and the calculation simply ends at that point for traditional I/O costing).

    The time is then divided by SREADTIM to get back to the "traditional" unit used so far for costing in Oracle, which is units of single block read requests, see e.g. Jonathan Lewis quick note on this.

    So given this meaning I'm not sure what your "estimated MBRC" is supposed to represent.

    I brought this MREADTIM / SREADTIM factor up because according to the formula it describes the expected difference in costs of a full table scan between traditional I/O costing and CPU costing, but as outlined in part 1, you have to consider the "adjusted MBRC" used for traditional I/O costing to understand why the difference between traditional and CPU I/O cost is not consistent with simply MREADTIM / SREADTIM.

    Regards,
    Randolf

    ReplyDelete
  5. Randolf,

    I now agree with the fact that my "Estimated MBRC" is incorrect i.e. there is no such thing. I think I read 2 different documents and got them mixed up a bit. What I have labelled "Estimated MBRC" is really MBRC adjusted to be in units of single block reads. Which is exactly what you said in your last comment - that all I/Os are costed in units of single block reads. As I said I think I read 2 different documents, one of which had this MRT / SRT adjustment in it, and somehow combined the two together in my mind.

    I think I now have this straight, and have changed my own notes to remove this "Estimated MBRC" thing.

    Thanks for taking the time to respond to my comments and helping clarify what we are both talking about.

    John

    ReplyDelete