Friday, January 15, 2010

CLUSTERING_FACTOR What-If Analysis

In Oracle the clustering factor of an index is a single number that is supposed to represent the correlation between the order of the index and the order of the corresponding table.

If an execution plan contains an index range scan including an access to a table by ROWID, the clustering factor tells the cost-based optimizer how "clustered" or "scattered" the data in the table is with respect to the index - that is on average how likely contiguous rows from an index range scan will point to the same table block. Widely scattered data will require to read a different block for every row returned from the index range scan, in contrast in the case of clustered data many of the contiguous rows from the index will point to the same block, making a significant difference to the cost calculated by the cost-based optimizer (and of course making also a significant difference at actual execution time, in particular when many table blocks have to be read from disk).

As described e.g. in detail by Jonathan Lewis in his Cost-Based Fundamentals book, Chapter 5 (Clustering Factor), there are various scenarios that have significant influence on the clustering factor, and there are scenarios where Oracle actually doesn't get the clustering factor right - in particular when dealing with data that is inserted simultaneously in conjunction with some segment space management method that attempts to spread the inserted data across different blocks to avoid contention. This can happen for instance when using multiple freelists/freelist groups with manual segment space management (MSSM) or automatic segment space management (ASSM) for that matter.

Usually the clustering factor in case of an index range scan with table access involved represents the largest fraction of the cost associated with the operation, therefore indexes with high clustering factors (meaning that the table data is scattered in relation to the index order) tend to be ignored by the cost-based optimizer and different access paths might be favored instead, like full table scans or the usage of different available indexes.

Since this clustering factor is therefore often such a crucial information I have derived a simple query from what DBMS_STATS.GATHER_INDEX_STATS uses (and Jonathan mentions and explains in his book) and that allows to perform a what-if analysis regarding the clustering factor.

It can be used to:

- Validate/correct the clustering factor of an existing index determined by DBMS_STATS
- In particular check if concurrent inserts together with some segment space management lead to an non-representative clustering factor (this is something DBMS_STATS does not support at present and can only be corrected manually using DBMS_STATS.SET_INDEX_STATS)
- Perform various what-if scenarios without the need to actually create/re-create the index, e.g.
- The clustering factor of a new index to be added
- The effect of adding a column to an existing index on the clustering factor
- The effect of changing the column order of an existing index
- The effect of changing an index to an reverse index

Similar information could also be obtained by simply creating the corresponding index, but there are some points to consider here:

Creating an index up to release 10g might lead to unwanted changes in the execution plans if performed on a live system. In 11g the option to create an index as invisible can be used to avoid this, however you still incur the overhead of writing the index structure and allocating physical space, which could be significant depending on the amount of data.

Furthermore the query allows for certain kinds of analysis that is simply not possible by creating an index - more on this later.

The query has this general form:


select
sys_op_countchg(substrb(row_id,1,15), <history>)
[* (100 / %perc)] as clustering_factor
, count(*)
[* (100 / %perc)] as cnt
, count(distinct substrb(row_id,1,15))
[* (100 / %perc)] as blocks
from
(
select /*+ no_merge no_eliminate_oby */
/* optionally use parallelism e.g. parallel(t, 2) */
rowid as row_id
[, t.*]
[, dbms_rowid.rowid_relative_fno(rowid) as file_number]
[, dbms_rowid.rowid_block_number(rowid) as block_number]
[, dbms_rowid.rowid_row_number(rowid) as row_number]
from
<tabname> [sample [block] (%perc)] t
where
<expr_1> is not null
[or <expr_n> is not null]
order by
<expr_1> [asc|desc]
[, <expr_n> [asc|desc]]
, rowid
);


and a first version of a ready-to-use script could look like this:


-- Simulate / calculate clustering factor
set verify off

set termout off

column ora10 new_value if_v10 noprint
column oralower10 new_value if_lower_v10 noprint

-- Determine version for regular expression support
select
decode(substr(banner, instr(banner, 'Release ') + 8, 1), '1', '', '--') as ora10
from
v$version
where
rownum = 1;

select
decode('&if_v10', '--', '', '--') as oralower10
from
dual;

set termout on

accept table_name prompt 'Enter table name: '

define sample_pct = 100

accept sample_pct number default &sample_pct prompt 'Enter sample percent (default &sample_pct): '

define p_degree = DEFAULT

accept p_degree default &p_degree prompt 'Enter parallel degree (default &p_degree): '

define history = 1

accept history number default &history prompt 'Enter number of blocks to remember (default &history): '

-- ideally comma separated without spaces surrounding the comma
accept col_list prompt 'Enter comma separated index column list: '

set termout off

column define_sample_block new_value sample_block noprint

select
case
when &sample_pct < 100
then 'sample block (&sample_pct)'
else ''
end as define_sample_block
from
dual;

column col_list_where_expr new_value col_list_where noprint

-- Try to get clever with the WHERE clause derived from the index expression
select
replace(
&if_v10 regexp_replace('&col_list', '( asc| desc)?( nulls (last|first))?', '')
&if_lower_v10 replace(replace(replace(replace(replace('&col_list', ' asc', ''), ' desc', ''), ' nulls', ''), ' last', ''), ' first', '')
, ','
, ' is not null or '
) || ' is not null' as col_list_where_expr
from
dual;

column p_degree_hint_expr new_value p_degree_hint noprint

select
case
when '&p_degree' != 'DEFAULT'
then 'parallel (t &p_degree)'
else ''
end as p_degree_hint_expr
from
dual;

-- set echo on verify on
set termout on

select
sys_op_countchg(substrb(row_id,1,15), &history)
* (100 / &sample_pct) as clustering_factor
, count(*)
* (100 / &sample_pct) as cnt
, count(distinct substrb(row_id,1,15))
* (100 / &sample_pct) as blocks
from
(
select /*+ no_merge no_eliminate_oby &p_degree_hint */
rowid as row_id
--, t.*
from
&table_name &sample_block t
where
&col_list_where
order by
&col_list
, rowid
);


It is based on the undocumented aggregate function SYS_OP_COUNTCHG that is apparently used by DBMS_STATS to calculate the clustering factor.

The second parameter to this function (I called it "history") is very interesting, since it represents the number of blocks the function "remembers" to determine if the block has "changed" or not. DBMS_STATS uses 1 as value and therefore if we have data that is still clustered but unluckily scattered across a few blocks it will lead to a likely non-representative clustering factor since walking the index may jump forth and back between these few blocks but the SYS_OP_COUNTCHG function will increase the clustering factor with each different block, although it stays within the same few blocks and therefore these blocks very likely will be held in the cache.

For example in case of concurrent inserts and ASSM or freelist / freelist groups with MSSM choosing an appropriate number of blocks to retain could be the number of concurrent processes that insert the data - more on this in the demonstration part later.

The remaining placeholders are straightforward - the table name obviously, and if it is a large table you can use the sample clause to avoid reading the whole table, but then the values returned need to be adjusted accordingly - you could also try to run a potentially required full table scan in parallel (something that the original DBMS_STATS query doesn't) - note however that the aggregate function will/needs to be performed by the query coordinator (due to the dependency of the clustering factor evaluation on the data order) which might represent the bottleneck in case of parallel execution.

If you want to get a feeling on how the data will be sorted according to the index definition you can use the [t.*] and [DBMS_ROWID...] clauses and execute only the inner query without the aggregate function - in this case a potentially required sort operation is going to be more costly due to the increased data volume to sort.

The expressions in the WHERE and ORDER BY clause are supposed to represent the columns and/or expressions (in case of function-based indexes) used in the index definition.

The WHERE clause will ensure that only data will be considered that leads to non-null expressions in the index (a b*tree index only covers non-null data), and the ORDER BY clause will order the data the way the index will be ordered.

Note that non-unique indexes will get the ROWID added to make the index expression unique - for unique indexes this is not required, but doesn't harm, since the expression by itself is already "unique". You can omit the ROWID in this case, but it won't change the outcome.

This query also allows some interesting considerations. For example if you know that you'll mostly access only a particular "hot" part of the table which is well clustered, but the remaining "cold(er)" part of the table is rather scattered (for example in case of batch inserts of newly arrived data into a partially deleted and shrunk table via the new SHRINK option introduced in 10g), the overall average clustering factor determined might be bad but probably not representative for a typical query accessing only the "hot"/"latest" data. You could then add the corresponding selection criteria to the query to restrict the data analysed accordingly and use the obtained clustering factor to correct the index statistics using DBMS_STATS.SET_INDEX_STATS.

Here is an demonstration of some of the common scenarios regarding the clustering factor. It allows to reproduce issues with concurrent inserts, extra columns and changed column order. It is a modification of some code I've recently used to reproduce similar issues that one of my clients had.


prompt Tablespace
accept tblspace
rem define tblspace = test_8k

drop table t1 purge;

drop sequence seq_t1_run_id;

drop sequence seq_t1_seq_id;

create table t1 (
run_id integer not null, /* identify the process inserting the data */
batch_id integer not null, /* represents clustered data, could also be a (arriving) date */
a_value number null, /* represents sequence based data */
a_random number null, /* represents randomly scattered data */
a_date timestamp default systimestamp not null, /* represents the insert timestamp */
filler char(1) default 'x' not null /* can be used to size the row as required */
)
tablespace &tblspace;

/* a sample index */
create index t1_idx1 on
t1 (
batch_id,
a_value
)
tablespace &tblspace;

create sequence seq_t1_run_id;

create sequence seq_t1_seq_id;

create or replace procedure populate_t1(i_run_id in integer, i_iter in integer) as
begin
dbms_output.put_line(
dbms_lock.request(
1
, dbms_lock.s_mode
, release_on_commit => true
)
);
commit;
for i in 1..i_iter loop
for j in 1..100 loop
insert into t1 (
run_id
, batch_id
, a_value
, a_random
)
values (
i_run_id
, i
, seq_t1_seq_id.nextval
, trunc(dbms_random.value(1, 1000))
);
commit;
dbms_lock.sleep(0.01);
end loop;
end loop;
end;
/


If you want to test the effect of concurrent inserts with ASSM for instance, choose an appropriate tablespace (or modify the script to use freelists / freelist groups with MSSM) and run the following code.

In a main session run this:


-- run this as main session
-- afterwards start as many of the below code snippets
-- and press return in the main session
-- to let them all start at the same time

truncate table t1;

begin
dbms_output.put_line(
dbms_lock.request(
1
, dbms_lock.x_mode
, release_on_commit=>true
)
);
end;
/

prompt Press return to continue when sessions have been started

accept x

commit;

prompt Press return to continue when sessions have completed

accept x

exec dbms_stats.gather_table_stats(null, 't1', estimate_percent=>null, cascade=>true)

select
index_name
, clustering_factor
from
user_indexes
where
table_name = 'T1';


Then start this in as many sessions as you want to run concurrently:


set timing on echo on

variable n_run_id number

exec select seq_t1_run_id.nextval into :n_run_id from dual;

exec populate_t1(:n_run_id, 100)

commit;


After the sessions have started they're going to wait on the lock of the main session. Press ENTER to get the sessions started and press ENTER again after the session have completed to gather statistics and get initial information about the CLUSTERING_FACTOR of the sample index determined by DBMS_STATS.

The number of iterations (set to 100 in the code snippet above) determines how long this code will run - every insert is delayed by 1/100th of a second - the minimum delay supported by DBMS_LOCK.SLEEP - so the block above will insert 10,000 rows lasting approx. 100 seconds.

The code uses the simple synchronisation method also used by Jonathan in his sample scripts (based on DBMS_LOCK.REQUEST) - the main session allocates a user lock in exclusive mode, all other sessions attempt to request this in shared mode. Therefore all sessions will wait until the main session commits to release the lock. Note that this uses a hard coded lock handle - in a non-test system it is advisable to use DBMS_LOCK.ALLOCATE_UNIQUE to generate a unique lock handle.

Depending on what you've chosen as concurrency and segment space managment, the clustering factor of the index on (batch_id, a_value) might be close to the number of blocks or rows in the table as determined by the final DBMS_STATS call.

You can use now the query to perform some analysis regarding the clustering factor. You could run e.g. the following query:


select
sys_op_countchg(substrb(row_id,1,15), 1) as clustering_factor
, count(*) as cnt
, count(distinct substrb(row_id,1,15)) as blocks
from
(
select /*+ no_merge no_eliminate_oby */
rowid as row_id
, t1.*
from
t1
where
batch_id is not null
or a_value is not null
order by
batch_id
, a_value
, rowid
);


This should give you exactly the clustering factor that has been determined by the DBMS_STATS call used above.

In my case when using MSSM I got a clustering factor of 190 with the table having 186 blocks for 40,000 rows (four concurrent processes each inserting 10,000 rows).

When using ASSM for the same setup (four processes) I got a clustering factor of 28,483 (!) for the same index. Note that the results might vary significantly, depending on how the processes were assigned to the different freelists (MSSM) or block groups (ASSM).

If you've used ASSM or freelist / freelist groups with MSSM then replace the "history" parameter with your number of concurrent processes (or number of freelists, if you had more processes than freelists), e.g. in case of four concurrent processes:


select
sys_op_countchg(substrb(row_id,1,15), 4) as clustering_factor
...


and you should notice a significant drop in the clustering factor, caused by the fact that multiple concurrent inserts used different blocks and therefore the data is not in a single block, but clustered in a few blocks and Oracle has in this constructed case to "jump" forth and back between these few blocks to obtain the data (actually caused by the A_VALUE column which is an increasing value, but written concurrently by the different processes, so with ASSM/freelist (groups) each increasing value is potentially stored in a different block).

In my particular case the simulated clustering factor for ASSM dropped from 28,483 to 188.

Some variations of the query allow to reproduce some other scenarios, e.g. use the following columns to see the impact of adding a badly scattered column to an index:


batch_id


(in my case clustering factor 188 for MSSM with default freelists)

vs.


batch_id
, a_random


(in my case clustering factor 22,191 for MSSM with default freelists, selecting a history size of 4 showed a clustering factor of 190)

or this one to see the impact of changing the column order:


batch_id
, a_value
, a_random
, rowid


(in my case clustering factor 190 for MSSM with default freelists)

vs.


a_random
, batch_id
, a_value
, rowid


(in my case clustering factor 35,904 for MSSM with default freelists, and here increasing the history size to 4 doesn't make a significant difference)

You could also use variations of the following query to get a feeling how the data arrived in the table:


select
rowid
, run_id
, batch_id
, a_value
, to_char(a_date, 'DD.MM.YYYY HH24:MI:SS.FF') as insert_timestamp
, dbms_rowid.rowid_relative_fno(rowid) as file_number
, dbms_rowid.rowid_block_number(rowid) as block_number
, dbms_rowid.rowid_row_number(rowid) as row_number
from
t1
order by
a_date;


By using different ORDER BYs (or no ORDER BY) for the above query you can get some other interesting insights how the data is stored in the table - in particular the difference when using multiple freelists or ASSM with concurrent inserts.

Happy simulating!

5 comments:

  1. Great post Randolf!

    I need to spend a little time on this one. I have in the past modified clustering factor to get Oracle to pick an index I wanted (because I knew it was "better" than the one being used based on testing). I never had time to go back and do an analysis as to why the clustering factor was high to begin with. I may go back a revisit that situation using your ideas as presented here.

    Kerry

    ReplyDelete
  2. Hi Randolf,

    Great post!!!

    The sql is working well for 10g, but when tried on 9.2.0.8 it shows the below error -

    select sys_op_countchg(substrb(row_id,1,15), 1)
    *
    ERROR at line 1:
    ORA-03113: end-of-file on communication channel

    ORA-07445: exception encountered: core dump [0000000102A26BF4] [SIGSEGV] [Address not mapped to object] [0x300000000] [] []

    Kindly suggest.

    Regards,
    Anand

    ReplyDelete
  3. Anand,

    I suggest you check the ORA-7445 lookup tool on Oracle Support with the call stack taken from the core dump generated.

    Since SYS_OP_COUNTCHG is an internal function it is probably not guaranteed to work on every Oracle release. I haven't tested it on pre-10g versions either.

    Randolf

    ReplyDelete
  4. Hi Randolf,
    Good post,I've the same problem about a big table on my production enviroment.
    I'm trying to calculate the parameter for concurrent process on my table to insert in sys_op_countchg function.
    In your demo you created a table, but in real case what can i do?
    I would monitoring my table to view how many concurrent processes exist.
    Thanks,
    Ciao

    Alberto

    ReplyDelete
  5. Hi Alberto,

    basically I've already tried to address that question by the last query provided in the blog post that you can use to get a feeling how the data arrived into the table.

    Randolf

    ReplyDelete