Sunday, May 29, 2011

Things worth to mention and remember (IV) - Data Loading

In this part of the series I'll cover some basics about data loading:

1. If you want to load a large amount of data quickly into a table or partition that gets truncated before the load, and indexes also need to be maintained, then it is probably faster to set the indexes to unusable before the load and rebuild them afterwards instead of letting the insert maintain them. Note that even with a direct-path insert the maintenance of usable indexes will generate undo and redo, whereas a separate index rebuild doesn't generate undo and can also be run as a nologging operation if desired. However, as always, test for your particular situation/configuration, as the index maintenance performed by direct-path inserts is quite efficient and therefore might not be that much slower than separate index rebuild steps.

However, there is a simple, yet important point to consider when attempting to load with unusable indexes:

A truncate makes indexes automatically usable

So the order of DDL statement execution before the load is relevant:

- ALTER INDEX ... UNUSABLE
- TRUNCATE TABLE

This way the data load will maintain the indexes since the TRUNCATE after the ALTER INDEX has set the indexes to USABLE again!

The order that was probably intended is:

- TRUNCATE TABLE
- ALTER INDEX ... UNUSABLE

The same applies to a data load into a partition of a table, only the syntax of the two commands is slightly different.
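
To make the intended sequence concrete, here is a minimal sketch using hypothetical object names; in the partition case the index is assumed to be a local index whose partition names match the table partitions:


-- Table-level load: truncate first, then mark the index unusable
TRUNCATE TABLE t;
ALTER INDEX t_idx UNUSABLE;

INSERT /*+ append */ INTO t
SELECT * FROM t_stage;

COMMIT;

ALTER INDEX t_idx REBUILD NOLOGGING;

-- Partition-level load: same idea, slightly different syntax
ALTER TABLE t TRUNCATE PARTITION p1;
ALTER INDEX t_idx MODIFY PARTITION p1 UNUSABLE;

INSERT /*+ append */ INTO t
SELECT * FROM t_stage;

COMMIT;

ALTER INDEX t_idx REBUILD PARTITION p1 NOLOGGING;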

2. Note that you cannot load into a segment that has a UNIQUE index in UNUSABLE state: you'll get the error "ORA-26026: unique index ... initially in unusable state" (when using a direct-path insert) or "ORA-01502: index ... or partition of such index is in unusable state" (when using a conventional insert), even with SKIP_UNUSABLE_INDEXES set to TRUE (the default from 10g on).
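
A minimal sketch demonstrating both errors (hypothetical object names, SKIP_UNUSABLE_INDEXES left at its default of TRUE):


CREATE TABLE t_demo (id number);

CREATE UNIQUE INDEX t_demo_uq ON t_demo (id);

ALTER INDEX t_demo_uq UNUSABLE;

-- Direct-path insert fails with ORA-26026: unique index ... initially in unusable state
INSERT /*+ append */ INTO t_demo
SELECT rownum FROM dual CONNECT BY LEVEL <= 10;

-- Conventional insert fails with ORA-01502: index ... or partition of such index is in unusable state
INSERT INTO t_demo VALUES (1);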

If you want a UNIQUE index that is not maintained during the data load, you need to drop it and re-create it after the load.

There is, however, a way to circumvent this: you can support a UNIQUE or PRIMARY KEY constraint by means of a non-unique index. This way you can either make the constraint deferrable, or disable the constraint before the load while keeping the index in unusable state, and rebuild the index and re-enable the constraint afterwards.
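
A minimal sketch of the "disable constraint" variant, again using hypothetical object names (the non-unique index is created explicitly so that disabling the constraint does not drop it):


CREATE TABLE t_load (id number, payload varchar2(100));

CREATE INDEX t_load_pk_idx ON t_load (id);

ALTER TABLE t_load ADD CONSTRAINT t_load_pk PRIMARY KEY (id) USING INDEX t_load_pk_idx;

-- Before the load: disable the constraint, truncate, then mark the index unusable
ALTER TABLE t_load DISABLE CONSTRAINT t_load_pk;

TRUNCATE TABLE t_load;

ALTER INDEX t_load_pk_idx UNUSABLE;

-- Load, e.g. via direct-path insert
INSERT /*+ append */ INTO t_load
SELECT rownum, rpad('x', 100, 'x') FROM dual CONNECT BY LEVEL <= 100000;

COMMIT;

-- After the load: rebuild the index and re-enable (and validate) the constraint
ALTER INDEX t_load_pk_idx REBUILD;

ALTER TABLE t_load ENABLE CONSTRAINT t_load_pk;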

But you need to be aware of the following implications and side effects of doing so. Depending on how you use the index and what your data manipulation patterns look like, they might not make any difference in your particular situation, but they can also have a very significant effect:

A non-unique index behaves differently from a unique index in several ways. Richard Foote has already covered these differences in great detail, and since I probably couldn't say it any better I'm only going to give short recaps here along with the corresponding links to his posts.

- It requires slightly more space for the same amount of data (one additional length byte per index entry, see Richard Foote)
- Depending on the data manipulation patterns it might require significantly more space as part of DML operations, because the index entries cannot be re-used within a transaction even when the same data gets re-inserted, see Richard Foote and again here (the ROWID is made part of the key to make the index entry unique, and the ROWID cannot be re-used within a transaction because the row entries in the row directory of a block cannot be re-used within a transaction, see Jonathan Lewis)
- For non-unique indexes only: if you insert multiple rows with the same index key expression, then depending on the ROWID there might also be differences in the efficiency of a potential index block split (50-50 vs. 90-10 (actually 99-1))
- At execution time an operation based on an INDEX UNIQUE SCAN is handled differently from an INDEX RANGE SCAN in terms of latching and optimizations of the consistent gets, so there might be measurable run-time differences (see Richard Foote). Note that this changes under certain circumstances from version 11g on, which I will cover in a separate blog series because it is a very interesting topic on its own.

More on the topic by Richard Foote.

There are also a few oddities with this setup to be aware of:

- In 10.2 a unique/PK constraint supported by a non-unique index disables direct-path inserts, see here for more details

- In 11.1 and 11.2 a unique/PK constraint supported by a non-unique index allows direct-path inserts, but unfortunately also allows you to accidentally insert duplicates, see here for more details

Sunday, May 22, 2011

ASSM bug reprise - part 2

Introduction

In the first part of this post I've explained some of the details and underlying reasons of bug 6918210. The most important part of the bug is that it can only be hit if many row migrations happen during a single transaction. However, having excessive row migrations is usually a sign of poor design, so this point probably can't be stressed enough:

If you don't have excessive row migrations the bug can not become significant

Of course, there might be cases where you think you actually have a sound design but due to lack of information about the internal workings it might not be obvious that excessive row migrations could be caused by certain activities.

One popular feature that might cause such trouble is compression. The most important thing that you need to know about compression is this:

Compression and subsequent significant updates do not work very well together

The main reason for this is that after an update Oracle stores the affected data uncompressed (the block is still in "compressed" block format, but the updated row data itself is stored uncompressed). Note that this also holds true for the "advanced" OLTP compression option, as we will see shortly.

Given this fact it might become obvious that updates to compressed data can cause excessive row migrations, because:

- The row data stored in uncompressed format will usually require a lot more space than the original compressed format, hence the row will no longer fit into the place in the block where it originally resided. It needs to be moved somewhere else within the same block (and the block might have to be "re-organized" in order to allow for sufficient contiguous space, which adds CPU overhead to the operation), and if there isn't sufficient space available in the block it will have to be migrated to a different block with sufficient space

- By default the "basic" compression option implicitly sets the PCTFREE of the segment to 0; however, you can change this by explicitly defining a PCTFREE. "OLTP" compression leaves PCTFREE at the default of 10

These PCTFREE settings, in particular the default of 0 used by "basic" compression, do not leave much free space in the block for further row growth, so it becomes clear that without a non-default PCTFREE setting anything that updates more than just a couple of rows per block of compressed data will lead to row migrations.
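
For illustration, a minimal sketch of overriding the implicit PCTFREE 0 of basic compression (hypothetical table name, same clause order as in the test scripts below):


CREATE TABLE t_comp
(id      number,
 payload varchar2(100)
)
compress
pctfree 10;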

Of course you could argue that, at least with "basic" compression, which does not attempt to re-compress the blocks, any application updating more than a couple of compressed rows must be called bad design, since it uses the feature in a way it was not intended for.

Things look different, at least from a general point of view, with "OLTP" compression, since it promises to also re-compress the data during conventional DML and therefore should allow subsequent updates without suffering from too many or even excessive row migrations. Unfortunately, I couldn't confirm this in the tests that I've performed.

Basic Compression

So let's start with a simple variation of the original test case, introducing "basic" compression, and have a look at the results.

By the way, all the tests have been performed using an 8K ASSM tablespace.


set echo on timing on

drop table t1;

purge table t1;

CREATE TABLE t1
(pkey varchar2(1),
v1 varchar2(255),
v2 varchar2(255)
)
compress
pctfree 0
TABLESPACE &&tblspace;

INSERT /*+ append */ INTO t1
SELECT '1' as pkey,
to_char((mod(rownum, 1) + 1), 'TM') || 'BL' AS v1,
'BLUBB' /*null*/ AS v2
FROM dual CONNECT BY LEVEL <= 50000
/

commit;

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

create index t1_idx on t1 (substr(v1, 1, 1)) TABLESPACE &tblspace;

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

truncate table chained_rows;

analyze table t1 list chained rows;

select count(*) from chained_rows;

column file_no new_value file_no
column block_no new_value block_no

select
dbms_rowid.rowid_relative_fno(rowid) as file_no
, dbms_rowid.rowid_block_number(rowid) as block_no
from
t1
where
rownum <= 1;

alter session set tracefile_identifier = 'before_update';

alter system checkpoint;

alter system dump datafile &file_no block &block_no;

pause Press Enter to continue...

exec mystats_pkg.ms_start

-- Counters not updated in 11g
-- execute snap_kcbsw.start_snap

/*
alter session set tracefile_identifier = 'space_layer';

alter session set events '10320 trace name context forever, level 3';
alter session set events '10612 trace name context forever, level 1';
*/

UPDATE /*+ full(t1) */ t1 SET v2 = v1
where substr(v1, 1, 1) = '1';

/*
alter session set events '10320 trace name context off';
alter session set events '10612 trace name context off';
*/

commit;

set serveroutput on size 1000000 format wrapped
set linesize 120
set trimspool on

-- Counters not updated in 11g
-- execute snap_kcbsw.end_snap

exec mystats_pkg.ms_stop(1)

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

truncate table chained_rows;

analyze table t1 list chained rows;

select count(*) from chained_rows;

alter session set tracefile_identifier = 'after_update';

alter system checkpoint;

alter system dump datafile &file_no block &block_no;

accept rdba prompt 'Enter forwarding ROWID found in block dump: '

column rdba new_value rdba

-- Remove any potential leading and trailing unnecessary stuff
select
substr('&rdba',
case
when instr('&rdba', '0x') = 0
then 1
else instr('&rdba', '0x') + 2
end,
case
when instr('&rdba', '.') = 0
then 32767
else instr('&rdba', '.') -
case
when instr('&rdba', '0x') = 0
then 0
else instr('&rdba', '0x') + 2
end
end
) as rdba
from
dual
;

select
dbms_utility.data_block_address_file(to_number('&rdba', rpad('X', length('&rdba'), 'X'))) as file_no
, dbms_utility.data_block_address_block(to_number('&rdba', rpad('X', length('&rdba'), 'X'))) as block_no
from
dual
;

alter session set tracefile_identifier = 'migrated_rows';

alter system dump datafile &file_no block &block_no;


If you look closely at the example you'll notice that from an "innocent" application point of view the update actually shortens the row - but of course only if you do not take into account the effects of compression described above.

This is what a sample block dump looks like right after the insert:


.
.
.
data_block_dump,data header at 0x551007c
===============
tsiz: 0x1f80
hsiz: 0x5d6
pbl: 0x0551007c
76543210
flag=-0------
ntab=2
nrow=728
frre=-1
fsbo=0x5d6
fseo=0x113e
avsp=0xc
tosp=0xc
r0_9ir2=0x0
mec_kdbh9ir2=0x1
76543210
shcf_kdbh9ir2=----------
76543210
flag_9ir2=--R----C
fcls_9ir2[4]={ 0 32768 32768 32768 }
0x1e:pti[0] nrow=1 offs=0
0x22:pti[1] nrow=727 offs=1
0x26:pri[0] offs=0x1f71
0x28:pri[1] offs=0x1f6c
0x2a:pri[2] offs=0x1f67
0x2c:pri[3] offs=0x1f62
.
.
.
block_row_dump:
tab 0, row 0, @0x1f71
tl: 15 fb: --H-FL-- lb: 0x0 cc: 3
col 0: [ 1] 31
col 1: [ 3] 31 42 4c
col 2: [ 5] 42 4c 55 42 42
bindmp: 02 d7 03 c9 31 cb 31 42 4c cd 42 4c 55 42 42
tab 1, row 0, @0x1f6c
tl: 5 fb: --H-FL-- lb: 0x0 cc: 3
col 0: [ 1] 31
col 1: [ 3] 31 42 4c
col 2: [ 5] 42 4c 55 42 42
bindmp: 2c 00 01 03 00
tab 1, row 1, @0x1f67
tl: 5 fb: --H-FL-- lb: 0x0 cc: 3
col 0: [ 1] 31
col 1: [ 3] 31 42 4c
col 2: [ 5] 42 4c 55 42 42
bindmp: 2c 00 01 03 00
.
.
.


You'll notice that the rows have been compressed very efficiently (see the "symbol table" as "tab 0") and are just reported to allocate 5 bytes each.

It is quite interesting to note that the rows are actually stored with no gap in between. Without compression Oracle usually allocates at least 9 bytes per row (the minimum row size required for migrated rows), even if the actual row is less than 9 bytes in length. I'm not sure why Oracle does this here, since it means even more work in case of row migrations. The effect is a bit puzzling because even with compression enabled Oracle doesn't store more rows in the block than dictated by the minimum row size of 9 bytes. You can tell that from the block header - although there is still physically free space in the block (free space begin offset: fsbo=0x5d6, free space end offset: fseo=0x113e), it is reported with almost no space available (avsp=0xc).

In contrast this is what a block looks like where uncompressed very small rows are stored:


.
.
.
data_block_dump,data header at 0x610a07c
===============
tsiz: 0x1f80
hsiz: 0x5c6
pbl: 0x0610a07c
76543210
flag=--------
ntab=1
nrow=730
frre=-1
fsbo=0x5c6
fseo=0x5d6
avsp=0x10
tosp=0x10
0xe:pti[0] nrow=730 offs=0
0x12:pri[0] offs=0x1f7b
0x14:pri[1] offs=0x1f72
0x16:pri[2] offs=0x1f69
0x18:pri[3] offs=0x1f60
0x1a:pri[4] offs=0x1f57
.
.
.
block_row_dump:
tab 0, row 0, @0x1f7b
tl: 5 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 1] 31
tab 0, row 1, @0x1f72
tl: 5 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 1] 31
tab 0, row 2, @0x1f69
tl: 5 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 1] 31
tab 0, row 3, @0x1f60
tl: 5 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 1] 31
tab 0, row 4, @0x1f57
.
.
.


Although the rows effectively require less than 9 bytes, the gap between the rows is exactly 9 bytes, and the available space corresponds to the free space offsets (fsbo=0x5c6, fseo=0x5d6 and avsp=0x10).

Back to our "basic" compression example: of course we know that the update decompresses the affected rows, which means that they will allocate more than these 5 bytes and hence will eventually cause row migrations. Since the decompressed row size is still very small and below the limits outlined in part 1, we again run into the basic bug - the blocks holding the migrated rows can take more rows than there are ITL entries available, leading to the "free space search" anomaly.

OLTP Compression

Let's change the script and use OLTP compression instead. Additionally we'll give Oracle a little room to manoeuvre so that it should be able to store an uncompressed row and re-compress it when required. I do this by creating and populating the table with PCTFREE 10 but switching to PCTFREE 0 right before the update. Since we do not expect that many row migrations with OLTP compression the bug shouldn't be a problem.


.
.
.
CREATE TABLE t1
(pkey varchar2(1),
v1 varchar2(255),
v2 varchar2(255)
)
compress for all operations
pctfree 10
TABLESPACE &&tblspace;
.
.
.
alter table t1 pctfree 0;

UPDATE /*+ full(t1) */ t1 SET v2 = v1
where substr(v1, 1, 1) = '1';
.
.
.


So the expectation according to the general description of how OLTP compression works would be that whenever Oracle hits the PCTFREE limit it attempts to re-compress the block. Actually a more accurate description is that the compression is supposed to take place whenever Oracle re-organizes the block to coalesce free space in the block to maximize the contiguous chunks of available space. Oracle 10g added a statistic for this called "heap block compress", see e.g. Jonathan Lewis' blog post about the heap block compression.

But if you run this script in the 11.1.0.7 base release, the results will not be as expected: the table has the same size afterwards as with "basic" compression, and we apparently still hit the bug - the update still takes millions of session logical reads. In fact, looking at the block dumps and statistics, it looks as if no HSC compression has taken place at all, although we see a lot of occurrences of "heap block compress". You can confirm this by looking for statistics beginning with "HSC..." after the update - apart from the "HSC Heap/Compressed Segment Block Changes" no other HSC related statistics show up.
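
A minimal sketch of such a check for the current session (statistic names as they appear in 11g):


select sn.name, ms.value
from v$statname sn, v$mystat ms
where sn.statistic# = ms.statistic#
and (sn.name like 'HSC%' or sn.name = 'heap block compress')
order by sn.name;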

If you follow one of the migrated rows, you'll see that even the rows that have been migrated to a new block are not compressed (and not even stored in compressed block format) - so we again hit the same basic "no ITLs left but still free space" problem in the blocks holding the migrated rows.

If you run the same script in 11.2.0.1 or 11.2.0.2 the results look slightly different: the original blocks are still not re-compressed, so we seem to end up with the same number of migrated rows, but at least the blocks holding the migrated rows get compressed and therefore the resulting segment will be smaller in size. It is also interesting to note that the number of ITL entries in compressed blocks seems to be limited to 20 - probably controlled by the parameter "_limit_itl", as Jonathan Lewis has recently found out.

OLTP Compression Block Re-Compression

I have to admit that in any of the versions tested I really had a hard time convincing Oracle to re-compress a block that wasn't manipulated by inserts (or by the inserts caused by row migrations) - in fact inserts are the only circumstance where I could see the HSC compression effectively work. Most importantly, it doesn't seem to kick in in my cases when updating rows that grow in size - leading exactly to our problem: excessive row migrations when updating many rows.

I've tried different variations of DML patterns (small / large transactions, mixtures of inserts, deletes, updates etc., among them the most straightforward one - updating a column to the same value, which should make re-compression a "no-brainer"), and to me it looks as if OLTP re-compression attempts are only triggered by inserts into a block; updates never seem to trigger a re-compression.

Oracle performs a row migration rather than attempting to re-compress the block. This might be a deliberate design decision, since OLTP compression, apart from the obvious CPU overhead, also seems to write a full pre-compression block image to undo/redo, because the changes done to the block by the compression apparently cannot be represented by discrete change vectors. Minimizing the compression attempts by simply migrating rows when necessary could therefore be a deliberate choice - the undo/redo overhead of OLTP compression can be significant.

Also, since it is called "OLTP" compression, one could assume that an OLTP workload consists of a general mixture of inserts, updates and deletes, and therefore not compressing on updates shouldn't matter too much. This also seems to be confirmed by this section of the "Concepts" guide, which can be interpreted to say that only inserts trigger a re-compression of a block.

ASSM Bug variation #1

Of course, things could be worse, and that's exactly what happens with OLTP compression in the 11.1.0.7 base release: increasing the row length or PCTFREE should prevent the basic bug from happening even with excessive row migrations. So let's repeat the test case, this time using the trick we used in part 1 to prevent the bug: setting PCTFREE to 50 right before the update - this will mark the blocks holding the migrated rows as full long before the ITL slots can become depleted.


.
.
.
alter table t1 pctfree 50;

UPDATE /*+ full(t1) */ t1 SET v2 = v1
where substr(v1, 1, 1) = '1';
.
.
.


But look at the results:


.
.
.
50000 rows updated.

Elapsed: 00:00:24.16.
.
.
STAT..session logical reads 9,641,591
.
.
.


Even worse than before - even longer runtime, even more "session logical reads" and more CPU consumption.

Looking at the blocks holding the migrated rows we can clearly tell that they don't meet the prerequisites for the basic "ITL is full but free space left" problem - so something else must be going on leading to similar symptoms.


.
.
.
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0002.01b.00000767 0x00c015bc.0298.21 --U- 89 fsc 0x0000.002641bf
0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
.
.
.
0x5a 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000
0x5b 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000
bdba: 0x024000ea
data_block_dump,data header at 0x2ea08bc
===============
tsiz: 0x1740
hsiz: 0xc4
pbl: 0x02ea08bc
76543210
flag=--------
ntab=1
nrow=89
frre=-1
fsbo=0xc4
fseo=0x10a5
avsp=0xfe1
tosp=0xfe1
0xe:pti[0] nrow=89 offs=0
0x12:pri[0] offs=0x172d
0x14:pri[1] offs=0x171a
0x16:pri[2] offs=0x1707
.
.
.


When enabling the debug events 10320 and 10612 you'll notice a similar pattern of "...didnot like..." messages as with the basic bug, but in addition there are other entries that don't show up in the basic case: "Compressable Block...". To me it looks as if Oracle considers these blocks as candidates for further inserts if they were compressed, but somehow this compression never happens, so the blocks are left in that state and are considered and rejected for inserts over and over again, leading to the same symptoms as the basic bug, but obviously for different reasons.


.
.
.
KDTGSP: seg:0x2400083 mark_full:0 pdba:0x02400085
kdtgsp: calling ktspgsp_cbk w/ kdt_bseg_srch_cbk() on 73997
Enter cbk---------------
[ktspisc]: Rejected: BlockDBA:0x02400085
ktspfsrch: Returns: BlockDBA:0x024000e8
kdt_bseg_srch_cbk: examine dba=10.0x024000e8
kdt_bseg_srch_cbk:Compressable Block dba=10.0x024000e8 avs=832 afs=0 tosp=832 full=0
kdt_bseg_srch_cbk: failed dba=10.0x024000e8 avs=832 afs=0 tosp=832 full=0
ktspfsrch:Cbk didnot like 0x024000e8
ktspfsrch: Returns: BlockDBA:0x024000eb
kdt_bseg_srch_cbk: examine dba=10.0x024000eb
kdt_bseg_srch_cbk:Compressable Block dba=10.0x024000eb avs=832 afs=0 tosp=832 full=0
kdt_bseg_srch_cbk: failed dba=10.0x024000eb avs=832 afs=0 tosp=832 full=0
ktspfsrch:Cbk didnot like 0x024000eb
ktspfsrch: Returns: BlockDBA:0x024000ee
kdt_bseg_srch_cbk: examine dba=10.0x024000ee
kdt_bseg_srch_cbk:Compressable Block dba=10.0x024000ee avs=832 afs=0 tosp=832 full=0
kdt_bseg_srch_cbk: failed dba=10.0x024000ee avs=832 afs=0 tosp=832 full=0
ktspfsrch:Cbk didnot like 0x024000ee
ktspfsrch: Returns: BlockDBA:0x024000f1
kdt_bseg_srch_cbk: examine dba=10.0x024000f1
kdt_bseg_srch_cbk: found dba=10.0x024000f1 avs=3511 afs=0 tosp=3511 full=1
Exit cbk ------
ndba:0x024000f1
.
.
.


This assumption is actually confirmed in various bug descriptions on My Oracle Support. In particular bugs 9341448, 6009614, 9667930 and 9708484 seem to apply. Note that there seems to be another related bug 8287680 for inserts. Also document 1101900.1 applies. According to these bug descriptions this variation of the bug is actually fixed in some patch sets (as opposed to the "basic" bug in part 1 that is only fixed in the 11.2 release) - and a one-off patch (9667930) also seems to be available.

It is also interesting to note that this update anomaly is also documented in the "Master Note for OLTP Compression [ID 1223705.1]" on My Oracle Support - search for "Test #5" and review the runtimes mentioned in the table for that test, Scenario #3 / #5.

Note that this problem does not show up if the test case gets repeated with basic compression instead - it will not attempt to re-compress the blocks and therefore can only be affected by the basic variation of the bug.

ASSM Bug variation #2

But it is not over yet - finally we come to the oddest variation of the bug in the 11.1.0.7 base release: combining basic compression with partitioning can lead to the same symptoms as the "re-compression" bug with OLTP compression just outlined. Oracle apparently sometimes (it is not always reproducible) starts to consider blocks for re-compression, with the same dire results, although basic compression is used.

Again a slight variation of the script is used: I'll use basic compression with PCTFREE 10 but introduce a very simple partitioning scheme. Also the V2 column gets updated to the same value - all this is done to prevent running into the basic bug, because the migrated rows will be too large for that.


set echo on timing on

drop table t1;

purge table t1;

CREATE TABLE t1
(pkey varchar2(1),
v1 varchar2(255),
v2 varchar2(255)
)
partition by range(pkey)
(
partition p1 values less than (2),
partition p2 values less than (3)
)
compress
pctfree 10
TABLESPACE &&tblspace;

INSERT /*+ append */ INTO t1
SELECT '1' as pkey,
to_char((mod(rownum, 1) + 1), 'TM') || 'BL' AS v1,
'BLUBB' /*null*/ AS v2
FROM dual CONNECT BY LEVEL <= 50000
/

commit;

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

create index t1_idx on t1 (substr(v1, 1, 1)) TABLESPACE &tblspace;

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

truncate table chained_rows;

analyze table t1 list chained rows;

select count(*) from chained_rows;

column file_no new_value file_no
column block_no new_value block_no

select
dbms_rowid.rowid_relative_fno(rowid) as file_no
, dbms_rowid.rowid_block_number(rowid) as block_no
from
t1
where
rownum <= 1;

alter session set tracefile_identifier = 'before_update';

alter system checkpoint;

alter system dump datafile &file_no block &block_no;

pause Press Enter to continue...

exec mystats_pkg.ms_start

-- Counters not updated in 11g
-- execute snap_kcbsw.start_snap

/* This will generate a huge tracefile
alter session set tracefile_identifier = 'space_layer';

alter session set events '10320 trace name context forever, level 3';
alter session set events '10612 trace name context forever, level 1';
*/

-- Uncomment this to prevent the bug
-- alter system flush shared_pool;

UPDATE /*+ full(t1) */ t1 SET v2 = v2
where substr(v1, 1, 1) = '1';

/*
alter session set events '10320 trace name context off';
alter session set events '10612 trace name context off';
*/

commit;

set serveroutput on size 1000000 format wrapped
set linesize 120
set trimspool on

-- Counters not updated in 11g
-- execute snap_kcbsw.end_snap

exec mystats_pkg.ms_stop(1)

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

truncate table chained_rows;

analyze table t1 list chained rows;

select count(*) from chained_rows;

alter session set tracefile_identifier = 'after_update';

alter system checkpoint;

alter system dump datafile &file_no block &block_no;

accept rdba prompt 'Enter forwarding ROWID found in block dump: '

column rdba new_value rdba

-- Remove any potential leading and trailing unnecessary stuff
select
substr('&rdba',
case
when instr('&rdba', '0x') = 0
then 1
else instr('&rdba', '0x') + 2
end,
case
when instr('&rdba', '.') = 0
then 32767
else instr('&rdba', '.') -
case
when instr('&rdba', '0x') = 0
then 0
else instr('&rdba', '0x') + 2
end
end
) as rdba
from
dual
;

select
dbms_utility.data_block_address_file(to_number('&rdba', rpad('X', length('&rdba'), 'X'))) as file_no
, dbms_utility.data_block_address_block(to_number('&rdba', rpad('X', length('&rdba'), 'X'))) as block_no
from
dual
;

alter session set tracefile_identifier = 'migrated_rows';

alter system dump datafile &file_no block &block_no;


Now if you run this script under 11.1.0.7 (base release) several times, you might be able to spot similar symptoms: the update suddenly takes very long and produces the same diagnosis output - in particular the 10320 and 10612 debug events show the same "Compressable Block" lines - although we don't use advanced compression!

Interestingly, the problem in this case can be prevented by invalidating information that is cached in the Shared Pool: ALTERing the table right before the update, or brute-force flushing the Shared Pool, prevents the problem from showing up. With this information cleared, the confusion about re-compressing the blocks doesn't seem to arise. I have to admit, however, that I don't have a sound explanation why this invalidation of Shared Pool contents prevents this variation of the bug.

Summary

- Mixing compression with a significant number of updates is a bad idea in general

- OLTP block re-compression seems to be triggered only by inserts - updates will lead to row migrations rather than block re-compression

- Due to that behaviour you can end up with more row migrations than expected even with OLTP compression

- In 11.1.0.7 (base release) something is going wrong with the re-compression attempts of OLTP compression leading to dire results when row migrations happen and an ASSM tablespace gets used. This is a documented bug, a good starting point is document 1101900.1.

- Using basic compression with partitioning can sometimes lead to the same symptoms as described in those bugs for OLTP compression

You should be very careful when using OLTP compression in release 11.1 and monitor the row migration rate - the database might have to work much harder than expected. You should certainly consider applying the one-off patch 9667930 or a patch set that contains the mentioned bug fixes if you plan to use OLTP compression in 11.1 (and ASSM).
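
A minimal sketch of how the row migration rate could be monitored - it assumes the CHAINED_ROWS table created by utlchain.sql, as already used in the test scripts, and note that the statistic below covers both chained and migrated rows:


-- Per-table check: list migrated/chained rows into CHAINED_ROWS
truncate table chained_rows;

analyze table t1 list chained rows;

select count(*) from chained_rows;

-- Instance-wide indicator: additional fetches caused by following migrated/chained rows
select name, value
from v$sysstat
where name = 'table fetch continued row';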

Note that I could not reproduce any of these oddities in 11.2 - apart from the fact that updates never seem to trigger a re-compression. However, some of the bug descriptions found seem to suggest that 11.2 could be affected by some of the mentioned bugs as well.

Monday, May 16, 2011

ASSM bug reprise - part 1

This was meant to be published shortly after my latest quiz night post as an explanatory follow up, but unfortunately I only managed to complete this note by now.

There is a more or less famous bug in ASSM (see bug 6918210 in MOS as well as Greg Rahn's and Jonathan Lewis' posts) in versions below 11.2 that so far has been classified as only showing up with a combination of larger block sizes (greater than the current default of 8K) and excessive row migrations. With such a combination it was reproducible that an UPDATE of the same data pattern residing in an ASSM tablespace caused significantly more work than the same UPDATE in an MSSM tablespace, because apparently ASSM had problems finding suitable blocks to store the migrated rows.

In this post I'll show some diagnosis output for the bug and possible explanations of what seems to go wrong under the covers.

Although the bug has been identified a long time ago it is still relevant, because:

- The bug fix is only included in the 11.2 versions. For all other versions supporting ASSM there is no one-off patch available, and it is also not fixed in any of the available patch sets; for example, the quite recent 10.2.0.5 patch set still reproduces the issue. This means that obviously a significant code change was introduced in 11.2 that could not be made available as a patch in the previous releases. It also means that as long as you're not yet on 11.2 it is possible to run into the bug

- The bug can show up with any block size as I'll demonstrate shortly, so it is not limited to larger block sizes as outlined in the original descriptions

- In 11.1, in combination with other features like basic or advanced (OLTP, "for all operations") compression, a variation of the bug can show up even when the criteria for the original basic bug are not met. In particular the OLTP compression introduced in Oracle 11.1 (and only in 11.1, since in 11.2 the ASSM bug is fixed) can exhibit the bug even in not so obvious cases where the pattern of modification would not necessarily suggest a row migration. This will be covered in a follow-up post

Here is a short demo that reproduces the bug with any block size just by using a very small row size and a PCTFREE of 0. Of course you'll need to use an ASSM tablespace to reproduce the issue. If you use an MSSM tablespace, you'll notice a significant difference in the amount of work performed.

Note that the script uses Adrian Billington's "MY_STATS" package, which can be downloaded for free here. It also uses Jonathan Lewis' logical I/O analysis snapper script, which is available from here. Note that the latter script does not give meaningful results from 11.1 on. From 11.1 on you need to use Tanel Poder's "buffer gets profiler", which works a bit differently but shows similar information about buffer gets. It is called "bufprof.sql" and is part of Tanel's awesome "tpt_public" scripts collection, which can be found here.


set echo on timing on

drop table t1;

purge table t1;

CREATE TABLE t1
(pkey varchar2(1),
v1 varchar2(255),
v2 varchar2(255)
)
nocompress
pctfree 0
TABLESPACE &&tblspace;

-- Modify below MOD(ROWNUM, 1) to control the
-- the number of rows updated
-- MOD(ROWNUM, 1) will update every row
-- MOD(ROWNUM, 2) will update every second row etc.
INSERT /*+ append */ INTO t1
SELECT '1' as pkey,
to_char((mod(rownum, 1) + 1), 'TM') || 'BL' AS v1,
null AS v2
FROM dual CONNECT BY LEVEL <= 50000
/

commit;

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

create index t1_idx on t1 (substr(v1, 1, 1)) TABLESPACE &tblspace;

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

truncate table chained_rows;

analyze table t1 list chained rows;

select count(*) from chained_rows;

-- This is only supported from 11.2 on
/*
select
count(*)
, sys.dbms_compression.get_compression_type(user, 'T1', rowid) as comp_type
from
t1
group by
sys.dbms_compression.get_compression_type(user, 'T1', rowid);
*/

column file_no new_value file_no
column block_no new_value block_no

select
dbms_rowid.rowid_relative_fno(rowid) as file_no
, dbms_rowid.rowid_block_number(rowid) as block_no
from
t1
where
rownum <= 1;

alter session set tracefile_identifier = 'before_update';

-- In 11.2 the block dumps are only showing disk content
-- alter system checkpoint;

alter system dump datafile &file_no block &block_no;

pause Press Enter to continue...

exec mystats_pkg.ms_start

execute snap_kcbsw.start_snap

/* Enabling this will generate a huge tracefile
alter session set tracefile_identifier = 'space_layer';

alter session set events '10320 trace name context forever, level 3';
alter session set events '10612 trace name context forever, level 1';
*/

-- Uncomment this to prevent the bug
-- alter table t1 pctfree 50;

UPDATE /*+ full(t1) */ t1 SET v2 = v1
where substr(v1, 1, 1) = '1';

commit;

set serveroutput on size 1000000 format wrapped
set linesize 120
set trimspool on

execute snap_kcbsw.end_snap

exec mystats_pkg.ms_stop(1)

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

truncate table chained_rows;

analyze table t1 list chained rows;

select count(*) from chained_rows;

-- This is only supported from 11.2 on
/*
select
count(*)
, sys.dbms_compression.get_compression_type(user, 'T1', rowid) as comp_type
from
t1
group by
sys.dbms_compression.get_compression_type(user, 'T1', rowid);
*/

alter session set tracefile_identifier = 'after_update';

-- In 11.2 the block dumps are only showing disk content
-- alter system checkpoint;

alter system dump datafile &file_no block &block_no;

accept rdba prompt 'Enter forwarding ROWID found in block dump: '

column rdba new_value rdba

-- Remove any potential leading and trailing unnecessary stuff
select
substr('&rdba',
case
when instr('&rdba', '0x') = 0
then 1
else instr('&rdba', '0x') + 2
end,
case
when instr('&rdba', '.') = 0
then 32767
else instr('&rdba', '.') -
case
when instr('&rdba', '0x') = 0
then 0
else instr('&rdba', '0x') + 2
end
end
) as rdba
from
dual
;

select
dbms_utility.data_block_address_file(to_number('&rdba', rpad('X', length('&rdba'), 'X'))) as file_no
, dbms_utility.data_block_address_block(to_number('&rdba', rpad('X', length('&rdba'), 'X'))) as block_no
from
dual
;

alter session set tracefile_identifier = 'migrated_rows';

-- In 11.2 the block dumps are only showing disk content
-- alter system checkpoint;

alter system dump datafile &file_no block &block_no;


If the bug is hit, then the UPDATE will take several seconds and consume mostly CPU. The statistics will show millions of "session logical reads" for the update of the 50,000 rows.

Up to version 10.2 the output from Jonathan's script will show something similar to this:


SQL> execute snap_kcbsw.end_snap
---------------------------------
Buffer Cache - 15-May 13:11:55
Interval:- 16 seconds
---------------------------------
Why0 Why1 Why2 Other Wait
---- ---- ---- ----------
.
.
.
1,037,665 0 0 0 ktspfwh10: ktspscan_bmb
.
.
.
4,813,907 0 0 0 ktspbwh1: ktspfsrch
.
.
.
---- ---- ---- ----------
6,200,305 0 0 0 Total: 61 rows


Running a stack profile while the update is executing using e.g. Tanel's OSTACKPROF script will show an output similar to this:


Below is the stack prefix common to all samples:
------------------------------------------------------------------------
Frame->function()
------------------------------------------------------------------------
# 43 ->000000007702652D
# 42 ->OracleThreadStart()
# 41 ->opimai()
# 40 ->opimai_real()
# 39 ->sou2o()
# 38 ->opidrv()
# 37 ->opiodr()
# 36 ->opiino()
# 35 ->opitsk()
# 34 ->ttcpip()
# 33 ->opiodr()
# 32 ->kpoal8()
# 31 ->opiexe()
# 30 ->updexe()
# 29 ->updThreePhaseExe()
# 28 ->updaul()
# 27 ->qerupFetch()
# 26 ->updrow()
# 25 ->kauupd()
# 24 ->kdusru()
# 23 ->kduurp()
# 22 ->kdumrp()
# 21 ->kdtInsRow()
# 20 ->kdtgrs()
# 19 ->kdtgsp()
# 18 ->ktspgsp_main()
# 17 ->ktspscan_bmb()
# 16 ->ktspfsrch()
# ...(see call profile below)
#
.
.
.


Tanel's Buffer Gets Profiler (old V1.02 version, the current V1.03 version crashed my 11.1.0.7 SYSDBA session) shows an output similar to this:


SID KCBBFWHY KCBBFSO_FLG TO_CHAR(K KCBBFCM KCBBFSO_OWN DEC KCBWHDES TOTAL_SAMPLES ACTIVE_PCT
---------- ---------- ----------- --------- ---------- ---------------- --- ------------------------------ ------------- ----------
141 0 1 100001 0 000007FF1A3FF8D0 CR kdswh11: kdst_fetch 1000 3160
141 0 1 100001 0 000007FF1E1C7F18 CR kdswh11: kdst_fetch 1000 3160
141 0 1 0 0 000007FF1E1C7F18 CUR ktspbwh1: ktspfsrch 942 2976.72
141 0 1 4000000 2 000007FF1E1C7F18 CUR ktspbwh1: ktspfsrch 41 129.56
141 0 1 0 0 000007FF1E1C7F18 CUR ktspfwh10: ktspscan_bmb 11 34.76
141 0 1 0 0 000007FF1E1C9EE8 CR kdswh11: kdst_fetch 2 6.32
141 0 1 1000 -2.147E+09 000007FF1E1C9EE8 CUR ktuwh02: ktugus 1 3.16
141 0 1 0 2 000007FF1E1C7F18 CUR kduwh01: kdusru 1 3.16
141 0 1 100001 0 000007FF1E1C9EE8 CR kdswh11: kdst_fetch 1 3.16
141 0 1 0 2 000007FF1E1C7F18 CUR ktspbwh1: ktspfsrch 1 3.16


So the function names "ktspfsrch" and "ktspscan_bmb" are common to all diagnosis output. They belong to the internal "kts" space management layer and seem to suggest that Oracle spends most of the time searching for "free space" in which to put the migrated rows.

So the question is: why does Oracle apparently spend so much time searching for a suitable block when migrating a row? A look into the block dump that you get by entering the forwarding ROWID found in the original block after the update (search for "nrid" in the block dump and paste the hex representation of the ROWID at the prompt shown) reveals that Oracle has used up all available ITL slots in the block (for each row migrated into a block a new ITL slot is allocated), but there is still free space left in the block:


.
.
.
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000a.02f.0000045c 0x008011f7.03d6.41 --U- 167 fsc 0x0000.00335363
0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x03 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000
.
.
.
0xa7 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000
0xa8 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000
0xa9 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000

data_block_dump,data header at 0x11a5540c
===============
tsiz: 0xff0
hsiz: 0x160
pbl: 0x11a5540c
bdba: 0x0240015a
76543210
flag=--------
ntab=1
nrow=167
frre=-1
fsbo=0x160
fseo=0x38b
avsp=0x22b
tosp=0x22b
.
.
.


The above example is taken from an 8K block. Depending on the block size there is a hard limit on the number of ITL slots, since Oracle seems to limit the space allocated by the ITL entries to approx. 50% of the block or 255 entries, whichever is reached first. So we get the following maximum number of ITL slots per block size:

2K: 41 (* 24 bytes = 984 bytes)
4K: 83 (* 24 bytes = 1992 bytes)
8K: 169 (* 24 bytes = 4056 bytes)
16K: 255 (* 24 bytes = 6120 bytes)
32K: 255 (* 24 bytes = 6120 bytes)

As you can see from above, the hard limit of 169 as defined for 8K blocks has been reached (0xA9 = 169), but from a free space "perspective" the block is not full yet (avsp=0x22b, given a PCTFREE of 0).

The basic problem of the bug seems to be that blocks in such a state are still considered as possible candidates for inserts of migrated rows, but then have to be rejected as Oracle recognizes that no further ITL entries can be added. The more blocks get into this state, the harder Oracle has to work to find a block that really can be used for storing the migrated row.

This can also be confirmed by running a trace with event 10320 (Enable data layer (kdtgrs) tracing of space management calls) and 10612 (Prints debug information for auto-space managed segments) enabled. This will generate a huge trace file with contents like the following:


.
.
.
kdt_bseg_srch_cbk: examine dba=10.0x024000e0
kdt_bseg_srch_cbk: failed dba=10.0x024000e0 avs=0 afs=5274 tosp=555 full=1
ktspfsrch:Cbk didnot like 0x024000e0
ktspfsrch: Returns: BlockDBA:0x024000e4
kdt_bseg_srch_cbk: examine dba=10.0x024000e4
kdt_bseg_srch_cbk: failed dba=10.0x024000e4 avs=0 afs=5274 tosp=555 full=1
ktspfsrch:Cbk didnot like 0x024000e4
ktspfsrch: Returns: BlockDBA:0x024000e8
kdt_bseg_srch_cbk: examine dba=10.0x024000e8
kdt_bseg_srch_cbk: failed dba=10.0x024000e8 avs=0 afs=5274 tosp=555 full=1
ktspfsrch:Cbk didnot like 0x024000e8
ktspfsrch: Returns: BlockDBA:0x024000ec
kdt_bseg_srch_cbk: examine dba=10.0x024000ec
kdt_bseg_srch_cbk: found dba=10.0x024000ec avs=7215 afs=0 tosp=7215 full=1
.
.
.


The longer the UPDATE executes, the more "...didnot like..." messages will appear per row to migrate.

Since the problem also shows up when performing, for example, single-row updates, it can be seen that the number of "db block gets" performed per row increases with the number of blocks that are filled with migrated rows and therefore are checked over and over again.

So whether you hit the bug or not simply depends on the number of rows that can be migrated into a single block before the block is full and no longer considered for inserts. If the number of migrated rows that fit into a block is larger than the maximum number of ITL entries, the problem will show up when using a combination of an ASSM tablespace and a pre-11.2 version.

Of course from the figures above it becomes obvious that the problem can arise only if a migrated row is very small: up to the 8K block size it requires a migrated row size of less than approx. 23 bytes (excluding the two bytes for the row directory entry), and only if PCTFREE is set to 0. With larger PCTFREE settings the migrated row needs to be even smaller, which is very unlikely since each migrated row already has to hold the "head ROWID", which takes up 6 bytes, plus the 3 bytes for the column count, lock byte and flag byte. This leaves just 13 bytes for data, which in turn has to include a column length byte for every column stored.

This also explains why it is more likely to hit the problem with block sizes larger than 8K, since there the maximum number of ITL slots (255) allocates less than 50% of the block and therefore the migrated rows can be larger and yet more than 255 might still fit into a single block.

Given this information some interesting variations of the test case are possible. If you, for example, increase the PCTFREE of the table right before the UPDATE (by uncommenting the "alter table t1 pctfree 50"; depending on the block size a much lower PCTFREE setting will be sufficient to prevent the bug), you will still end up with the same number of rows migrated, but the bug doesn't show up simply because fewer migrated rows fit into each block and therefore the maximum number of ITL slots won't be reached before the block fills up.

Summary

The purpose of this post was to explain the basic underlying principles of the bug which therefore can happen with any block size.

Of course you might now think "so what?", since it takes quite an exotic setup to reproduce the issue with the default 8K block size (very small rows, PCTFREE close to 0, excessive row migrations), or larger non-default block sizes (16K / 32K) and still quite small migrated rows.

Up to 10.2 you're probably right - that's the end of the story there. However, in an upcoming post I'll demonstrate that in 11.1, when throwing other features like compression and partitioning into the mix, a nasty variation of the bug can become much more relevant to practical, real-life cases.