Monday, May 16, 2011

ASSM bug reprise - part 1

This was meant to be published shortly after my latest quiz night post as an explanatory follow-up, but unfortunately I have only now managed to complete the note.

There is a more or less famous bug in ASSM (see bug 6918210 in MOS as well as Greg Rahn's and Jonathan Lewis' posts) in versions below 11.2 that so far has been classified as only showing up with a combination of larger block sizes (greater than the current default of 8K) and excessive row migration. With such a combination it was reproducible that an UPDATE of the same data pattern residing in an ASSM tablespace caused significantly more work than the same UPDATE in an MSSM tablespace, because ASSM apparently had problems finding suitable blocks to store the migrated rows.

In this post I'll show some diagnostic output for the bug and offer possible explanations of what seems to go wrong under the covers.

Although the bug was identified a long time ago, it is still relevant, because:

- The bugfix is only included in the 11.2 versions. For all other versions supporting ASSM there is no one-off patch available, and it is also not fixed in any of the available patch sets; for example, the issue can still be reproduced with the quite recent 10.2.0.5 patch set. This means that obviously a significant code change was introduced in 11.2 that could not be made available as a patch for the previous releases. It also means that as long as you're not yet on 11.2 it is possible to run into the bug

- The bug can show up with any block size, as I'll demonstrate shortly, so it is not limited to larger block sizes as outlined in the original descriptions

- In 11.1, in combination with other features like basic or advanced (OLTP, for all operations) compression, a variation of the bug can show up even when the criteria for the original basic bug are not met. In particular, the OLTP compression introduced in Oracle 11.1 (and only in 11.1, since in 11.2 the ASSM bug is fixed) can exhibit the bug even in less obvious cases where the pattern of modification would not necessarily suggest row migration. This will be covered in a follow-up post

Here is a short demo that allows you to reproduce the bug with any block size just by using a very small row size and a PCTFREE of 0. Of course you'll need to use an ASSM tablespace to reproduce the issue; if you use an MSSM tablespace instead, you'll notice a significant difference in the amount of work performed.
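
If you're not sure how a tablespace is managed, a quick check like the following will tell you (a minimal sketch; it queries DBA_TABLESPACES - USER_TABLESPACES works similarly - and uses the same &tblspace substitution variable as the script below):

-- Check whether the target tablespace uses ASSM (AUTO) or
-- MSSM (MANUAL) segment space management
select
tablespace_name
, extent_management
, segment_space_management
from
dba_tablespaces
where
tablespace_name = upper('&tblspace');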

Note that the script uses Adrian Billington's "MY_STATS" package, which can be downloaded for free here. It also uses Jonathan Lewis' logical I/O analysis snapper script, which is available from here. Note that the latter script does not give meaningful results from 11.1 on; from 11.1 on you need to use Tanel Poder's "buffer gets profiler", which works a bit differently but shows similar information about buffer gets. It is called "bufprof.sql" and is part of Tanel's awesome "tpt_public" scripts collection, which can be found here.


set echo on timing on

drop table t1;

purge table t1;

CREATE TABLE t1
(pkey varchar2(1),
v1 varchar2(255),
v2 varchar2(255)
)
nocompress
pctfree 0
TABLESPACE &&tblspace;

-- Modify the MOD(ROWNUM, 1) below to control
-- the number of rows updated
-- MOD(ROWNUM, 1) will update every row
-- MOD(ROWNUM, 2) will update every second row etc.
INSERT /*+ append */ INTO t1
SELECT '1' as pkey,
to_char((mod(rownum, 1) + 1), 'TM') || 'BL' AS v1,
null AS v2
FROM dual CONNECT BY LEVEL <= 50000
/

commit;

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

create index t1_idx on t1 (substr(v1, 1, 1)) TABLESPACE &tblspace;

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

truncate table chained_rows;

analyze table t1 list chained rows;

select count(*) from chained_rows;

-- This is only supported from 11.2 on
/*
select
count(*)
, sys.dbms_compression.get_compression_type(user, 'T1', rowid) as comp_type
from
t1
group by
sys.dbms_compression.get_compression_type(user, 'T1', rowid);
*/

column file_no new_value file_no
column block_no new_value block_no

select
dbms_rowid.rowid_relative_fno(rowid) as file_no
, dbms_rowid.rowid_block_number(rowid) as block_no
from
t1
where
rownum <= 1;

alter session set tracefile_identifier = 'before_update';

-- In 11.2 the block dumps are only showing disk content
-- alter system checkpoint;

alter system dump datafile &file_no block &block_no;

pause Press Enter to continue...

exec mystats_pkg.ms_start

execute snap_kcbsw.start_snap

/* Enabling this will generate a huge tracefile
alter session set tracefile_identifier = 'space_layer';

alter session set events '10320 trace name context forever, level 3';
alter session set events '10612 trace name context forever, level 1';
*/

-- Uncomment this to prevent the bug
-- alter table t1 pctfree 50;

UPDATE /*+ full(t1) */ t1 SET v2 = v1
where substr(v1, 1, 1) = '1';

commit;

set serveroutput on size 1000000 format wrapped
set linesize 120
set trimspool on

execute snap_kcbsw.end_snap

exec mystats_pkg.ms_stop(1)

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

truncate table chained_rows;

analyze table t1 list chained rows;

select count(*) from chained_rows;

-- This is only supported from 11.2 on
/*
select
count(*)
, sys.dbms_compression.get_compression_type(user, 'T1', rowid) as comp_type
from
t1
group by
sys.dbms_compression.get_compression_type(user, 'T1', rowid);
*/

alter session set tracefile_identifier = 'after_update';

-- In 11.2 the block dumps are only showing disk content
-- alter system checkpoint;

alter system dump datafile &file_no block &block_no;

accept rdba prompt 'Enter forwarding ROWID found in block dump: '

column rdba new_value rdba

-- Remove any potential leading and trailing unnecessary stuff
select
substr('&rdba',
case
when instr('&rdba', '0x') = 0
then 1
else instr('&rdba', '0x') + 2
end,
case
when instr('&rdba', '.') = 0
then 32767
else instr('&rdba', '.') -
case
when instr('&rdba', '0x') = 0
then 0
else instr('&rdba', '0x') + 2
end
end
) as rdba
from
dual
;

select
dbms_utility.data_block_address_file(to_number('&rdba', rpad('X', length('&rdba'), 'X'))) as file_no
, dbms_utility.data_block_address_block(to_number('&rdba', rpad('X', length('&rdba'), 'X'))) as block_no
from
dual
;

alter session set tracefile_identifier = 'migrated_rows';

-- In 11.2 the block dumps are only showing disk content
-- alter system checkpoint;

alter system dump datafile &file_no block &block_no;


If the bug is hit, then the UPDATE will take several seconds and consume mostly CPU. The statistics will show millions of "session logical reads" for the update of the 50,000 rows.
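
If you don't have the helper packages installed, a quick before/after look at the current session's statistics gives a similar picture. A minimal sketch; it requires SELECT access to V$MYSTAT and V$STATNAME:

-- Run this before and after the UPDATE and compare the deltas
select
sn.name
, ms.value
from
v$mystat ms
, v$statname sn
where
ms.statistic# = sn.statistic#
and sn.name in ('session logical reads', 'db block gets', 'CPU used by this session');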

Up to version 10.2 the output from Jonathan's script will show something similar to this:


SQL> execute snap_kcbsw.end_snap
---------------------------------
Buffer Cache - 15-May 13:11:55
Interval:- 16 seconds
---------------------------------
Why0 Why1 Why2 Other Wait
---- ---- ---- ----------
.
.
.
1,037,665 0 0 0 ktspfwh10: ktspscan_bmb
.
.
.
4,813,907 0 0 0 ktspbwh1: ktspfsrch
.
.
.
---- ---- ---- ----------
6,200,305 0 0 0 Total: 61 rows


Running a stack profile while the update is executing (using e.g. Tanel's OSTACKPROF script) will show output similar to this:


Below is the stack prefix common to all samples:
------------------------------------------------------------------------
Frame->function()
------------------------------------------------------------------------
# 43 ->000000007702652D
# 42 ->OracleThreadStart()
# 41 ->opimai()
# 40 ->opimai_real()
# 39 ->sou2o()
# 38 ->opidrv()
# 37 ->opiodr()
# 36 ->opiino()
# 35 ->opitsk()
# 34 ->ttcpip()
# 33 ->opiodr()
# 32 ->kpoal8()
# 31 ->opiexe()
# 30 ->updexe()
# 29 ->updThreePhaseExe()
# 28 ->updaul()
# 27 ->qerupFetch()
# 26 ->updrow()
# 25 ->kauupd()
# 24 ->kdusru()
# 23 ->kduurp()
# 22 ->kdumrp()
# 21 ->kdtInsRow()
# 20 ->kdtgrs()
# 19 ->kdtgsp()
# 18 ->ktspgsp_main()
# 17 ->ktspscan_bmb()
# 16 ->ktspfsrch()
# ...(see call profile below)
#
.
.
.


Tanel's Buffer Gets Profiler (the old V1.02 version; the current V1.03 version crashed my 11.1.0.7 SYSDBA session) shows output similar to this:


SID KCBBFWHY KCBBFSO_FLG TO_CHAR(K KCBBFCM KCBBFSO_OWN DEC KCBWHDES TOTAL_SAMPLES ACTIVE_PCT
---------- ---------- ----------- --------- ---------- ---------------- --- ------------------------------ ------------- ----------
141 0 1 100001 0 000007FF1A3FF8D0 CR kdswh11: kdst_fetch 1000 3160
141 0 1 100001 0 000007FF1E1C7F18 CR kdswh11: kdst_fetch 1000 3160
141 0 1 0 0 000007FF1E1C7F18 CUR ktspbwh1: ktspfsrch 942 2976.72
141 0 1 4000000 2 000007FF1E1C7F18 CUR ktspbwh1: ktspfsrch 41 129.56
141 0 1 0 0 000007FF1E1C7F18 CUR ktspfwh10: ktspscan_bmb 11 34.76
141 0 1 0 0 000007FF1E1C9EE8 CR kdswh11: kdst_fetch 2 6.32
141 0 1 1000 -2.147E+09 000007FF1E1C9EE8 CUR ktuwh02: ktugus 1 3.16
141 0 1 0 2 000007FF1E1C7F18 CUR kduwh01: kdusru 1 3.16
141 0 1 100001 0 000007FF1E1C9EE8 CR kdswh11: kdst_fetch 1 3.16
141 0 1 0 2 000007FF1E1C7F18 CUR ktspbwh1: ktspfsrch 1 3.16


So the function names "ktspfsrch" and "ktspscan_bmb" are common to all of the diagnostic output. They belong to the internal "kts" space management layer and suggest that Oracle spends most of the time searching for free space in which to put the migrated rows.

So the question is: why does Oracle apparently spend so much time searching for a suitable block when migrating a row? A look into the block dump that you get by entering the forwarding ROWID found in the original block after the update (search for "nrid" in the block dump and paste the hex representation of the ROWID at the prompt shown by the script) reveals that Oracle has used up all available ITL slots in the block (for each row migrated into a block a new ITL entry is allocated), yet there is still free space left in the block:


.
.
.
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000a.02f.0000045c 0x008011f7.03d6.41 --U- 167 fsc 0x0000.00335363
0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x03 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000
.
.
.
0xa7 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000
0xa8 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000
0xa9 0x0000.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.00000000

data_block_dump,data header at 0x11a5540c
===============
tsiz: 0xff0
hsiz: 0x160
pbl: 0x11a5540c
bdba: 0x0240015a
76543210
flag=--------
ntab=1
nrow=167
frre=-1
fsbo=0x160
fseo=0x38b
avsp=0x22b
tosp=0x22b
.
.
.


The above example is taken from an 8K block. Depending on the block size there is a hard limit on the number of ITL slots, since Oracle seems to limit the space allocated by the ITL entries to approx. 50% of the block or 255 entries, whichever is reached first. So we get the following maximum number of ITL slots per block size:

2K: 41 (* 24 bytes = 984 bytes)
4K: 83 (* 24 bytes = 1992 bytes)
8K: 169 (* 24 bytes = 4056 bytes)
16K: 255 (* 24 bytes = 6120 bytes)
32K: 255 (* 24 bytes = 6120 bytes)

As you can see from the above, the hard limit of 169 as defined for 8K blocks has been reached (0xA9 = 169), but from a free space "perspective" the block is not full yet (avsp=0x22b, i.e. 555 bytes, given a PCTFREE of 0).

The basic problem of the bug seems to be that blocks in such a state are still considered as possible candidates for inserts of migrated rows, but then have to be rejected once Oracle recognizes that no further ITL entries can be added. The more blocks get into this state, the harder Oracle has to work to find a block that really can be used for storing the migrated row.
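
You can also look at how ASSM itself classifies the blocks of the segment, for example before and after the UPDATE, using the DBMS_SPACE.SPACE_USAGE procedure, which reports the number of blocks per ASSM free-space category. A minimal sketch, assuming T1 resides in the current schema:

set serveroutput on

declare
l_unf_blocks number;
l_unf_bytes number;
l_fs1_blocks number;
l_fs1_bytes number;
l_fs2_blocks number;
l_fs2_bytes number;
l_fs3_blocks number;
l_fs3_bytes number;
l_fs4_blocks number;
l_fs4_bytes number;
l_full_blocks number;
l_full_bytes number;
begin
-- Report the ASSM space usage categories for table T1
dbms_space.space_usage(
user, 'T1', 'TABLE',
l_unf_blocks, l_unf_bytes,
l_fs1_blocks, l_fs1_bytes,
l_fs2_blocks, l_fs2_bytes,
l_fs3_blocks, l_fs3_bytes,
l_fs4_blocks, l_fs4_bytes,
l_full_blocks, l_full_bytes);
dbms_output.put_line('Unformatted blocks: ' || l_unf_blocks);
dbms_output.put_line('FS1 (0-25% free)  : ' || l_fs1_blocks);
dbms_output.put_line('FS2 (25-50% free) : ' || l_fs2_blocks);
dbms_output.put_line('FS3 (50-75% free) : ' || l_fs3_blocks);
dbms_output.put_line('FS4 (75-100% free): ' || l_fs4_blocks);
dbms_output.put_line('Full blocks       : ' || l_full_blocks);
end;
/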

This can also be confirmed by running a trace with events 10320 (Enable data layer (kdtgrs) tracing of space management calls) and 10612 (Prints debug information for auto-space managed segments) enabled. This will generate a huge trace file with contents like the following:


.
.
.
kdt_bseg_srch_cbk: examine dba=10.0x024000e0
kdt_bseg_srch_cbk: failed dba=10.0x024000e0 avs=0 afs=5274 tosp=555 full=1
ktspfsrch:Cbk didnot like 0x024000e0
ktspfsrch: Returns: BlockDBA:0x024000e4
kdt_bseg_srch_cbk: examine dba=10.0x024000e4
kdt_bseg_srch_cbk: failed dba=10.0x024000e4 avs=0 afs=5274 tosp=555 full=1
ktspfsrch:Cbk didnot like 0x024000e4
ktspfsrch: Returns: BlockDBA:0x024000e8
kdt_bseg_srch_cbk: examine dba=10.0x024000e8
kdt_bseg_srch_cbk: failed dba=10.0x024000e8 avs=0 afs=5274 tosp=555 full=1
ktspfsrch:Cbk didnot like 0x024000e8
ktspfsrch: Returns: BlockDBA:0x024000ec
kdt_bseg_srch_cbk: examine dba=10.0x024000ec
kdt_bseg_srch_cbk: found dba=10.0x024000ec avs=7215 afs=0 tosp=7215 full=1
.
.
.


The longer the UPDATE executes, the more "...didnot like..." messages will appear per row to migrate.

Since the problem also shows up when performing, for example, single-row updates, it can be seen that the number of "db block gets" performed per row increases with the number of blocks that are filled with migrated rows and therefore get checked over and over again.
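
A simple way to watch this effect is to migrate the rows one at a time, for example on a freshly loaded copy of T1 before running the mass UPDATE, and look at the statistics reported by AUTOTRACE. Here is a minimal sketch; it assumes the privileges required for AUTOTRACE are in place and simply picks an arbitrary row that has not been migrated yet:

column rid new_value rid

-- Pick an arbitrary row that has not been updated (and migrated) yet
select rowid as rid from t1 where v2 is null and rownum = 1;

set autotrace on statistics

-- Migrate this single row and check the "db block gets" reported -
-- the figure per row grows as more and more blocks fill up with migrated rows
update t1 set v2 = v1 where rowid = chartorowid('&rid');

commit;

set autotrace off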

So whether or not you hit the bug simply depends on the number of rows that can be migrated into a single block before the block is full and no longer considered for inserts. If the number of migrated rows that fit into the block is larger than the maximum number of ITL entries, the problem will show up when using a combination of an ASSM tablespace and a pre-11.2 version.

Of course from the figures above it becomes obvious that the problem can arise only if a migrated row is very small: up to the 8K block size it requires a migrated row size of less than approx. 23 bytes (excluding the two bytes for the row directory entry), and only if PCTFREE is set to 0. With larger PCTFREE settings the migrated row needs to be even smaller, which is very unlikely, since each migrated row already has to hold the "head ROWID" that takes up 6 bytes, plus the 3 bytes for the column count, lock and flag bytes. This leaves just 13 bytes for data, which in turn need to include a column length byte for every column stored.

This also explains why it is more likely to hit the problem with block sizes larger than 8K: there the maximum number of ITL slots (255) allocates less than 50% of the block, and therefore the migrated rows can be larger and yet more than 255 of them might still fit into a single block.
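
The figures from the list above make this easy to see: a quick calculation of how much of each block the ITL entries can occupy at most (a minimal sketch, reusing the slot counts listed above) shows that the share drops well below 50% for the 16K and 32K block sizes:

-- Maximum ITL space (24 bytes per entry) as a share of the block size
select
block_size
, max_itl_slots
, max_itl_slots * 24 as itl_bytes
, round(max_itl_slots * 24 / block_size * 100, 1) as pct_of_block
from
(
select 2048 as block_size, 41 as max_itl_slots from dual union all
select 4096, 83 from dual union all
select 8192, 169 from dual union all
select 16384, 255 from dual union all
select 32768, 255 from dual
);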

Given this information, some interesting variations of the test case are possible. If you, for example, increase the PCTFREE of the table right before the UPDATE (by uncommenting the "alter table t1 pctfree 50"; depending on the block size a much lower PCTFREE setting will be sufficient to prevent the bug), you will still end up with the same number of rows migrated, but the bug doesn't show up, simply because fewer migrated rows fit into each block and therefore the maximum number of ITL slots won't be reached before the block fills up.

Summary

The purpose of this post was to explain the basic underlying principles of the bug, which can therefore happen with any block size.

Of course you might now think "so what?", since it takes quite an exotic setup to reproduce the issue with the default 8K block size (very small rows, PCTFREE close to 0, excessive row migration), or with larger non-default block sizes (16K / 32K) and still quite small migrated rows.

Up to 10.2 you're probably right - that's the end of the story there. However, in an upcoming post I'll demonstrate that in 11.1, when throwing other features like compression and partitioning into the mix, a nasty variation of the bug can become much more relevant to practical, real-life cases.

7 comments:

  1. "The bug can show up with any block size" is significant.
    I do remember following the original discussion a long time ago (going by the timestamp on Greg Rahn's blog, Aug-Sep '08). Then, part of the "conclusion" was that it was manifest in 16K blocks. (I believe that it was in this discussion that comments like "don't use non-standard block sizes because Oracle doesn't test them well" were also repeated).


    So, you can reproduce this in 8K blocks as well.

  2. Hi Hemant,

    yes, it is true that it can be reproduced in any block size - but you have to consider that it takes quite an exotic setup with those smaller block sizes, so it is quite unlikely to be hit in real life, since most of the time the migrated rows will be longer than 24 bytes (as discussed above).

    However, and that is going to be covered in part 2, the variation of the bug introduced in 11.1 can happen with longer row sizes in combination with the default block size of 8K, so it is much more relevant to real-life cases.

    Randolf

  3. Randolf,
    thank you for the concise explanation. An interesting addition to the current long running update thread in OTN (http://forums.oracle.com/forums/thread.jspa?threadID=2215565&start=30&tstart=0).

    Martin

  4. Hi Martin,

    you might remember that thread some time ago on a German forum where someone also hit the variation of the bug - that was the first time I realized that there is more to this bug than the basic one described here.

    The recent OTN post finally pushed me to complete at least this part 1 note about the basic bug, and hopefully I'll be able to complete part 2 sometime soon, since it is much more relevant and also very likely explains what has been described in the "long running update" thread.

    Randolf

  5. Randolf,
    yes, I remember the thread. Rereading the discussion, I think I can recognize some of the patterns - but I still have no idea what role the flushing of the shared pool (mentioned in the xing thread) could play in this context. I look forward to part 2.
    Martin

  6. Randolf,

    Why ASSM and not MSSM? The first reason that comes to mind is that MSSM will take the block off the free list when all ITL slots are full. However I don't know if that is really the case.

    Interesting stuff.

  7. Henry,

    I think it is simply about a shortcoming/bug in the ASSM implementation that has been fixed in 11.2.

    As you've pointed out, MSSM and ASSM handle the "freeness" of blocks differently internally, so you'll end up with those little details that might make a big difference in some special circumstances.

    Randolf
