Sunday, January 23, 2011

Quiz Night

I've recently come across an interesting variation of a "famous" ASSM bug. Some of you will probably remember the ASSM bug that was caused by row migration in larger block sizes (16K/32K).

If you don't remember or don't know what I'm talking about, you can have a look here, where Greg Rahn provides a summary of the issue, or check the description of bug 6918210 on My Oracle Support.

Greg also links to a script originally created by Jonathan Lewis that allows you to reproduce the issue at will.

So far the issue has only been reproduced with block sizes greater than 8K - the variation I've encountered, however, allows reproducing the issue with 8K and 4K block sizes, and possibly also with 2K, but I haven't tested 2K yet.

Below is my version of the script. If you compare it to Jonathan's version you'll notice that it is very similar, almost the same in fact, except for additional optional instrumentation that you can simply un-comment if you've installed my Advanced Oracle Troubleshooting script package, which is based on Tanel Poder's awesome "tpt_public" tool set.

The SESSPACK tool can be found in Tanel's tool set (tools/sesspack_0.05_release), and the SNAP_KCBSW package was developed by Jonathan a long time ago - it can be found here. Note that it only works for versions below 11g, since this instrumentation has unfortunately been "optimized away" in 11g.

In order to reduce the runtime, I've simply limited the number of rows in the table to 50,000.


set echo on timing on

drop table t1;

purge table t1;

CREATE TABLE t1
(n1 NUMBER,
n2 NUMBER)
TABLESPACE &tblspace;

INSERT --+ append
INTO t1
SELECT TRUNC(dbms_random.VALUE(10000000,100000000)) n1,
TO_NUMBER(NULL) AS n2
FROM dual
CONNECT BY LEVEL <= 50000
/

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

/* Uncomment for instrumentation
@trci assm_bug

@trc_f

@trc_p

exec sesspack.snap_me

execute snap_kcbsw.start_snap

@46on 8
*/

alter session set events '10046 trace name context forever, level 8';

UPDATE t1 SET n2 = n1;

commit;

/* Uncomment for instrumentation
@trci assm_bug_off

@46off
*/

alter session set events '10046 trace name context off';

/* Uncomment for instrumentation
set serveroutput on size 1000000 format wrapped
set linesize 120
set trimspool on

execute snap_kcbsw.end_snap

exec sesspack.snap_me
*/

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

/* Uncomment for instrumentation
@trc_orasrp &trc_p &trc_f

@trc_tvdxtat &trc_p &trc_f
*/


Here is the task: you are allowed to modify the script at exactly one single location, and the modification can involve a maximum of four keywords - that is, you can add, modify or remove at most four keywords at that one place.

With the correct modification you will be able to reproduce the bug even in 8K and lower block sizes.

So, what to modify and why?

If you want to actually run the script yourself, you need to use a database version prior to 11.2, because the bug is obviously fixed there. This includes 10.2.0.5, which interestingly doesn't have the fix.
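
If you're unsure about the exact version and patch level you're running, a quick query against the standard V$VERSION view tells you before you spend any time on the test case:

select banner from v$version;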

I've used an 8K/4K ASSM tablespace with UNIFORM 1M extents for my tests, but I don't think that the extent management matters in this case. My test database uses 8K as the default block size.
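
For reference, the kind of tablespace I mean looks something like the following. Treat this only as a sketch: the tablespace name, datafile path and sizes are placeholders, and if you create it with a non-default block size (for example 4K in an 8K database) you additionally need the corresponding DB_nK_CACHE_SIZE parameter configured:

CREATE TABLESPACE test_8k_assm
DATAFILE '/path/to/test_8k_assm_01.dbf' SIZE 100M
BLOCKSIZE 8K
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M
SEGMENT SPACE MANAGEMENT AUTO;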

You'll notice the bug when checking the runtime and the trace file. If you hit the bug, the runtime of the update will be several seconds (more than 10 seconds in some cases on my test system) and the number of current mode gets for the update will be in the millions.

If you've enabled the additional instrumentation, it will tell you that the reasons for most of the buffer gets were "ktspfsrch" and "ktspscan_bmb". You can also take stack traces (e.g. using Tanel's OStackProf tool) if you use more than 50,000 rows, so that the update statement runs longer - the traces will show similar function names on the stack.

If you don't hit the bug, the update usually takes at most 1-2 seconds, and the current mode gets should be far less than one million when sticking to the 50,000 rows.
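
By the way, if you don't want to dig through the raw trace file for the current mode gets, a simple before/after snapshot of the session statistics works just as well. This is nothing specific to the test case, just the standard V$MYSTAT / V$STATNAME views - run it once before and once after the UPDATE and compare the delta:

select sn.name, ms.value
from v$mystat ms, v$statname sn
where ms.statistic# = sn.statistic#
and sn.name = 'db block gets';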

P.S.: There is more than one correct answer - and it is possible to hit the bug for 8K block sizes with a single keyword modification (full points!).

Update 24th Jan: P.P.S: No takers yet... So here's an additional hint: The issue is caused by row migration...
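
If you want to see the row migration itself rather than just its side effects, one generic way to check (not part of the script above, and note that it overwrites the statistics gathered via DBMS_STATS) is the old-fashioned ANALYZE command, which populates CHAIN_CNT - a counter that includes both chained and migrated rows:

analyze table t1 compute statistics;

select chain_cnt, num_rows from user_tables where table_name = 'T1';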

Update 26th Jan: OK, time to post a quick answer here. As pointed out by Narendra below, simply setting PCTFREE to 0 was already sufficient to reproduce the issue with smaller block sizes. However, there is much more to tell, and therefore this deserves a separate post that I'll publish in the next couple of days.

For the time being, here are the correct answers that I'm aware of at present:

- PCTFREE 0
- COMPRESS
- COMPRESS FOR ALL OPERATIONS

But as I already said, there is much more, in particular when partitioning comes into the picture - and I hope to cover all these details in the upcoming post.
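
To make the answer concrete, here is what the single-location modification looks like in the script above, shown for the PCTFREE 0 variant:

CREATE TABLE t1
(n1 NUMBER,
n2 NUMBER)
PCTFREE 0
TABLESPACE &tblspace;

Replacing PCTFREE 0 with COMPRESS (or COMPRESS FOR ALL OPERATIONS, where available) in the same position gives the other variants listed above.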

4 comments:

  1. I am not able to reproduce it on 10.2.0.1 (yes, I know it is a "buggy" version) and I can't think of the exact number, but since you give a clue about row migration, would specifying a lower value for the PCTFREE setting (than the default 10) in the CREATE TABLE statement cause this?

  2. Hi Nerendra,

    yep, one down. It is as easy as that - just use PCTFREE 0 with an 8K block size and the problem showed up in my tests (10.2.0.4, 10.2.0.5 and 11.1.0.7).

    So "PCTFREE 0" means two words, but in my actual case it was only a single word modification.

    There are (at least) two other possibilities left!

    Randolf

  3. Not sure how long you are going to wait before telling us the "magic answer" :)
    But is that one-keyword change changing the data type of column n2 from NUMBER to, say, CHAR(10)?

  4. Hi Narendra,

    sorry for misspelling your name in my previous comment...

    Your comment about changing the data type is a very interesting one and I hope to cover that as well in the upcoming post about the details of the issue I've encountered.

    For the answers I know of at present, see the updated post.

    Randolf
