Sunday, March 8, 2009

Optimizer partition oddities, part 1: Severe bug in 10.2.0.4 patch set

Forward to part 2

It seems to be a bit like in the movie the "Groundhog Day": Although I've highlighted already twice an issue when using range-list subpartition pruning, it seems to be reoccurring and there are still a lot of oddities to tell about.

A recent case of a client showed that the subpartition pruning fix that has been incorporated into the 10.2.0.4 patch set is unfortunately quite buggy.

The test case used in my previous posts therefore represents only half of the truth.

Although when using explicit subpartition pruning by means of the "subpartition" clause 10.2.0.4 correctly applies the subpartition statistics (in contrast to 10.2.0.3 which uses the partition level statistics), it fails miserably when using "normal" partition pruning by specifying the partition keys.

Using the same test setup as in my previous post running this simple query in the different versions reveals some interesting results:


select data1 from partition_test where x_pkey = 1 and x_slice = '001';


This simply prunes to a single subpartition containing 10 rows which can be seen from the subpartition level statistics. You can refer to the previous posts for the setup and statistics details.

Here are the EXPLAIN PLAN results from the different versions which were taken from the 10053 trace files (explaining the slightly odd formatting).

11.1.0.7:


------------------------------------------------+-----------------------------------+---------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | Pstart| Pstop |
------------------------------------------------+-----------------------------------+---------------+
| 0 | SELECT STATEMENT | | | | 2 | | | |
| 1 | PARTITION RANGE SINGLE | | 9 | 90 | 2 | 00:00:01 | 2 | 2 |
| 2 | PARTITION LIST SINGLE | | 9 | 90 | 2 | 00:00:01 | 2 | 2 |
| 3 | TABLE ACCESS FULL | PARTITION_TEST| 9 | 90 | 2 | 00:00:01 | 4 | 4 |
------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
3 - filter("X_PKEY"=1)


10.2.0.4:


------------------------------------------------+-----------------------------------+---------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | Pstart| Pstop |
------------------------------------------------+-----------------------------------+---------------+
| 0 | SELECT STATEMENT | | | | 2 | | | |
| 1 | PARTITION RANGE SINGLE | | 1 | 38 | 2 | 00:00:01 | 2 | 2 |
| 2 | PARTITION LIST SINGLE | | 1 | 38 | 2 | 00:00:01 | 2 | 2 |
| 3 | TABLE ACCESS FULL | PARTITION_TEST| 1 | 38 | 2 | 00:00:01 | 4 | 4 |
------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
3 - filter("X_PKEY"=1)


10.2.0.3:


------------------------------------------------+-----------------------------------+---------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | Pstart| Pstop |
------------------------------------------------+-----------------------------------+---------------+
| 0 | SELECT STATEMENT | | | | 2 | | | |
| 1 | PARTITION RANGE SINGLE | | 10 | 100 | 2 | 00:00:01 | 2 | 2 |
| 2 | PARTITION LIST SINGLE | | 10 | 100 | 2 | 00:00:01 | 2 | 2 |
| 3 | TABLE ACCESS FULL | PARTITION_TEST| 10 | 100 | 2 | 00:00:01 | 4 | 4 |
------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
3 - filter("X_PKEY"=1)


So according to the EXPLAIN PLAN output 11.1.0.7 and 10.2.0.3 get it right, but 10.2.0.4 reports an incorrect cardinality.

Checking the 10053 optimizer trace file reveals some interesting details:

11.1.0.7:


BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: PARTITION_TEST Alias: PARTITION_TEST (making adjustments for partition skews)
ORIGINAL VALUES:: #Rows: 10 #Blks: 1 AvgRowLen: 10.00
SUBPARTITIONS::
PRUNED: 1
ANALYZED: 1 UNANALYZED: 0
Partition [3]
#Rows: 10 #Blks: 1 AvgRowLen: 10.00
#Rows: 10 #Blks: 1 AvgRowLen: 10.00
Access path analysis for PARTITION_TEST
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for PARTITION_TEST[PARTITION_TEST]
Column (#2):
NewDensity:0.004950, OldDensity:0.000495 BktCnt:1010, PopBktCnt:1010, PopValCnt:2, NDV:2

Table: PARTITION_TEST Alias: PARTITION_TEST
Card: Original: 10.000000 Rounded: 9 Computed: 9.03 Non Adjusted: 9.03
Access Path: TableScan
Cost: 2.00 Resp: 2.00 Degree: 0
Cost_io: 2.00 Cost_cpu: 8055
Resp_io: 2.00 Resp_cpu: 8055
Best:: AccessPath: TableScan
Cost: 2.00 Degree: 1 Resp: 2.00 Card: 9.03 Bytes: 0

***************************************


11.1.0.7 correctly identifies the subpartition (10 rows, 1 block). What seems to be odd that it reports for the column #2 (X_SLICE) apparently partition level statistics (1010 BktCnt, Number of Distinct Values: 2), but it doesn't seem to influence the cardinality estimate significantly. When removing any histograms on subpartition level the cardinality estimate is correctly shown as 10.

10.2.0.4:


BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: PARTITION_TEST Alias: PARTITION_TEST (making adjustments for partition skews)
ORIGINAL VALUES:: #Rows: 0 #Blks: 1 AvgRowLen: 0.00
SUBPARTITIONS::
PRUNED: 1
ANALYZED: 1 UNANALYZED: 0
Partition [2] (NOT ANALYZED)
#Rows: 0 #Blks: 58593632 AvgRowLen: 1025.00
#Rows: 0 #Blks: 1 AvgRowLen: 0.00
***************************************
SINGLE TABLE ACCESS PATH
-----------------------------------------
BEGIN Single Table Cardinality Estimation
-----------------------------------------
Column (#1): X_PKEY(NUMBER) Part#: 2
AvgLen: 22.00 NDV: 0 Nulls: 0 Density: 0.0000e+000 Min: 0 Max: 0
Column (#1): X_PKEY(NUMBER)
AvgLen: 22.00 NDV: 0 Nulls: 0 Density: 0.0000e+000 Min: 0 Max: 0
Column (#2): X_SLICE(VARCHAR2) Part#: 2
AvgLen: 20.00 NDV: 0 Nulls: 0 Density: 0.0000e+000 Min: 0 Max: 0
Column (#2): X_SLICE(VARCHAR2)
AvgLen: 20.00 NDV: 0 Nulls: 0 Density: 0.0000e+000 Min: 0 Max: 0
Table: PARTITION_TEST Alias: PARTITION_TEST
Card: Original: 0 Rounded: 1 Computed: 0.00 Non Adjusted: 0.00
-----------------------------------------
END Single Table Cardinality Estimation
-----------------------------------------
Access Path: TableScan
Cost: 2.00 Resp: 2.00 Degree: 0
Cost_io: 2.00 Cost_cpu: 7121
Resp_io: 2.00 Resp_cpu: 7121
Best:: AccessPath: TableScan
Cost: 2.00 Degree: 1 Resp: 2.00 Card: 0.00 Bytes: 0
***************************************


This looks like a mess, and it is indeed. You'll get always an estimated cardinality of 0 (sanity checked to 1) in the 10.2.0.4 patch set if you prune to a single subpartition. This will screw up almost any plan and hence potentially show abysmal performance.

The issue is tracked with bug Bug 7210921 - STATISTICS WITH SUBPARTITIONS ARE NOT CORRECTLY INTERPRETED IN EXPLAIN PLANS and Bug 7264383 - WRONG STATISTICS 0 ROWS FOR QUERY PRUNED TO SUBPARTITION ON 10.2.0.4. More information can be found in Metalink Note 728056.1. A patch for bug 7210921 can be requested and is already available for a couple of platforms.

10.2.0.3:


BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: PARTITION_TEST Alias: PARTITION_TEST (making adjustments for partition skews)
ORIGINAL VALUES:: #Rows: 1010 #Blks: 4 AvgRowLen: 10.00
SUBPARTITIONS::
PRUNED: 1
ANALYZED: 1 UNANALYZED: 0
Partition [1]
#Rows: 1010 #Blks: 4 AvgRowLen: 10.00
#Rows: 1010 #Blks: 1 AvgRowLen: 10.00
***************************************
SINGLE TABLE ACCESS PATH
-----------------------------------------
BEGIN Single Table Cardinality Estimation
-----------------------------------------
Column (#1): X_PKEY(NUMBER) Part#: 1
AvgLen: 3.00 NDV: 1 Nulls: 0 Density: 4.9505e-004 Min: 1 Max: 1
Histogram: Freq #Bkts: 1 UncompBkts: 1010 EndPtVals: 1
Column (#1): X_PKEY(NUMBER)
AvgLen: 3.00 NDV: 1 Nulls: 0 Density: 4.9505e-004 Min: 1 Max: 1
Histogram: Freq #Bkts: 1 UncompBkts: 1010 EndPtVals: 1
Column (#2): X_SLICE(VARCHAR2) Part#: 1
AvgLen: 4.00 NDV: 2 Nulls: 0 Density: 4.9505e-004
Histogram: Freq #Bkts: 2 UncompBkts: 1010 EndPtVals: 2
Column (#2): X_SLICE(VARCHAR2)
AvgLen: 4.00 NDV: 2 Nulls: 0 Density: 4.9505e-004
Histogram: Freq #Bkts: 2 UncompBkts: 1010 EndPtVals: 2
Table: PARTITION_TEST Alias: PARTITION_TEST
Card: Original: 1010 Rounded: 10 Computed: 10.00 Non Adjusted: 10.00
-----------------------------------------
END Single Table Cardinality Estimation
-----------------------------------------
Access Path: TableScan
Cost: 2.01 Resp: 2.01 Degree: 0
Cost_io: 2.00 Cost_cpu: 63081
Resp_io: 2.00 Resp_cpu: 63081
Best:: AccessPath: TableScan
Cost: 2.01 Degree: 1 Resp: 2.01 Card: 10.00 Bytes: 0
***************************************


The 10.2.0.3 output reveals that it uses the partition level statistics and only gets it right in this particular case due to an frequency histogram on X_SLICE on the partition level. If you remove this histogram the cardinality estimate will be wrong (505).

So if you're using range-list subpartitioning with 10.2.0.4 and use to prune to a single subpartition you should either use 10.2.0.3 optimizer features using the OPTIMIZER_FEATURES_ENABLE parameter/hint to avoid the mess shown above - though 10.2.0.3 can only use partition level statistics - or get the patch mentioned installed.

In the next part of this series I'm going to show some further oddities when using list partitioning. These apply also to 11gR1.

12 comments:

  1. Thanks for this...we've just hit this, or something very similar. We're on 10.2.0.4 on AIX 6 and don't have that patch installed...fortunately, we're going to migrate to 11gR2 in next few weeks, so we're not bothering with the patch...we'll retest on 11gR2 and hopefully it disappears then.

    Thanks again.
    Jeff

    ReplyDelete
  2. ich hasse ja diese formatiererei.
    aber wenns nicht anders geht, der kommentar dazu

    http://perfexpert2.wordpress.com/2012/08/19/cardinality-estimates-on-list-subpartitioned-tables-with-binds-on-partition-literals-on-subpartition/

    ReplyDelete
  3. SZ,

    thanks for the comment and the link.

    I haven't had a chance yet to look into the details why adding a subpartition in your particular case lead to the changed cardinality estimates, but thanks for the heads-up.

    Randolf

    ReplyDelete
  4. Hi Randolf,
    I am a regular follower of your blog, and I must thank you for the wonderful service you are doing for the oracle fraternity.
    I have recently observed a different incarnation of the oddity mentioned by you in 11.2.0.2.
    In my case, I have range partitioned table which is further hash partitioned based on two other columns. So when I run a query against the table using the partitioning key columns and subpartition key columns, the optimizer gets the cardinality right based on the partition statistics but get the #Blks estimate completely wrong, as shown in your example for 10.2.0.4 (optimizer trace). Since the #Blks is wrongly estimated as very less, so serial direct path read is not coming to play and hence no smart scan(on exadata). I can work around the problem by setting one of the following options:
    1) set _serial_direct_read = TRUE
    2) set _direct_read_decision_statistics_driven to FALSE
    3) Setting optimizer_features_enable = 11.1.0.6
    This problem is easily reproducible in any 11.2.0.2 environment. Let me know your thoughts on this.

    ReplyDelete
  5. Hi Aveek,

    thanks for the kind feedback and the interesting case.

    If you can reproduce the issue using a simple, generic test case, then I would check if you still get the same incorrect block estimate in 11.2.0.3 / 11.2.0.4 and 12.1.

    If yes, you should open an SR, if not you should check the bug fix list and see if you can identify the bug. You could then check if there is a back port available for your 11.2.0.2 version.

    And if you have such a simple test case, feel free to post it here if you like.

    Randolf

    ReplyDelete
  6. Hi Randolf,
    Thanks for the quick response. Please find my test case below, I am posting as two comments due to limitation of 4096 characters.

    select * from v$version

    BANNER

    Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
    PL/SQL Release 11.2.0.2.0 - Production
    CORE 11.2.0.2.0 Production
    TNS for Linux: Version 11.2.0.2.0 - Production
    NLSRTL Version 11.2.0.2.0 - Production



    CREATE TABLE TEST_TAB (
    joining_date DATE,
    empl_id NUMBER,
    dept_id NUMBER,
    salary NUMBER,
    gender VARCHAR2(1),
    address VARCHAR2(100)
    )
    PARTITION BY RANGE (JOINING_DATE)
    SUBPARTITION BY HASH (DEPT_ID, GENDER)
    SUBPARTITION TEMPLATE
    (SUBPARTITION SP01,
    SUBPARTITION SP02,
    SUBPARTITION SP03,
    SUBPARTITION SP04
    )
    (
    PARTITION P20130701 VALUES LESS THAN (TO_DATE(' 2013-07-02 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
    PARTITION P20130702 VALUES LESS THAN (TO_DATE(' 2013-07-03 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
    PARTITION P20130703 VALUES LESS THAN (TO_DATE(' 2013-07-04 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
    )


    INSERT INTO test_tab
    SELECT TO_DATE (ROUND (DBMS_RANDOM.VALUE (1, 3)) || '-' || '07' || '-' || '2013', 'DD-MM-YYYY') joining_date,
    LEVEL empl_id,
    ROUND (DBMS_RANDOM.VALUE (1, 16)) dept_id,
    TRUNC (DBMS_RANDOM.VALUE (1000, 500000), 2) salary,
    DECODE (ROUND (DBMS_RANDOM.VALUE (1, 2)), 1, 'M', 2, 'F') gender,
    DBMS_RANDOM.string ('x', DBMS_RANDOM.VALUE (20, 50)) address
    FROM DUAL
    CONNECT BY LEVEL < 10000

    COMMIT;


    ReplyDelete
  7. Contd...

    BEGIN
    DBMS_STATS.GATHER_TABLE_STATS (
    OwnName => USER
    ,tabname => 'TEST_TAB'
    ,Estimate_Percent => NULL
    ,Method_Opt => 'FOR ALL COLUMNS SIZE 1'
    ,Degree => 8
    ,Cascade => TRUE
    ,No_Invalidate => TRUE
    ,granularity => 'GLOBAL AND PARTITION'
    ,force => TRUE);
    END;
    /

    Lets have a quick look at the stats:

    TABLE_NAME PARTITION_NAME SUBPARTITION_NAME AVG_ROW_LEN NUM_ROWS BLOCKS

    TEST_TAB 60 9999 648
    TEST_TAB P20130701 59 2500 216
    TEST_TAB P20130702 59 5018 216
    TEST_TAB P20130703 59 2481 216

    Lets run a query now and have a look at the optimizer trace.

    select * from test_tab
    where joining_date = '02-Jul-2013'
    and gender = 'M'
    and dept_id = 1

    From 10053 Trace File:

    ***************************************
    BASE STATISTICAL INFORMATION
    ***********************
    Table Stats::
    Table: TEST_TAB Alias: TEST_TAB (making adjustments for partition skews)
    ORIGINAL VALUES:: #Rows: 5018 #Blks: 216 AvgRowLen: 59.00 ChainCnt: 0.00
    PARTITIONS::
    PRUNED: 1
    ANALYZED: 1 UNANALYZED: 0
    Partition [1]
    #Rows: 5018 #Blks: 216 AvgRowLen: 59.00 ChainCnt: 0.00
    #Rows: 5018 #Blks: 72 AvgRowLen: 59.00 ChainCnt: 0.00


    It seems the optimizer is wrongly computing the #Blks as 72 by dividing the #Blks in partition (216) by the number of partitions (3). So when
    we have hundreds of partitions the #Blks estimates drop too low and as such serial direct read doesnt come to play.


    If we repeat the test by setting the optimizer_features_enable to 11.1.0.6, we will see that the optizerer gets the correct estimates as shown below:



    select /*+ optimizer_features_enable('11.1.0.6') */ * from test_tab
    where joining_date = '02-Jul-2013'
    and gender = 'M'
    and dept_id = 1


    ***************************************
    BASE STATISTICAL INFORMATION
    ***********************
    Table Stats::
    Table: TEST_TAB Alias: TEST_TAB (making adjustments for partition skews)
    ORIGINAL VALUES:: #Rows: 5018 #Blks: 216 AvgRowLen: 59.00 ChainCnt: 0.00
    PARTITIONS::
    PRUNED: 1
    ANALYZED: 1 UNANALYZED: 0
    Partition [1]
    #Rows: 5018 #Blks: 216 AvgRowLen: 59.00 ChainCnt: 0.00
    #Rows: 5018 #Blks: 216 AvgRowLen: 59.00 ChainCnt: 0.00

    ReplyDelete
  8. Hi Aveek,

    thanks for posting the test case.

    A few questions first: What block size do you use, and how is the tablespace defined you've used for testing - is it uniform or system allocated extents, and is it ASSM or MSSM?

    I find the blocks reported per partition interesting - I cannot simply reproduce the 216 blocks you've shown per partition, in particular since each partition contains a different number of rows.

    Now regarding your observation regarding the blocks reported: I'm not sure I can follow your point, because you seem to say that you want Oracle to report the 216 blocks as it does with 11.1.0.6 optimizer settings, but you prune to a *single* hash subpartition, so in fact the optimizer shouldn't assume 216 blocks, but 216 divided by 4, assuming a even distribution of data by hash.

    So the number of blocks assumed might be wrong, since it looks like 216 / 3 instead of 216 / 4, but 216 is surely wrong, too, and an overestimate.

    By the way, since you seem to prune to a single hash subpartition - if you generate statistics on subpartition level (granularity ALL), you get the same estimates from 11.2.0.2 and 11.1.0.6 optimizer settings, and these should be the correct ones taken from the subpartition statistics.

    And finally: The cardinality estimate is wrong in your test case in my opinion, since it is taken from the partition statistics, but again: You prune to a *single* hash subpartition, so the cardinality should be less than the one taken from the partition (divided by 4 should be a good starting point).

    The only thing that looks suspicious to me at present is why 11.2.0.2 uses partition blocks / 3 instead of partition blocks / 4, and when I try with different number of partitions / subpartitions it becomes clear that it indeed looks like a bug, since Oracle uses the number of partitions instead of subpartitions to calculate the number of blocks from the partition statistics.

    So your problem is probably more caused by that bug, assuming you have many, many partitions the block estimate will become very small, but the 11.1.0.6 approach is wrong, too, as it simply uses the partition level stats when pruning to a single subpartition.

    A workaround to arrive at better estimates would be to generate subpartition statistics so that Oracle can simply pick the actual blocks number from the corresponding subpartition statistics, but that will take much longer than generating just partition level stats, and might have side effects on estimates for other queries that don't prune to a single hash subpartition.

    Randolf

    ReplyDelete
  9. Hi Randolf,
    Many thanks for the quick response.

    My tablespace is a BIGFILE tablespace with AUTOALLOCATE DEFAULT and SEGMENT SPACE MANAGEMENT AUTO with a block size of 16384.

    Couple of more observations:

    1) Even if I gather subpartition stats, the optimizer still continues to use partition stats. It does not use the subpartition stats.

    2) I was referring to the 11.1.0.6 optimizer settings as correct because as indicated in the "BASE STATISTICAL INFORMATION" of the optimizer trace it is doing PARTITION PRUNING and it is getting the partition stats correctly (216), I would have expected to see "SUBPARTIONS" PRUNED in the optimizer trace, if it was trying to use subpartition stats. Please correct me if I am wrong.





    ReplyDelete
  10. Hi Aveek,

    this is what I got from my 11.2.0.2 installation when running your test case, although with different block size and extent / segment management:

    Partition Level Stats only:

    ***************************************
    BASE STATISTICAL INFORMATION
    ***********************
    Table Stats::
    Table: TEST_TAB Alias: TEST_TAB (making adjustments for partition skews)
    ORIGINAL VALUES:: #Rows: 5002 #Blks: 200 AvgRowLen: 59.00 ChainCnt: 0.00
    PARTITIONS::
    PRUNED: 1
    ANALYZED: 1 UNANALYZED: 0
    Partition [1]
    #Rows: 5002 #Blks: 200 AvgRowLen: 59.00 ChainCnt: 0.00
    #Rows: 5002 #Blks: 67 AvgRowLen: 59.00 ChainCnt: 0.00


    Subpartition Stats gathered:

    ***************************************
    BASE STATISTICAL INFORMATION
    ***********************
    Table Stats::
    Table: TEST_TAB Alias: TEST_TAB (making adjustments for partition skews)
    ORIGINAL VALUES:: #Rows: 1516 #Blks: 60 AvgRowLen: 59.00 ChainCnt: 0.00
    SUBPARTITIONS::
    PRUNED: 1
    ANALYZED: 1 UNANALYZED: 0
    Partition [4]
    #Rows: 1516 #Blks: 60 AvgRowLen: 59.00 ChainCnt: 0.00
    #Rows: 1516 #Blks: 60 AvgRowLen: 59.00 ChainCnt: 0.00


    Notice how it says "SUBPARTITIONS" and uses different rows / blocks in the second case, and shows the partition level cardinality and blocks / 3 like in your example when having just partition level stats.

    Not sure why you don't see the subpartitions statistics used in your case, for me it works as described.

    Randolf

    ReplyDelete
  11. Many thanks Randolf, I was able to get subpartition stats to work. However this is a bigger changes since it involves a change in our stats gathering strategy and might have side effects on estimates for other queries that don't prune to a single hash subpartition, as you have rightly pointed out.

    ReplyDelete
  12. Hi Aveek,

    I suggest that you open an SR with Oracle regarding the incorrect division of the partition level stats when pruning to a single hash subpartition, that clearly looks like a bug.

    Instead of gathering subpartition statistics you could evaluate if using dynamic sampling together with the DYNAMIC_SAMPLING_EST_CDN hint is applicable in your case - it should give you more better cost estimates, but might have other side effects on the execution plans you need to test.

    Randolf

    ReplyDelete