Thursday, February 19, 2015

12c Parallel Execution New Features: Hybrid Hash Distribution - Part 2

In the second part of this post (go to part 1) I want to focus on the hybrid distribution for skewed join expressions.

2. Hybrid Distribution For Skewed Join Expressions


The HYBRID HASH distribution allows to some degree addressing data distribution skew in case of HASH distributions, which I've described in detail already in the past. A summary post that links to all other relevant articles regarding Parallel Execution Skew can be found here, an overview of the relevant feature can be found here and a detailed description can be found here.

One other side effect of the truly hybrid distribution in case of skew (mixture of BROADCAST / HASH for one row source and ROUND-ROBIN / HASH for the other row source) is that HASH distributions following such a hybrid distribution need to redistribute again even if the same join / distribution keys get used by following joins. If this were regular HASH distributions the data would already be suitably distributed and no further redistribution would be required.

Here's an example of this, using the test case setup mentioned here:
-- Here the HYBRID SKEW distribution works for B->C
-- But the (B->C)->A join is affected by the same skew
-- So the HASH re-distribution of the resulting B.ID is skewed, too
-- And hence the HASH JOIN/SORT AGGREGATE (operation 4+5) are affected by the skew
-- The big question is: Why is there a re-distribution (operation 12+11)?
-- The data is already distributed on B.ID??
-- If there wasn't a re-distribution no skew would happen
-- In 11.2 no-redistribution happens no matter if C is probe or hash row source
-- So it looks like a side-effect of the hybrid distribution
-- Which makes sense as it is not really HASH distributed, but hybrid
select count(t_2_filler) from (
select  /*+ monitor
            leading(b c a)
            use_hash(c a)
            swap_join_inputs(a)
            no_swap_join_inputs(c)
            pq_distribute(a hash hash)
            pq_distribute(c hash hash)
            --optimizer_features_enable('11.2.0.4')
            pq_skew(c) 
        */
        a.id as t_1_id
      , a.filler as t_1_filler
      , c.id as t_2_id
      , c.filler as t_2_filler
from    t_1 a
      , t_1 b
      , t_2 c
where
        c.fk_id_skew = b.id
and     a.id = b.id
);

-- 11.2 plan
----------------------------------------------------------------------------
| Id  | Operation                  | Name     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |        |      |            |
|   1 |  SORT AGGREGATE            |          |        |      |            |
|   2 |   PX COORDINATOR           |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)     | :TQ10003 |  Q1,03 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE         |          |  Q1,03 | PCWP |            |
|*  5 |      HASH JOIN             |          |  Q1,03 | PCWP |            |
|   6 |       PX RECEIVE           |          |  Q1,03 | PCWP |            |
|   7 |        PX SEND HASH        | :TQ10000 |  Q1,00 | P->P | HASH       |
|   8 |         PX BLOCK ITERATOR  |          |  Q1,00 | PCWC |            |
|   9 |          TABLE ACCESS FULL | T_1      |  Q1,00 | PCWP |            |
|* 10 |        HASH JOIN           |          |  Q1,03 | PCWP |            |
|  11 |        PX RECEIVE          |          |  Q1,03 | PCWP |            |
|  12 |         PX SEND HASH       | :TQ10001 |  Q1,01 | P->P | HASH       |
|  13 |          PX BLOCK ITERATOR |          |  Q1,01 | PCWC |            |
|  14 |           TABLE ACCESS FULL| T_1      |  Q1,01 | PCWP |            |
|  15 |        PX RECEIVE          |          |  Q1,03 | PCWP |            |
|  16 |         PX SEND HASH       | :TQ10002 |  Q1,02 | P->P | HASH       |
|  17 |          PX BLOCK ITERATOR |          |  Q1,02 | PCWC |            |
|  18 |           TABLE ACCESS FULL| T_2      |  Q1,02 | PCWP |            |
----------------------------------------------------------------------------

-- 12.1 plan
-------------------------------------------------------------------------------------
| Id  | Operation                           | Name     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |        |      |            |
|   1 |  SORT AGGREGATE                     |          |        |      |            |
|   2 |   PX COORDINATOR                    |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)              | :TQ10004 |  Q1,04 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE                  |          |  Q1,04 | PCWP |            |
|*  5 |      HASH JOIN                      |          |  Q1,04 | PCWP |            |
|   6 |       PX RECEIVE                    |          |  Q1,04 | PCWP |            |
|   7 |        PX SEND HYBRID HASH          | :TQ10002 |  Q1,02 | P->P | HYBRID HASH|
|   8 |         STATISTICS COLLECTOR        |          |  Q1,02 | PCWC |            |
|   9 |          PX BLOCK ITERATOR          |          |  Q1,02 | PCWC |            |
|  10 |           TABLE ACCESS FULL         | T_1      |  Q1,02 | PCWP |            |
|  11 |       PX RECEIVE                    |          |  Q1,04 | PCWP |            |
|  12 |        PX SEND HYBRID HASH          | :TQ10003 |  Q1,03 | P->P | HYBRID HASH|
|* 13 |         HASH JOIN BUFFERED          |          |  Q1,03 | PCWP |            |
|  14 |          PX RECEIVE                 |          |  Q1,03 | PCWP |            |
|  15 |           PX SEND HYBRID HASH       | :TQ10000 |  Q1,00 | P->P | HYBRID HASH|
|  16 |            STATISTICS COLLECTOR     |          |  Q1,00 | PCWC |            |
|  17 |             PX BLOCK ITERATOR       |          |  Q1,00 | PCWC |            |
|  18 |              TABLE ACCESS FULL      | T_1      |  Q1,00 | PCWP |            |
|  19 |          PX RECEIVE                 |          |  Q1,03 | PCWP |            |
|  20 |           PX SEND HYBRID HASH (SKEW)| :TQ10001 |  Q1,01 | P->P | HYBRID HASH|
|  21 |            PX BLOCK ITERATOR        |          |  Q1,01 | PCWC |            |
|  22 |             TABLE ACCESS FULL       | T_2      |  Q1,01 | PCWP |            |
-------------------------------------------------------------------------------------
Note that both joins to A and C are based on B.ID. As you can see from the 11.2 plan therefore the final hash join (operation ID 5) doesn't need to have the output of the previous hash join (operation ID 10) redistributed, since the data is already distributed in a suitable way (and as a consequence both joins therefore will be affected by skewed values in T2.FK_ID_SKEW, but no BUFFERED join variant is required).

Now look at the 12c plan when SKEW is detected: Since the SKEW handling in fact leads to a potential mixture of HASH / BROADCAST and HASH / ROUND-ROBIN distribution, the data gets redistributed again for the final join (operation ID 11 + 12) which has several bad side effects: First it adds the overhead of an additional redistribution, as a side effect this then turns one of the hash joins into its BUFFERED variant, and since the SKEW distribution (at present) is only supported if the right side of the join is a table (and not the result of another join), this following join actually will be affected by the skew that was just addressed by the special SKEW handling in the join before (assuming the HYBRID HASH distributions in operation ID 6+7 / 11+12 operate in HASH / HASH, not BROADCAST / ROUND-ROBIN mode)...

No comments:

Post a Comment