In the second part of this post (go to
part 1) I want to focus on the
hybrid distribution for
skewed join expressions.
2. Hybrid Distribution For Skewed Join Expressions
The HYBRID HASH distribution allows to some degree addressing data distribution skew in case of HASH distributions, which I've described in detail already in the past. A summary post that links to all other relevant articles regarding Parallel Execution Skew can be found
here, an overview of the relevant feature can be found
here and a detailed description can be found
here.
One other side effect of the truly hybrid distribution in case of skew (mixture of
BROADCAST / HASH for one row source and
ROUND-ROBIN / HASH for the other row source) is that HASH distributions
following such a hybrid distribution need to
redistribute again even if the
same join / distribution keys get used by following joins. If this were regular HASH distributions the data would already be suitably distributed and no further redistribution would be required.
Here's an example of this, using the test case setup mentioned
here:
-- Here the HYBRID SKEW distribution works for B->C
-- But the (B->C)->A join is affected by the same skew
-- So the HASH re-distribution of the resulting B.ID is skewed, too
-- And hence the HASH JOIN/SORT AGGREGATE (operation 4+5) are affected by the skew
-- The big question is: Why is there a re-distribution (operation 12+11)?
-- The data is already distributed on B.ID??
-- If there wasn't a re-distribution no skew would happen
-- In 11.2 no-redistribution happens no matter if C is probe or hash row source
-- So it looks like a side-effect of the hybrid distribution
-- Which makes sense as it is not really HASH distributed, but hybrid
select count(t_2_filler) from (
select /*+ monitor
leading(b c a)
use_hash(c a)
swap_join_inputs(a)
no_swap_join_inputs(c)
pq_distribute(a hash hash)
pq_distribute(c hash hash)
--optimizer_features_enable('11.2.0.4')
pq_skew(c)
*/
a.id as t_1_id
, a.filler as t_1_filler
, c.id as t_2_id
, c.filler as t_2_filler
from t_1 a
, t_1 b
, t_2 c
where
c.fk_id_skew = b.id
and a.id = b.id
);
-- 11.2 plan
----------------------------------------------------------------------------
| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | SORT AGGREGATE | | | | |
| 2 | PX COORDINATOR | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10003 | Q1,03 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | Q1,03 | PCWP | |
|* 5 | HASH JOIN | | Q1,03 | PCWP | |
| 6 | PX RECEIVE | | Q1,03 | PCWP | |
| 7 | PX SEND HASH | :TQ10000 | Q1,00 | P->P | HASH |
| 8 | PX BLOCK ITERATOR | | Q1,00 | PCWC | |
| 9 | TABLE ACCESS FULL | T_1 | Q1,00 | PCWP | |
|* 10 | HASH JOIN | | Q1,03 | PCWP | |
| 11 | PX RECEIVE | | Q1,03 | PCWP | |
| 12 | PX SEND HASH | :TQ10001 | Q1,01 | P->P | HASH |
| 13 | PX BLOCK ITERATOR | | Q1,01 | PCWC | |
| 14 | TABLE ACCESS FULL| T_1 | Q1,01 | PCWP | |
| 15 | PX RECEIVE | | Q1,03 | PCWP | |
| 16 | PX SEND HASH | :TQ10002 | Q1,02 | P->P | HASH |
| 17 | PX BLOCK ITERATOR | | Q1,02 | PCWC | |
| 18 | TABLE ACCESS FULL| T_2 | Q1,02 | PCWP | |
----------------------------------------------------------------------------
-- 12.1 plan
-------------------------------------------------------------------------------------
| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | SORT AGGREGATE | | | | |
| 2 | PX COORDINATOR | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10004 | Q1,04 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | Q1,04 | PCWP | |
|* 5 | HASH JOIN | | Q1,04 | PCWP | |
| 6 | PX RECEIVE | | Q1,04 | PCWP | |
| 7 | PX SEND HYBRID HASH | :TQ10002 | Q1,02 | P->P | HYBRID HASH|
| 8 | STATISTICS COLLECTOR | | Q1,02 | PCWC | |
| 9 | PX BLOCK ITERATOR | | Q1,02 | PCWC | |
| 10 | TABLE ACCESS FULL | T_1 | Q1,02 | PCWP | |
| 11 | PX RECEIVE | | Q1,04 | PCWP | |
| 12 | PX SEND HYBRID HASH | :TQ10003 | Q1,03 | P->P | HYBRID HASH|
|* 13 | HASH JOIN BUFFERED | | Q1,03 | PCWP | |
| 14 | PX RECEIVE | | Q1,03 | PCWP | |
| 15 | PX SEND HYBRID HASH | :TQ10000 | Q1,00 | P->P | HYBRID HASH|
| 16 | STATISTICS COLLECTOR | | Q1,00 | PCWC | |
| 17 | PX BLOCK ITERATOR | | Q1,00 | PCWC | |
| 18 | TABLE ACCESS FULL | T_1 | Q1,00 | PCWP | |
| 19 | PX RECEIVE | | Q1,03 | PCWP | |
| 20 | PX SEND HYBRID HASH (SKEW)| :TQ10001 | Q1,01 | P->P | HYBRID HASH|
| 21 | PX BLOCK ITERATOR | | Q1,01 | PCWC | |
| 22 | TABLE ACCESS FULL | T_2 | Q1,01 | PCWP | |
-------------------------------------------------------------------------------------
Note that
both joins to A and C are based on
B.ID. As you can see from the 11.2 plan therefore the final hash join (operation ID 5) doesn't need to have the output of the previous hash join (operation ID 10) redistributed, since the data is already distributed in a suitable way (and as a consequence
both joins therefore will be affected by
skewed values in T2.FK_ID_SKEW, but no
BUFFERED join variant is required).
Now look at the 12c plan when SKEW is detected: Since the SKEW handling in fact leads to a potential mixture of HASH / BROADCAST and HASH / ROUND-ROBIN distribution, the data gets
redistributed again for the final join (operation ID 11 + 12) which has several
bad side effects: First it adds the overhead of an additional redistribution, as a side effect this then turns one of the hash joins into its
BUFFERED variant, and since the SKEW distribution (at present) is only supported if the right side of the join is a table (and not the result of another join), this
following join actually will be affected by the
skew that was just addressed by the special SKEW handling in the join before (assuming the HYBRID HASH distributions in operation ID 6+7 / 11+12 operate in HASH / HASH, not BROADCAST / ROUND-ROBIN mode)...
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.