In this blog post I want to cover some aspects of the the new HYBRID HASH adaptive distribution method that I haven't covered yet in my other posts.
As far as I know it serves two purposes for parallel HASH and MERGE JOINs,
adaptive broadcast distribution and
hybrid distribution for
skewed join expressions. In the first part of this post I want to focus on former one (goto
part 2).
1. Adaptive Broadcast Distribution For Small Left Row Sources
It allows the PX SEND / RECEIVE operation for the left (smaller estimated row source) of the hash join to decide dynamically at runtime, actually at each execution, if it should use either a
BROADCAST or
HASH distribution, and correspondingly for the other row source to use then either a
ROUND-ROBIN or a
HASH distribution, too. This is described for example in the corresponding white paper by Maria Colgan
here.
It's important to emphasize that this decision is really done
at each execution of the same cursor, so the same cursor can do a BROADCAST distribution for the left row source at one execution and HASH distribution at another execution depending on whether the number of rows detected by the STATISTICS COLLECTOR operator exceeds the threshold or not. This is
different from the behaviour of
"adaptive joins" where the
final plan will be
resolved at first execution and from then on will be re-used, and therefore a STATISTICS COLLECTOR operator as part of an adaptive plan no longer will be evaluated after the first execution.
Here is a simple script demonstrating that the distribution method is evaluated at each execution:
define dop = 4
create table t_1
compress
as
select
rownum as id
, rpad('x', 100) as filler
from
(select /*+ cardinality(&dop*2) */ * from dual
connect by
level <= &dop*2) a
;
exec dbms_stats.gather_table_stats(null, 't_1', method_opt=>'for all columns size 1')
create table t_2
compress
as
select
rownum as id
, mod(rownum, &dop) + 1 as fk_id
, rpad('x', 100) as filler
from
(select /*+ cardinality(1e5) */ * from dual
connect by
level <= 1e5) a
;
exec dbms_stats.gather_table_stats(null, 't_2', method_opt=>'for all columns size 1')
alter table t_1 parallel &dop cache;
alter table t_2 parallel &dop cache;
select /*+ leading(t1) no_swap_join_inputs(t2) pq_distribute(t_2 hash hash) */ max(t_2.id) from t_1, t_2 where t_1.id = t_2.fk_id;
@pqstat
delete from t_1 where rownum <= 1;
select count(*) from t_1;
select /*+ leading(t1) no_swap_join_inputs(t2) pq_distribute(t_2 hash hash) */ max(t_2.id) from t_1, t_2 where t_1.id = t_2.fk_id;
@pqstat
rollback;
For the table queue 0 (the distribution of T_1) the distribution for the first execution in above script look like this:
TQ_ID SERVER_TYP INSTANCE PROCESS NUM_ROWS % GRAPH
---------- ---------- ---------- -------- ---------- ---------- ----------
0 Producer 1 P004 8 100 ##########
P005 0 0
P006 0 0
P007 0 0
********** ********** ----------
Total 8
Consumer 1 P000 3 38 ##########
P001 1 13 ###
P002 2 25 #######
P003 2 25 #######
********** ********** ----------
Total 8
So the eight rows are distributed assumingly by hash. But for the second execution with only seven rows in T_1 I get this output:
TQ_ID SERVER_TYP INSTANCE PROCESS NUM_ROWS % GRAPH
---------- ---------- ---------- -------- ---------- ---------- ----------
0 Producer 1 P004 28 100 ##########
P005 0 0
P006 0 0
P007 0 0
********** ********** ----------
Total 28
Consumer 1 P000 7 25 ##########
P001 7 25 ##########
P002 7 25 ##########
P003 7 25 ##########
********** ********** ----------
Total 28
So the seven rows were this time broadcasted.
The
"pqstat" script is simply a query on V$PQ_TQSTAT, which I've mentioned for example
here.
So I run the same query twice, the first time the threshold is exceeded and a HASH distribution takes place. After deleting one row the second execution of the same cursor turns into a BROADCAST / ROUND-ROBIN distribution. You can verify that this is the same parent / child cursor via DBMS_XPLAN.DISPLAY_CURSOR / V$SQL. Real-Time SQL Monitoring also can provide more details about the distribution methods used (click on the "binoculars" icon in the "Other" column of the active report for the PX SEND HYBRID HASH operations).
Note that the
dynamic switch between HASH to BROADCAST unfortunately isn't the same as a decision of the optimizer at parse time to use BROADCAST distribution, because in such a case the other row source
won't be distributed at all, which comes with some important side effects:
Not only the redistribution of larger row sources simply can take significant time and resources (CPU and in case of RAC network), but due to the (in 12c still existing) limitation of Parallel Execution that only a single redistribution is allowed to be active concurrently reducing the number of redistributions in the plan simply as a side effect can
reduce the number of
BUFFERED operations (mostly HASH JOIN BUFFERED, but could be additional BUFFER SORTs, too), which are a threat to Parallel Execution performance in general.
Here is a very simple example showing the difference:
-- HYBRID HASH with possible BROADCAST distribution of T_1
----------------------------------------------------------------------------
| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | Q1,02 | P->S | QC (RAND) |
|* 3 | HASH JOIN BUFFERED | | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | Q1,02 | PCWP | |
| 5 | PX SEND HYBRID HASH | :TQ10000 | Q1,00 | P->P | HYBRID HASH|
| 6 | STATISTICS COLLECTOR | | Q1,00 | PCWC | |
| 7 | PX BLOCK ITERATOR | | Q1,00 | PCWC | |
| 8 | TABLE ACCESS FULL | T_1 | Q1,00 | PCWP | |
| 9 | PX RECEIVE | | Q1,02 | PCWP | |
| 10 | PX SEND HYBRID HASH | :TQ10001 | Q1,01 | P->P | HYBRID HASH|
| 11 | PX BLOCK ITERATOR | | Q1,01 | PCWC | |
| 12 | TABLE ACCESS FULL | T_2 | Q1,01 | PCWP | |
----------------------------------------------------------------------------
-- TRUE BROADCAST of T_1
-------------------------------------------------------------------------
| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | Q1,01 | P->S | QC (RAND) |
|* 3 | HASH JOIN | | Q1,01 | PCWP | |
| 4 | PX RECEIVE | | Q1,01 | PCWP | |
| 5 | PX SEND BROADCAST | :TQ10000 | Q1,00 | P->P | BROADCAST |
| 6 | PX BLOCK ITERATOR | | Q1,00 | PCWC | |
| 7 | TABLE ACCESS FULL| T_1 | Q1,00 | PCWP | |
| 8 | PX BLOCK ITERATOR | | Q1,01 | PCWC | |
| 9 | TABLE ACCESS FULL | T_2 | Q1,01 | PCWP | |
-------------------------------------------------------------------------
So even if in the first plan the T_1 row source really has less than
2*DOP rows and the HYBRID HASH distribution turns into a BROADCAST distribution, this doesn't change the overall plan shape generated by the optimizer. The second HYBRID HASH distribution won't be skipped and will turn into a ROUND-ROBIN distribution instead, which can be confirmed by looking at the output from V$PQ_TQSTAT for example. So the data of the second row source still needs to be distributed, and hence the HASH JOIN will be operating as
BUFFERED join due to the plan shape and the limitation that only a single PX SEND / RECEIVE pair can be active at the same time.
In the second plan the BROADCAST distribution of T_1 means that T_2 will not be re-distributed, hence there is no need to operate the HASH JOIN in buffered mode.
So the only purpose of this particular adaptive HYBRID HASH distribution is obviously to
avoid skew if there are only a couple of rows (and hence possible join key values) in the left row source, because a HASH distribution based on such a low number of distinct values doesn't work well. Oracle's algorithm needs a certain number of distinct values otherwise it can end up with a bad distribution. This probably also explains why the threshold of 2*DOP was chosen so low.
Hi Randolf,
ReplyDeleteThe current HYBRID HASH implementation is designed for cases where the left side cardinality is overestimated. So. it is a replacement for the old HASH-HASH distribution. The cases where the cardinality is underestimated resulting in a BROADCAST distribution on the left side remain the same and they do not get HYBRID HASH.
Hi Yasin,
ReplyDeletethanks for the comment (Yasin is the product manager for Parallel Execution at Oracle).
My point here is that if the HYBRID HASH distribution was designed for cases with an overestimate in mind, why was the threshold then chosen so low (2*DOP rows)?
The optimizer will come up with BROADCAST distributions for row sources much larger than 2*DOP rows (of course depending on the size of the other row source).
So I see the the usefulness of this feature as quite limited at present - and as I've outlined, it suffers from the fact that it doesn't avoid the re-distribution of the right side, with the side effects mentioned in the post.
Randolf
You are right about the limitations of the current implementation. Both of your concerns, the decision point to switch distribution methods and the re-distribution of the right side are being addressed. Stay tuned...
ReplyDeleteHi Yasin,
ReplyDeleteok, thanks, so I'm looking forward to seeing those improvements :-)
Randolf
So I am wondering - has this been addressed and how?
ReplyDelete