Monday, February 16, 2015

12c Parallel Execution New Features: Hybrid Hash Distribution - Part 1

In this blog post I want to cover some aspects of the the new HYBRID HASH adaptive distribution method that I haven't covered yet in my other posts.

As far as I know it serves two purposes for parallel HASH and MERGE JOINs, adaptive broadcast distribution and hybrid distribution for skewed join expressions. In the first part of this post I want to focus on former one (goto part 2).

1. Adaptive Broadcast Distribution For Small Left Row Sources


It allows the PX SEND / RECEIVE operation for the left (smaller estimated row source) of the hash join to decide dynamically at runtime, actually at each execution, if it should use either a BROADCAST or HASH distribution, and correspondingly for the other row source to use then either a ROUND-ROBIN or a HASH distribution, too. This is described for example in the corresponding white paper by Maria Colgan here.

It's important to emphasize that this decision is really done at each execution of the same cursor, so the same cursor can do a BROADCAST distribution for the left row source at one execution and HASH distribution at another execution depending on whether the number of rows detected by the STATISTICS COLLECTOR operator exceeds the threshold or not. This is different from the behaviour of "adaptive joins" where the final plan will be resolved at first execution and from then on will be re-used, and therefore a STATISTICS COLLECTOR operator as part of an adaptive plan no longer will be evaluated after the first execution.

Here is a simple script demonstrating that the distribution method is evaluated at each execution:
define dop = 4

create table t_1
compress
as
select
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(&dop*2) */ * from dual
connect by
        level <= &dop*2) a
;

exec dbms_stats.gather_table_stats(null, 't_1', method_opt=>'for all columns size 1')

create table t_2
compress
as
select
        rownum as id
      , mod(rownum, &dop) + 1 as fk_id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a
;

exec dbms_stats.gather_table_stats(null, 't_2', method_opt=>'for all columns size 1')

alter table t_1 parallel &dop cache;

alter table t_2 parallel &dop cache;

select /*+ leading(t1) no_swap_join_inputs(t2) pq_distribute(t_2 hash hash) */ max(t_2.id) from t_1, t_2 where t_1.id = t_2.fk_id;

@pqstat

delete from t_1 where rownum <= 1;

select count(*) from t_1;

select /*+ leading(t1) no_swap_join_inputs(t2) pq_distribute(t_2 hash hash) */ max(t_2.id) from t_1, t_2 where t_1.id = t_2.fk_id;

@pqstat

rollback;
For the table queue 0 (the distribution of T_1) the distribution for the first execution in above script look like this:
     TQ_ID SERVER_TYP   INSTANCE PROCESS    NUM_ROWS          % GRAPH     
---------- ---------- ---------- -------- ---------- ---------- ----------
         0 Producer            1 P004              8        100 ##########
                                 P005              0          0           
                                 P006              0          0           
                                 P007              0          0           
           ********** **********          ----------
           Total                                   8

           Consumer            1 P000              3         38 ##########
                                 P001              1         13 ###       
                                 P002              2         25 #######   
                                 P003              2         25 #######   
           ********** **********          ----------
           Total                                   8
So the eight rows are distributed assumingly by hash. But for the second execution with only seven rows in T_1 I get this output:
     TQ_ID SERVER_TYP   INSTANCE PROCESS    NUM_ROWS          % GRAPH
---------- ---------- ---------- -------- ---------- ---------- ----------
         0 Producer            1 P004             28        100 ##########
                                 P005              0          0
                                 P006              0          0
                                 P007              0          0
           ********** **********          ----------
           Total                                  28

           Consumer            1 P000              7         25 ##########
                                 P001              7         25 ##########
                                 P002              7         25 ##########
                                 P003              7         25 ##########
           ********** **********          ----------
           Total                                  28
So the seven rows were this time broadcasted.

The "pqstat" script is simply a query on V$PQ_TQSTAT, which I've mentioned for example here.

So I run the same query twice, the first time the threshold is exceeded and a HASH distribution takes place. After deleting one row the second execution of the same cursor turns into a BROADCAST / ROUND-ROBIN distribution. You can verify that this is the same parent / child cursor via DBMS_XPLAN.DISPLAY_CURSOR / V$SQL. Real-Time SQL Monitoring also can provide more details about the distribution methods used (click on the "binoculars" icon in the "Other" column of the active report for the PX SEND HYBRID HASH operations).

Note that the dynamic switch between HASH to BROADCAST unfortunately isn't the same as a decision of the optimizer at parse time to use BROADCAST distribution, because in such a case the other row source won't be distributed at all, which comes with some important side effects:

Not only the redistribution of larger row sources simply can take significant time and resources (CPU and in case of RAC network), but due to the (in 12c still existing) limitation of Parallel Execution that only a single redistribution is allowed to be active concurrently reducing the number of redistributions in the plan simply as a side effect can reduce the number of BUFFERED operations (mostly HASH JOIN BUFFERED, but could be additional BUFFER SORTs, too), which are a threat to Parallel Execution performance in general.

Here is a very simple example showing the difference:

-- HYBRID HASH with possible BROADCAST distribution of T_1
----------------------------------------------------------------------------
| Id  | Operation                  | Name     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |        |      |            |
|   1 |  PX COORDINATOR            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)      | :TQ10002 |  Q1,02 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED      |          |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE             |          |  Q1,02 | PCWP |            |
|   5 |      PX SEND HYBRID HASH   | :TQ10000 |  Q1,00 | P->P | HYBRID HASH|
|   6 |       STATISTICS COLLECTOR |          |  Q1,00 | PCWC |            |
|   7 |        PX BLOCK ITERATOR   |          |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL  | T_1      |  Q1,00 | PCWP |            |
|   9 |     PX RECEIVE             |          |  Q1,02 | PCWP |            |
|  10 |      PX SEND HYBRID HASH   | :TQ10001 |  Q1,01 | P->P | HYBRID HASH|
|  11 |       PX BLOCK ITERATOR    |          |  Q1,01 | PCWC |            |
|  12 |        TABLE ACCESS FULL   | T_2      |  Q1,01 | PCWP |            |
----------------------------------------------------------------------------

-- TRUE BROADCAST of T_1
-------------------------------------------------------------------------
| Id  | Operation               | Name     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |        |      |            |
|   1 |  PX COORDINATOR         |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10001 |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN            |          |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |  Q1,01 | PCWP |            |
|   5 |      PX SEND BROADCAST  | :TQ10000 |  Q1,00 | P->P | BROADCAST  |
|   6 |       PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| T_1      |  Q1,00 | PCWP |            |
|   8 |     PX BLOCK ITERATOR   |          |  Q1,01 | PCWC |            |
|   9 |      TABLE ACCESS FULL  | T_2      |  Q1,01 | PCWP |            |
-------------------------------------------------------------------------
So even if in the first plan the T_1 row source really has less than 2*DOP rows and the HYBRID HASH distribution turns into a BROADCAST distribution, this doesn't change the overall plan shape generated by the optimizer. The second HYBRID HASH distribution won't be skipped and will turn into a ROUND-ROBIN distribution instead, which can be confirmed by looking at the output from V$PQ_TQSTAT for example. So the data of the second row source still needs to be distributed, and hence the HASH JOIN will be operating as BUFFERED join due to the plan shape and the limitation that only a single PX SEND / RECEIVE pair can be active at the same time.

In the second plan the BROADCAST distribution of T_1 means that T_2 will not be re-distributed, hence there is no need to operate the HASH JOIN in buffered mode.

So the only purpose of this particular adaptive HYBRID HASH distribution is obviously to avoid skew if there are only a couple of rows (and hence possible join key values) in the left row source, because a HASH distribution based on such a low number of distinct values doesn't work well. Oracle's algorithm needs a certain number of distinct values otherwise it can end up with a bad distribution. This probably also explains why the threshold of 2*DOP was chosen so low.

5 comments:

  1. Hi Randolf,

    The current HYBRID HASH implementation is designed for cases where the left side cardinality is overestimated. So. it is a replacement for the old HASH-HASH distribution. The cases where the cardinality is underestimated resulting in a BROADCAST distribution on the left side remain the same and they do not get HYBRID HASH.

    ReplyDelete
  2. Hi Yasin,

    thanks for the comment (Yasin is the product manager for Parallel Execution at Oracle).

    My point here is that if the HYBRID HASH distribution was designed for cases with an overestimate in mind, why was the threshold then chosen so low (2*DOP rows)?

    The optimizer will come up with BROADCAST distributions for row sources much larger than 2*DOP rows (of course depending on the size of the other row source).

    So I see the the usefulness of this feature as quite limited at present - and as I've outlined, it suffers from the fact that it doesn't avoid the re-distribution of the right side, with the side effects mentioned in the post.

    Randolf

    ReplyDelete
  3. You are right about the limitations of the current implementation. Both of your concerns, the decision point to switch distribution methods and the re-distribution of the right side are being addressed. Stay tuned...

    ReplyDelete
  4. Hi Yasin,

    ok, thanks, so I'm looking forward to seeing those improvements :-)

    Randolf

    ReplyDelete
  5. So I am wondering - has this been addressed and how?

    ReplyDelete