Friday, October 16, 2015

12c Parallel Execution New Features: Parallel FILTER Subquery Evaluation - Part 2: Distribution Methods

Picking up from the first part of this instalment I'll focus in this post on the available distribution methods for the new parallel FILTER subquery feature.

In this post I won't go into the details how the optimizer selects the distribution method automatically - this will be covered in the last part.

Here I merely describe the different available methods and how to control them using the new PQ_FILTER hint, which is also mentioned in the official documentation, although I find a bit hard to follow the description there.

There are four different options available to the PQ_FILTER hint, and only two of them actually describe a distribution method. One of them tells to not distribute the data at all, and the last one reverts to the former pre-12c plan shape:

SERIAL: This tells the optimizer to use the pre-12c plan shape where the FILTER operator is executed by the Query Coordinator. Depending on the SQL features used and combined, 12c sometimes still reverts to this plan shape and it looks like in those cases you can't force the parallel filter evaluation via the PQ_FILTER hint - the outline then contains a PQ_FILTER hint that is supposed to result in a parallel filter evaluation, but the serial plan shape gets used anyway despite the hint.

NONE: This tells the optimizer to not redistribute the data before executing the FILTER operator driving the subqueries, which means in whatever way the data is distributed the Parallel Execution Servers will execute the filter and the corresponding subqueries for the data currently processed

HASH: Redistribute the data by HASH before running the FILTER operator and the corresponding subqueries.

RANDOM: Redistribute the data by ROUND-ROBIN before running the FILTER operator and the corresponding subqueries. Despite being called RANDOM the distribution method shown in the plan is ROUND-ROBIN rather than RANDOM.

In this post I want to show an example for each of those variants and address the following questions in addition:

1. Where and how has the PQ_FILTER hint to be specified? How has the hint to be used in case of multiple FILTERs with subqueries in the execution plan?

2. In case of a HASH distribution and multiple subqueries what keys get used as input for the HASH function used for distribution?

Let's start with an example for each of the hints used to demonstrate the different plan shapes. For that purpose I'll re-use the same setup and query from the initial part of this instalment, with the index set to invisible:
create table t_1
compress
as
select  /*+ use_nl(a b) */
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_1', method_opt=>'for all columns size 1')

alter table t_1 parallel 4;

create index t_1_idx on t_1 (id) invisible;

explain plan for
select /*+ 
           pq_filter(<DIST_METHOD>)
        */ count(*) from
        t_1 t
        where exists (select /*+ no_unnest */ null from t_1 where t.id = t_1.id);

-- 12.1.0.2 plan shape with index invisible and PQ_FILTER(SERIAL)
------------------------------------------------------------------------
| Id  | Operation              | Name     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |        |      |            |
|   1 |  SORT AGGREGATE        |          |        |      |            |
|*  2 |   FILTER               |          |        |      |            |
|   3 |    PX COORDINATOR      |          |        |      |            |
|   4 |     PX SEND QC (RANDOM)| :TQ20000 |  Q2,00 | P->S | QC (RAND)  |
|   5 |      PX BLOCK ITERATOR |          |  Q2,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T_1      |  Q2,00 | PCWP |            |
|   7 |    PX COORDINATOR      |          |        |      |            |
|   8 |     PX SEND QC (RANDOM)| :TQ10000 |  Q1,00 | P->S | QC (RAND)  |
|   9 |      PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|* 10 |       TABLE ACCESS FULL| T_1      |  Q1,00 | PCWP |            |
------------------------------------------------------------------------

-- 12.1.0.2 plan shape with index invisible and PQ_FILTER(NONE)
-------------------------------------------------------------------------
| Id  | Operation               | Name     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |        |      |            |
|   1 |  SORT AGGREGATE         |          |        |      |            |
|   2 |   PX COORDINATOR        |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)  | :TQ10000 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE      |          |  Q1,00 | PCWP |            |
|*  5 |      FILTER             |          |  Q1,00 | PCWC |            |
|   6 |       PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| T_1      |  Q1,00 | PCWP |            |
|*  8 |       TABLE ACCESS FULL | T_1      |        |      |            |
-------------------------------------------------------------------------

-- 12.1.0.2 plan shape with index invisible and PQ_FILTER(RANDOM)
---------------------------------------------------------------------------
| Id  | Operation                 | Name     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |        |      |            |
|   1 |  SORT AGGREGATE           |          |        |      |            |
|   2 |   PX COORDINATOR          |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10001 |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |          |  Q1,01 | PCWP |            |
|*  5 |      FILTER               |          |  Q1,01 | PCWP |            |
|   6 |       PX RECEIVE          |          |  Q1,01 | PCWP |            |
|   7 |        PX SEND ROUND-ROBIN| :TQ10000 |  Q1,00 | P->P | RND-ROBIN  |
|   8 |         PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|   9 |          TABLE ACCESS FULL| T_1      |  Q1,00 | PCWP |            |
|* 10 |       TABLE ACCESS FULL   | T_1      |        |      |            |
---------------------------------------------------------------------------

-- 12.1.0.2 plan shape with index invisible and PQ_FILTER(HASH)
---------------------------------------------------------------------------
| Id  | Operation                 | Name     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |        |      |            |
|   1 |  SORT AGGREGATE           |          |        |      |            |
|   2 |   PX COORDINATOR          |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10001 |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |          |  Q1,01 | PCWP |            |
|*  5 |      FILTER               |          |  Q1,01 | PCWP |            |
|   6 |       PX RECEIVE          |          |  Q1,01 | PCWP |            |
|   7 |        PX SEND HASH       | :TQ10000 |  Q1,00 | P->P | HASH       |
|   8 |         PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|   9 |          TABLE ACCESS FULL| T_1      |  Q1,00 | PCWP |            |
|* 10 |       TABLE ACCESS FULL   | T_1      |        |      |            |
---------------------------------------------------------------------------
So we can see with PQ_FILTER(SERIAL) we get the former, pre-12c plan shape, which in this case results in a parallel full table scan in the filter subquery and a decomposition into multiple DFO trees, both things that should be avoided in general.

I've omitted the cost estimates but there seems to be an odd inconsistency: The new parallel FILTER evaluation results in a higher cost estimate than the old serial one, simply due to the fact that the full table scan cost is lowered by the chunked parallel table scan (PX BLOCK ITERATOR) in the old plan shape, whereas the complete full table scans performed in the Parallel Execution Servers is treated as serial full table scan cost-wise. Nevertheless by default the plan with the higher costs is selected by the optimizer in 12c.

Let's address the first question how to specify the hint: As we can see from the simple example here when using the format of the hint mentioned in the official documentation it should be put into the main (driving) query, not into the filter subqueries, which makes kind of sense, since there could be multiple subqueries but only one FILTER operation in the main query driving them.

Let's have a look at a slightly more complicated query:
explain plan for
select /*+
           qb_name(main)
           -- When merging the two query blocks there will be a single FILTER in the main query block
           --pq_filter(@main hash)
           -- Global form of the PQ_FILTER hint
           no_merge(@qb1)
           no_merge(@qb2)
           pq_filter(@qb1 random)
           pq_filter(@qb2 hash)
       */
       *
       from
(
select /*+
           qb_name(qb1)
           -- Alternatively use local form of the hints
           --pq_filter(random)
           --no_merge
        */ *
        from
        t_1 t
        where exists (select /*+ no_unnest */ null from t_1 where t.id = t_1.id)
) a,
(
select /*+
           qb_name(qb2)
           -- Alternatively use local form of the hints
           --pq_filter(hash)
           --no_merge
        */ *
        from t_1 t
        where exists (select /*+ no_unnest */ null from t_1 where t.id = t_1.id)
) b
where b.id = a.id
;

-- Plan shape with the hints as specified
-------------------------------------------------------------------------------
| Id  | Operation                     | Name     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |        |      |            |
|   1 |  PX COORDINATOR               |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)         | :TQ10004 |  Q1,04 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED         |          |  Q1,04 | PCWP |            |
|   4 |     PX RECEIVE                |          |  Q1,04 | PCWP |            |
|   5 |      PX SEND HYBRID HASH      | :TQ10002 |  Q1,02 | P->P | HYBRID HASH|
|   6 |       STATISTICS COLLECTOR    |          |  Q1,02 | PCWC |            |
|   7 |        BUFFER SORT            |          |  Q1,02 | PCWP |            |
|   8 |         VIEW                  |          |  Q1,02 | PCWP |            |
|*  9 |          FILTER               |          |  Q1,02 | PCWP |            |
|  10 |           PX RECEIVE          |          |  Q1,02 | PCWP |            |
|  11 |            PX SEND ROUND-ROBIN| :TQ10000 |  Q1,00 | P->P | RND-ROBIN  |
|  12 |             PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|  13 |              TABLE ACCESS FULL| T_1      |  Q1,00 | PCWP |            |
|* 14 |           TABLE ACCESS FULL   | T_1      |        |      |            |
|  15 |     PX RECEIVE                |          |  Q1,04 | PCWP |            |
|  16 |      PX SEND HYBRID HASH      | :TQ10003 |  Q1,03 | P->P | HYBRID HASH|
|  17 |       BUFFER SORT             |          |  Q1,03 | PCWP |            |
|  18 |        VIEW                   |          |  Q1,03 | PCWP |            |
|* 19 |         FILTER                |          |  Q1,03 | PCWP |            |
|  20 |          PX RECEIVE           |          |  Q1,03 | PCWP |            |
|  21 |           PX SEND HASH        | :TQ10001 |  Q1,01 | P->P | HASH       |
|  22 |            PX BLOCK ITERATOR  |          |  Q1,01 | PCWC |            |
|  23 |             TABLE ACCESS FULL | T_1      |  Q1,01 | PCWP |            |
|* 24 |          TABLE ACCESS FULL    | T_1      |        |      |            |
-------------------------------------------------------------------------------
So in this example here we can see that multiple FILTER operations with subqueries are possible - in principle one per query block. Here I've used the global form of the hint, which includes a query block name to assign the hint to the corresponding FILTER operation. The same plan shape could be achieved by using the local hints within the query blocks. If no NO_MERGE hint gets used, both query blocks will be merged and there will be only a single FILTER operation - which then can be influenced by a corresponding PQ_FILTER hint in the main/outer query block.

Regarding the second question, what keys get used as input for the HASH distribution, we can check the corresponding "Projection information" section of the DBMS_XPLAN output, and we can see there that it is pretty straightforward and as expected: All columns / expressions used in the filter subqueries for correlation will be used as input. Using the variation of above query with merged query blocks and HASH distribution of the single FILTER with two subqueries results in the following:
explain plan for
select /*+
           qb_name(main)
           pq_filter(@main hash)
           --no_merge(@qb1)
           --no_merge(@qb2)
           --pq_filter(@qb1 random)
           --pq_filter(@qb2 hash)
       */
       *
       from
(
select /*+
           qb_name(qb1)
           --pq_filter(random)
           --no_merge
        */ *
        from
        t_1 t
        where exists (select /*+ no_unnest */ null from t_1 where t.id = t_1.id)
) a,
(
select /*+
           qb_name(qb2)
           --pq_filter(hash)
           --no_merge
        */ *
        from t_1 t
        where exists (select /*+ no_unnest */ null from t_1 where t.id = t_1.id)
) b
where b.id = a.id
;

--------------------------------------------------------------------------------
| Id  | Operation                      | Name     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |        |      |            |
|   1 |  PX COORDINATOR                |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)          | :TQ10003 |  Q1,03 | P->S | QC (RAND)  |
|   3 |    BUFFER SORT                 |          |  Q1,03 | PCWP |            |
|*  4 |     FILTER                     |          |  Q1,03 | PCWP |            |
|   5 |      PX RECEIVE                |          |  Q1,03 | PCWP |            |
|   6 |       PX SEND HASH             | :TQ10002 |  Q1,02 | P->P | HASH       |
|*  7 |        HASH JOIN BUFFERED      |          |  Q1,02 | PCWP |            |
|   8 |         PX RECEIVE             |          |  Q1,02 | PCWP |            |
|   9 |          PX SEND HYBRID HASH   | :TQ10000 |  Q1,00 | P->P | HYBRID HASH|
|  10 |           STATISTICS COLLECTOR |          |  Q1,00 | PCWC |            |
|  11 |            PX BLOCK ITERATOR   |          |  Q1,00 | PCWC |            |
|  12 |             TABLE ACCESS FULL  | T_1      |  Q1,00 | PCWP |            |
|  13 |         PX RECEIVE             |          |  Q1,02 | PCWP |            |
|  14 |          PX SEND HYBRID HASH   | :TQ10001 |  Q1,01 | P->P | HYBRID HASH|
|  15 |           PX BLOCK ITERATOR    |          |  Q1,01 | PCWC |            |
|  16 |            TABLE ACCESS FULL   | T_1      |  Q1,01 | PCWP |            |
|* 17 |      TABLE ACCESS FULL         | T_1      |        |      |            |
|* 18 |      TABLE ACCESS FULL         | T_1      |        |      |            |
--------------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
...
   6 - (#keys=2) "T"."ID"[NUMBER,22], "T"."ID"[NUMBER,22], "T"."FILLER"[VARCHAR2,100], "T"."FILLER"[VARCHAR2,100]
...
So both T.IDs from each subquery are used as keys "(#keys=2)" for the hash function.

In the last part of this instalment I'll focus on how the optimizer chooses the distribution method for the parallel filter operation.