Sunday, June 14, 2015

12c Parallel Execution New Features: PX SELECTOR

Continuing my series on new 12c Parallel Execution features: I've already mentioned the new PX SELECTOR operator as part of the new Concurrent UNION ALL feature where it plays a key role. However, in general starting from 12c this new operator usually will get used when it comes to executing a serial part of the execution plan, like a full scan of an object not marked parallel, or an index based operation that can't be parallelized.

In pre-12c such serial parts get executed by the Query Coordinator itself, and the new PX SELECTOR changes that so that one of the PX slaves of a PX slave set is selected to execute that serial part.

There is not much left to say about that functionality, except that it doesn't get used always - there are still plan shapes possible in 12c, depending on the SQL constructs used and combined, that show the pre-12c plan shape where the Query Coordinator executes the serial part.

Let's have a look at a simple example to see in more detail what difference the new operator makes to the overall plan shape and runtime behaviour:
create table t1 as select * from dba_objects;

exec dbms_stats.gather_table_stats(null, 't1')

alter table t1 parallel;

create table t2 as select * from dba_objects;

exec dbms_stats.gather_table_stats(null, 't2')

create index t2_idx on t2 (object_name);

select /*+ optimizer_features_enable('11.2.0.4') */ 
        * 
from 
        t1
      , t2 
where 
        t1.object_id = t2.object_id 
and     t2.object_name like 'BLUB%'
;

-- 11.2.0.4 plan shape
-----------------------------------------------------------------------------------
| Id  | Operation                         | Name     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |        |      |            |
|   1 |  PX COORDINATOR                   |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)             | :TQ10001 |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN                      |          |  Q1,01 | PCWP |            |
|   4 |     BUFFER SORT                   |          |  Q1,01 | PCWC |            |
|   5 |      PX RECEIVE                   |          |  Q1,01 | PCWP |            |
|   6 |       PX SEND BROADCAST           | :TQ10000 |        | S->P | BROADCAST  |
|   7 |        TABLE ACCESS BY INDEX ROWID| T2       |        |      |            |
|*  8 |         INDEX RANGE SCAN          | T2_IDX   |        |      |            |
|   9 |     PX BLOCK ITERATOR             |          |  Q1,01 | PCWC |            |
|* 10 |      TABLE ACCESS FULL            | T1       |  Q1,01 | PCWP |            |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   8 - access("T2"."OBJECT_NAME" LIKE 'BLUB%')
       filter("T2"."OBJECT_NAME" LIKE 'BLUB%')
  10 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"T1"."OBJECT_ID"))

-- 12.1.0.2 plan shape
--------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |          |        |      |            |
|   1 |  PX COORDINATOR                            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)                      | :TQ10001 |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN                               |          |  Q1,01 | PCWP |            |
|   4 |     JOIN FILTER CREATE                     | :BF0000  |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE                            |          |  Q1,01 | PCWP |            |
|   6 |       PX SEND BROADCAST                    | :TQ10000 |  Q1,00 | S->P | BROADCAST  |
|   7 |        PX SELECTOR                         |          |  Q1,00 | SCWC |            |
|   8 |         TABLE ACCESS BY INDEX ROWID BATCHED| T2       |  Q1,00 | SCWC |            |
|*  9 |          INDEX RANGE SCAN                  | T2_IDX   |  Q1,00 | SCWP |            |
|  10 |     JOIN FILTER USE                        | :BF0000  |  Q1,01 | PCWP |            |
|  11 |      PX BLOCK ITERATOR                     |          |  Q1,01 | PCWC |            |
|* 12 |       TABLE ACCESS FULL                    | T1       |  Q1,01 | PCWP |            |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   9 - access("T2"."OBJECT_NAME" LIKE 'BLUB%')
       filter("T2"."OBJECT_NAME" LIKE 'BLUB%')
  12 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"T1"."OBJECT_ID"))
The pre-12c plan shape here shows two significant things that I want to emphasize:

First this plan shape only requires a single PX slave set since the Query Coordinator takes over the part that needs to be re-distributed, so although we have a plan shape that requires re-distribution there's only a single PX slave set involved. In case there is at least one operation that gets executed in parallel and requires re-distribution there always will be two PX slave sets.

Second the plan shape demonstrates that parts of a Parallel Execution plan that get executed serially by the Query Coordinator require an additional BUFFER SORT operation. The HASH JOIN operation itself is blocking while it is consuming the left row source for building the hash table, so there is no true requirement to add another BUFFER SORT after the PX RECEIVE operation, but it looks like a pretty strict rule that any serial activity that involves the Query Coordinator adda a BUFFER SORT operation after re-distribution - I assume the reasoning for this is that the Query Coordinator isn't available for "coordinating" the PX slaves as along as it is actively involved in executing serial operations, hence the need to block any other parallel activity.

This normally shouldn't be too relevant to performance since you should only execute operations serially that are tiny and not worth to run parallel, so buffering them shouldn't add much overhead, but it's just another reason why you see additional BUFFER SORT operations in parallel plans that are not there in serial-only plans.

The 12c plan shape shows the new PX SELECTOR operator that executes now the serial part of the execution plan instead of the Query Coordinator. This also adds new decorators in the IN-OUT column called "SCWC" and "SCWP" respectivley, which you won't find in pre-12c plans - they are probably meant to read "Serial Combined With Child/Parent", similar to "PCWC/PCWP".

The good thing about the new PX SELECTOR is that the need for an additional BUFFER SORT operator is now gone.

However, one side-effect of the new operator for this particular plan shape here is that now a second PX slave set is allocated, although only one PX slave actually will get used at runtime. Note that for other plan shapes that need two PX slave sets anyway this doesn't matter.

Another good thing about the new PX SELECTOR operator is that it avoids an odd bug that sometimes happens with Serial->Parallel redistributions when the Query Coordinator is involved. This bug causes some delay to the overall execution that usually isn't too relevant since it only adds approx 1-2 seconds delay (but it can occur several times per execution so these seconds can add up) and therefore is rarely noticed when a Parallel Execution might take several seconds / minutes typically. I might cover this bug in a separate blog post.

Unrelated to the PX SELECTOR operator, the 12c plan shape also demonstrates that in 12c the way Bloom filters are shown in the plan has been improved. The 11.2.0.4 version includes the same Bloom filter as you can see from the "Predicate Information" section of the plan but doesn't make it that obvious from the plan shape that it is there (and sometimes in pre-12c it even doesn't show up in the "Predicate Information" section but is still used)

4 comments:

  1. Hi Randolph,

    About the last comment of the "missing" BF, I think that has been fixed by 12569316 (internal and not backported)
    I stumbled into the "where is the BF?!?" a few days so I went for tracking it down.
    I guess you already knew but I thought of sharing just in case :-)

    Cheers,
    Mauro

    ReplyDelete
  2. Hi Mauro,

    thanks for your comment - I read about that in your post about missing Bloom Filters :-)

    Randolf

    ReplyDelete
  3. Hi Randolf

    I have a question if you could help. We recently upgraded from 11.2 to 18c and our reporting queries are running longer. I found that there is uneven distribution between the parallel slaves. One slave does all the work.
    I believe that has something to do with the PX new features.

    I don't want to set OFE to 11.2.0.4 because covers 100's of parameters and fix controls.

    What are the list of parameters on 18c that I can turn off - to get the 11.2 like behavior ( only the px related parameters )

    --
    Best Regards
    Vishal

    ReplyDelete
  4. Hi Vishal,

    apologies for the delayed reply - I didn't see you comment and only found it today.

    I wouldn't recommend modifying global settings in that case. The first step is determining why you see the difference in behaviour. It could just be a side effect of different execution plans due to changes in estimates.

    If you say you see uneven distribution between parallel slaves, how did you determine that? You could use my XPLAN_ASH script to visualize the uneven distribution which would allow gaining more understanding.

    Depending on the outcome you can then find out what is the most sensible measure.

    Best regards,
    Randolf

    ReplyDelete