Tuesday, March 3, 2015

12c Parallel Execution New Features: Concurrent UNION ALL - Part 1

12c introduces the concurrent UNION ALL operator that allows multiple branches below the UNION ALL to become active concurrently - which is a significant change. Before the introduction of this feature Oracle never executed multiple branches of an execution plan concurrently (in terms of Parallel Execution) - the parallelism so far was about executing the same operations of a particular branch of the execution plan by multiple processes / sessions. When we later look at the actual runtime behaviour of the new feature we'll notice that it's actually not that much different from previous behaviour than it sounds here.

By default the feature gets automatically used when there is at least one branch of the UNION ALL making use of Parallel Execution features, like a parallel full table scan for example.

If the UNION ALL operator consists of serial branches only the concurrent execution of those serial branches can be explicitly activated by using the PQ_CONCURRENT_UNION hint. To suppress the feature usage, a corresponding NO_PQ_CONCURRENT_UNION hint can be used. Both hints are officially documented.

The concurrent execution of serial branches in my opinion could be particularly useful to overcome the still existing limitation of parallelism and operations via database links: If you want to copy larger segments of data via database links you can use Parallel Execution on the source to speed up data retrieval and you can use Parallel Execution on the target to speed up the DML process, but the data will always have to go through the query coordinator process on both sides, so a copy process is always limited by the CPU and network bandwidth / latency of a single process. If you need to copy multiple objects of similar size you could simply speed up the copy process via spawning multiple jobs, but if the objects are of significantly different size you would still have to wait for the single process to cope with the largest segment.

Before 12c the only way to overcome this limitation was to employ "do-it-yourself" parallelism per (large) segment, more recently supported via DBMS_PARALLEL_EXECUTE. This allows spreading such an operation across several processes / sessions concurrently and therefore allowing to overcome the described limitation. But this approach makes the overall handling more complex (chunking, error handling etc., although DBMS_PARALLEL_EXECUTE provides a standardized interface for such tasks) and isn't a simple SQL based solution any longer.

With the concurrent UNION ALL feature in principle a SQL solution should be possible where multiple branches of a concurrent UNION ALL query (different chunks) of the same remote segment concurrently. Of course this approach still requires some manual preparation since ideally the different branches should only query disjunct parts of the segment, so for non-partitioned segments ROWID ranges should be prepared and used inside the different remote queries, similar to the DBMS_PARALLEL_EXECUTE approach. If you don't care about the duplicated work you could of course simply use something like a ORA_HASH or MOD based filter on a column expression that comes up with a reasonably equal (hash) value distribution (like a primary key).

However, due to the way the concurrent UNION ALL is implemented internally at present, this approach might not work very well, at least in 12.1.0.1 and 12.1.0.2, as I'll show in a later part.

Some other questions I would like to address regarding this feature are the following:

- How is the work distributed at runtime if you end up with a mixture of parallel and serial branches?

- If you request concurrent execution with serial branches only and no other parallel execution in the plan, how is the parallel degree to be used determined?

In this part of the blog post I want to start with the latter question first: What parallel degree gets used if a UNION ALL consists only of serial branches?

And one follow-up question could be: What happens if a more complex execution plan mixes parts with Parallel Execution outside a UNION ALL with a concurrent UNION ALL that consists only of serial branches?

Here is the table setup I'm going to use for all my tests:
-- This is the Parallel table
create table t_2
compress
as
select
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_2')

alter table t_2 parallel 8;

-- This is the serial table
create table t2
compress
as
select
        (rownum * 2) + 1 as id
      , mod(rownum, 2000) + 1 as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(10000) */ * from dual
connect by
        level <= 10000) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't2')

create database link loop connect to cbo_test identified by cbo_test using '//localhost:1521/orcl12'
The database link will become relevant in the later parts of this series where I look at the execution of remote branches.

So let's start with the question about the parallel degree used in case of serial branches only. I'll use the following simple query to determine that (repeat the query block inside the UNION ALL as many times as desired, I've used 20 serial branches):
-- Can I force a completely serial UNION ALL into concurrent execution using the hint? Yes, but isn't selected by the optimizer automatically
-- In 12.1.0.2 the PQ_CONCURRENT_UNION hint works as documented as statement level hint, whereas in 12.1.0.1 there is the bug that you need to specify a query block name (Bug 15851422 : PQ_CONCURRENT_HINT DOES NOT WORK WITHOUT QUERY BLOCK SPECIFICATION)
-- What about the degree chosen in such a case, is there an upper limit?
select /*+ PQ_CONCURRENT_UNION(@"SET$1") */ count(*) from (
select regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
.
.
.
select regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
);
I've performed all tests on 12.1.0.2.

If you do an EXPLAIN PLAN on above query without the PQ_CONCURRENT_UNION hint, then you'll see, as documented, the default behaviour, which is serial execution of one branch after the other (I've used 20 serial branches in my tests):
-------------------------------------
| Id  | Operation            | Name |
-------------------------------------
|   0 | SELECT STATEMENT     |      |
|   1 |  SORT AGGREGATE      |      |
|   2 |   VIEW               |      |
|   3 |    UNION-ALL         |      |
|*  4 |     TABLE ACCESS FULL| T2   |
|*  5 |     TABLE ACCESS FULL| T2   |
|*  6 |     TABLE ACCESS FULL| T2   |
|*  7 |     TABLE ACCESS FULL| T2   |
|*  8 |     TABLE ACCESS FULL| T2   |
|*  9 |     TABLE ACCESS FULL| T2   |
|* 10 |     TABLE ACCESS FULL| T2   |
|* 11 |     TABLE ACCESS FULL| T2   |
|* 12 |     TABLE ACCESS FULL| T2   |
|* 13 |     TABLE ACCESS FULL| T2   |
|* 14 |     TABLE ACCESS FULL| T2   |
|* 15 |     TABLE ACCESS FULL| T2   |
|* 16 |     TABLE ACCESS FULL| T2   |
|* 17 |     TABLE ACCESS FULL| T2   |
|* 18 |     TABLE ACCESS FULL| T2   |
|* 19 |     TABLE ACCESS FULL| T2   |
|* 20 |     TABLE ACCESS FULL| T2   |
|* 21 |     TABLE ACCESS FULL| T2   |
|* 22 |     TABLE ACCESS FULL| T2   |
|* 23 |     TABLE ACCESS FULL| T2   |
-------------------------------------
With the hint in place the plan changes to this:
-------------------------------------------------------------------------
| Id  | Operation                | Name     |   TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |       |      |            |
|   1 |  SORT AGGREGATE          |          |       |      |            |
|   2 |   PX COORDINATOR         |          |       |      |            |
|   3 |    PX SEND QC (RANDOM)   | :TQ10000 | Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE       |          | Q1,00 | PCWP |            |
|   5 |      VIEW                |          | Q1,00 | PCWP |            |
|   6 |       UNION-ALL          |          | Q1,00 | PCWP |            |
|   7 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|*  8 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|   9 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 10 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  11 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 12 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  13 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 14 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  15 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 16 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  17 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 18 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  19 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 20 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  21 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 22 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  23 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 24 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  25 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 26 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  27 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 28 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  29 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 30 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  31 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 32 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  33 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 34 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  35 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 36 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  37 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 38 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  39 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 40 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  41 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 42 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  43 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 44 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
|  45 |        PX SELECTOR       |          | Q1,00 | PCWP |            |
|* 46 |         TABLE ACCESS FULL| T2       | Q1,00 | PCWP |            |
-------------------------------------------------------------------------
Starting with 12c you'll usually get a plan note about the parallel degree the optimizer has determined, for both the traditional DOP and the in 11.2 introduced Auto DOP way (for Auto DOP this note showed already up in 11.2). However, in this case, no note section shows up.

So what degree do we then get at runtime? (XPLAN_ASH output snippet)
Real-Time SQL Monitoring Execution Summary                                                                      
-----------------------------------------------                                                                 
                                                                                                                
                   |                              |PX IS|PX        |PX        |          |PX        |PX        |
                   |                              |CROSS|MIN       |MAX       |PX        |SERVERS   |SERVERS   |
STATUS             |USERNAME                      |INST |DOP       |DOP       |INSTANCES |REQUESTED |ALLOCATED |
-------------------|------------------------------|-----|----------|----------|----------|----------|----------|
EXECUTING          |CBO_TEST                      |N    |        20|        20|         1|        20|        20|
                   |                              |     |          |          |          |          |          |
So that's interesting, in my case I tested this on a laptop with 2 CPUs and Resource Manager active, which means that insane degrees should be downgraded automatically (or even already limited by the optimizer), but I really got a degree of 20, which doesn't make a lot of sense in this environment. However, in 12c obviously the limits enforced by the Resource Manager have been raised. In 11.2 the highest degree the Resource Manager allowed in this particular environment was 16, in 12c the limit was 27 instead (maybe 28, because in 11.2 I sometimes got 15, sometimes 16).

So pretty straightforward the number of serial branches seem to determine the degree attempted to use at runtime. This also means you need to be careful how many branches you code into the UNION ALL if you want to make use of the concurrent feature.

What happens if I mix now a Parallel Execution with a UNION ALL that consists only of serial branches, for example like this:
select /*+ PQ_CONCURRENT_UNION(@"SET$1") */ count(*) from (
select id, regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t_2
where regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
) a,
(
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
) b
where a.id = b.id;
Again, if I don't use the PQ_CONCURRENT_UNION hint, I'll get the traditional plan shape for the UNION ALL - which gets executed serially and then gets distributed as part of the remaining parallel operations:
-------------------------------------------------------------------------------
| Id  | Operation                     | Name     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |        |      |            |
|   1 |  SORT AGGREGATE               |          |        |      |            |
|   2 |   PX COORDINATOR              |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10002 |  Q1,02 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE            |          |  Q1,02 | PCWP |            |
|*  5 |      HASH JOIN                |          |  Q1,02 | PCWP |            |
|   6 |       BUFFER SORT             |          |  Q1,02 | PCWC |            |
|   7 |        PX RECEIVE             |          |  Q1,02 | PCWP |            |
|   8 |         PX SEND HYBRID HASH   | :TQ10000 |        | S->P | HYBRID HASH|
|   9 |          STATISTICS COLLECTOR |          |        |      |            |
|  10 |           VIEW                |          |        |      |            |
|  11 |            UNION-ALL          |          |        |      |            |
|* 12 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 13 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 14 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 15 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 16 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 17 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 18 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 19 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 20 |             TABLE ACCESS FULL | T2       |        |      |            |
|* 21 |             TABLE ACCESS FULL | T2       |        |      |            |
|  22 |       PX RECEIVE              |          |  Q1,02 | PCWP |            |
|  23 |        PX SEND HYBRID HASH    | :TQ10001 |  Q1,01 | P->P | HYBRID HASH|
|  24 |         PX BLOCK ITERATOR     |          |  Q1,01 | PCWC |            |
|* 25 |          TABLE ACCESS FULL    | T_2      |  Q1,01 | PCWP |            |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - access("ID"="B"."ID")
  12 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  13 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  14 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  15 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  16 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  17 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  18 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  19 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  20 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  21 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  25 - filter( REGEXP_REPLACE ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))

Note
-----
   - Degree of Parallelism is 4 because of table property
Since I marked the table T_2 with PARALLEL 8 I would expect the overall plan to use a degree of 8, but look at the "note" section, which tells me the degree would be 4 due to a table property...

What happens at runtime:
Real-Time SQL Monitoring Execution Summary                                                                      
-----------------------------------------------                                                                 
                                                                                                                
                   |                              |PX IS|PX        |PX        |          |PX        |PX        |
                   |                              |CROSS|MIN       |MAX       |PX        |SERVERS   |SERVERS   |
STATUS             |USERNAME                      |INST |DOP       |DOP       |INSTANCES |REQUESTED |ALLOCATED |
-------------------|------------------------------|-----|----------|----------|----------|----------|----------|
EXECUTING          |CBO_TEST                      |N    |         8|         8|         1|        16|        16|
                   |                              |     |          |          |          |          |          |
But at runtime I get a degree of 8, so you have to be careful with these plan notes as they quite often seem to report incorrect degrees for the traditional DOP way.

If I now use the PQ_CONCURRENT_UNION hint, I get the following plan:
------------------------------------------------------------------------------
| Id  | Operation                    | Name     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |        |      |            |
|   1 |  SORT AGGREGATE              |          |        |      |            |
|   2 |   PX COORDINATOR             |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)       | :TQ10002 |  Q1,02 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE           |          |  Q1,02 | PCWP |            |
|*  5 |      HASH JOIN               |          |  Q1,02 | PCWP |            |
|   6 |       PX RECEIVE             |          |  Q1,02 | PCWP |            |
|   7 |        PX SEND HYBRID HASH   | :TQ10000 |  Q1,00 | P->P | HYBRID HASH|
|   8 |         STATISTICS COLLECTOR |          |  Q1,00 | PCWC |            |
|   9 |          VIEW                |          |  Q1,00 | PCWP |            |
|  10 |           UNION-ALL          |          |  Q1,00 | PCWP |            |
|  11 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 12 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  13 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 14 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  15 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 16 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  17 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 18 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  19 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 20 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  21 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 22 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  23 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 24 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  25 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 26 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  27 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 28 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  29 |            PX SELECTOR       |          |  Q1,00 | PCWP |            |
|* 30 |             TABLE ACCESS FULL| T2       |  Q1,00 | PCWP |            |
|  31 |       PX RECEIVE             |          |  Q1,02 | PCWP |            |
|  32 |        PX SEND HYBRID HASH   | :TQ10001 |  Q1,01 | P->P | HYBRID HASH|
|  33 |         PX BLOCK ITERATOR    |          |  Q1,01 | PCWC |            |
|* 34 |          TABLE ACCESS FULL   | T_2      |  Q1,01 | PCWP |            |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - access("ID"="B"."ID")
  12 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  14 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  16 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  18 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  20 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  22 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  24 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  26 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  28 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  30 - filter( REGEXP_REPLACE ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))
  34 - filter( REGEXP_REPLACE ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'c')>= REGEXP_REPLACE 
              ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$','        \1',1,1,'i'))

Note
-----
   - Degree of Parallelism is 4 because of table property
OK, this is the concurrent UNION ALL shape now, but the "note" section still tells me the parallel degree is 4.

What happens at runtime:
Real-Time SQL Monitoring Execution Summary                                                                      
-----------------------------------------------                                                                 
                                                                                                                
                   |                              |PX IS|PX        |PX        |          |PX        |PX        |
                   |                              |CROSS|MIN       |MAX       |PX        |SERVERS   |SERVERS   |
STATUS             |USERNAME                      |INST |DOP       |DOP       |INSTANCES |REQUESTED |ALLOCATED |
-------------------|------------------------------|-----|----------|----------|----------|----------|----------|
EXECUTING          |CBO_TEST                      |N    |        10|        10|         1|        20|        20|
                   |                              |     |          |          |          |          |          |
Oops, at runtime I now get the degree determined by the PQ_CONCURRENT_UNION part, so the overall degree is then in my case 10, and not 4 (or 8 for that matter), since my UNION ALL had 10 serial branches in this example.

So it becomes obvious that using the new feature with serial branches you have to worry a bit about the parallel degree used at runtime.

In the next instalment we'll look at the actual runtime behaviour of the feature when it gets automatically triggered by a mixture of parallel and serial branches.

4 comments:

Yasin Baskan said...

Hi Randolf,

Regarding your comment about the resource manager limits; DBRM does not limit the DOP if you do not set it explicitly with the parallel_degree_limit_p1 directive. So assuming you are using manual DOP, and have an active resource manager plan with no DOP limit set, you are capped at parallel_max_servers. So, it is normal that you get DOP=20 for your first SQL. If you set a DOP limit in DBRM you will be capped at that limit.

Since the UNION ALL statement is an inline view in your second SQL it is treated as a table and you get DOP=10 (because there are 10 branches). Since the other table's (t_2) DOP is 8, you get the higher DOP which is 10. If you have less than 8 branches in this SQL you get DOP=8. The conventional rule of taking the higher DOP applies.

If you make t_2 a part of the UNION ALL statement you will get DOP=8 regardless of the number of branches, like below.

select /*+ PQ_CONCURRENT_UNION */ count(*) from (
select id, regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t_2
where regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
union all
select id, regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as result from t2
where regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
);

With Auto DOP the number of branches become irrelevant as the DOP is calculated based on the local tables.

Randolf said...

Hi Yasin,

thanks for your comment and explanations.

Regarding the max. DOP and the Resource Manager: As you can read from my post it's not entirely correct that the max. DOP is not constrained without a DOP directive.

I've used the DEFAULT_PLAN and still get the DOP capped if it exceeds certain calculated thresholds.

I just did some further tests on my 2 CPU laptop with 11.2.0.1 and 12.1.0.2.

In 11.2.0.1 with my configuration this absolute cap seems to be at 16 (PARALLEL_MAX_SERVERS = 40, and without Resource Manager active I can get 40 as DOP, so that's not the limiting factor here), with 12.1.0.2 it seems to be 32 (PARALLEL_MAX_SERVERS = 80).

However, the rules seem to be more complex, if I run a statement that requires two PX sets 11.2.0.1 even gives me only a DOP of 4 with 8 PX servers, although I requested DOP = 40.

In 12.1.0.2 the max. DOP with 2 PX Sets is 16 which seems to be consistent, so the max. number of PX servers per execution / DFO tree seems to be limited to 32 on 12.1.0.2 with my configuration and DEFAULT_PLAN active.

So the Resource Manager even without any DOP directive starts limiting excessive DOPs (I got that confirmed some time ago by someone from Oracle, at present can't recall who it was though).

Regarding the other topics you've mentioned: I'll cover them in the upcoming parts of this series.

Randolf

Yasin Baskan said...

Randolf, it is not resource manager limiting the DOP in this case. It is actually parallel adaptive multiuser functionality.

With parallel_adaptive_multi_user=TRUE and a bug in resource manager (which causes the load to be calculated wrong in this case), the adaptive multiuser feature kicks in and caps the DOP based on the load. That bug is also the reason why you sometimes get DOP=15 and sometimes get DOP=16. As a single user and the only one running PQ on this system you need to get up to DOP=parallel_max_servers (for a SQL using a single slave set).

Until the bug is fixed the workaround is to disable resource manager or set parallel_adaptive_multi_user=FALSE. Auto DOP disables this adaptive functionality so this applies only for manual DOP (parallel_degree_policy=MANUAL).

The current recommendation to customers is to set parallel_adaptive_multi_user to FALSE anyway.

Randolf said...

Hi Yasin,

thanks for the update and the clarification. I still wonder who told me this was a feature and not a bug... Anyway setting PARALLEL_ADAPTIVE_MULTI_USER to FALSE is a good idea and should actually be the default sometime in the future.

Randolf