Sunday, August 16, 2015

Parallel Projection

A recent case at a client reminded me of something that isn't really new but not so well known - Oracle by default performs evaluation at the latest possible point in the execution plan.

So if you happen to have expressions in the projection of a simple SQL statement that runs parallel it might be counter-intuitive that by default Oracle won't evaluate the projection in the Parallel Slaves but in the Query Coordinator - even if it was technically possible - because the latest possible point is the SELECT operation with the ID = 0 of the plan, which is always performed by the Query Coordinator.

Of course, if you make use of expressions that can't be evaluated in parallel or aren't implemented for parallel evaluation, then there is no other choice than doing this in the Query Coordinator.

The specific case in question was a generic export functionality that allowed exporting report results to some CSV or Excel like format, and some of these reports had a lot of rows and complex - in that case CPU intensive - expressions in their projection clause.

When looking at the run time profile of such an export query it became obvious that although it was a (very simple) parallel plan, all of the time was spent in the Query Coordinator, effectively turning this at runtime into a serial execution.

This effect can be reproduced very easily:
create table t_1
compress
as
select  /*+ use_nl(a b) */
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_1', method_opt=>'for all columns size 1')

alter table t_1 parallel cache;

-- Run some CPU intensive expressions in the projection 
-- of a simple parallel Full Table Scan
set echo on timing on time on

set autotrace traceonly statistics

set arraysize 500

select
        regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as some_cpu_intensive_exp1
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') as some_cpu_intensive_exp2
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') as some_cpu_intensive_exp3
from t_1
;

-- The plan is clearly parallel
--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |  2000K|   192M|   221   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| T_1      |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

-- But the runtime profile looks more serial
-- although the Parallel Slaves get used to run the Full Table Scan
-- All time spent in the operation ID = 0
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Operation            | Name     | Execs | A-Rows| ReadB | ReadReq | Start | Dur(T)| Dur(A)| Time Active Graph    | Parallel Distribution ASH                                                     | Parallel Execution Skew ASH      | Activity Graph ASH          | Top 5 Activity ASH  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     | SELECT STATEMENT     |          |    5  | 2000K |       |         |     3 |   136 |   120 | #################### |   1:sqlplus.exe(120)[2000K],P008(0)[0],P009(0)[0],P00A(0)[0],P00B(0)[0]       | ################################ | @@@@@@@@@@@@@@@@@@@ ( 98%)  | ON CPU(120)         |
|   1 |   0 |  PX COORDINATOR      |          |    5  | 2000K |       |         |   119 |     1 |     1 |                 #    |   1:sqlplus.exe(1)[2000K],P008(0)[0],P009(0)[0],P00A(0)[0],P00B(0)[0]         |                                  |                     ( .8%)  | ON CPU(1)           |
|   2 |   1 |   PX SEND QC (RANDOM)| :TQ10000 |    4  | 2000K |       |         |    66 |    11 |     2 |          ##          |   2:P00B(1)[508K],P00A(1)[490K],P008(0)[505K],P009(0)[497K],sqlplus.exe(0)[0] |                                  |                     (1.6%)  | PX qref latch(2)    |
|   3 |   2 |    PX BLOCK ITERATOR |          |    4  | 2000K |       |         |       |       |       |                      |   0:P00B(0)[508K],P008(0)[505K],P009(0)[497K],P00A(0)[490K],sqlplus.exe(0)[0] |                                  |                             |                     |
|*  4 |   3 |     TABLE ACCESS FULL| T_1      |   52  | 2000K |   23M |     74  |       |       |       |                      |   0:P00B(0)[508K],P008(0)[505K],P009(0)[497K],P00A(0)[490K],sqlplus.exe(0)[0] |                                  |                             |                     |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Fortunately there is a simple and straightforward way to make use of the Parallel Slaves for evaluation of projection expressions that can be evaluated in parallel - simply add a suitable NO_MERGE hint for the query block that you want the projection to be evaluated for in the Parallel Slaves.

If you don't want to have side effects on the overall plan shape by not merging views you could always wrap the original query in an outer SELECT and not merging the now inner query block. There seems to be a rule that the projection of a view always get evaluated at the VIEW operator, and if we check the execution plan we can see that the VIEW operator is marked parallel:
set echo on timing on time on

set autotrace traceonly statistics

set arraysize 500

select /*+ no_merge(x) */ * from (
select
        regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as some_cpu_intensive_exp1
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') as some_cpu_intensive_exp2
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') as some_cpu_intensive_exp3
from t_1
) x
;

-- View operator is marked parallel
-- This is were the projection clause of the VIEW will be evaluated
---------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |  2000K|    11G|   221   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM) | :TQ10000 |  2000K|    11G|   221   (1)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    VIEW               |          |  2000K|    11G|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|   4 |     PX BLOCK ITERATOR |          |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS FULL| T_1      |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------

-- Runtime profile now shows effective usage of Parallel Slaves 
-- for doing the CPU intensive work
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Operation             | Name     | Execs | A-Rows| Start | Dur(T)| Dur(A)| Time Active Graph    | Parallel Distribution ASH                                                         | Parallel Execution Skew ASH| Activity Graph ASH          | Top 5 Activity ASH               |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     | SELECT STATEMENT      |          |    5  | 2000K |       |       |       |                      |   0:sqlplus.exe(0)[2000K],P000(0)[0],P001(0)[0],P002(0)[0],P003(0)[0]             |                            |                             |                                  |
|   1 |   0 |  PX COORDINATOR       |          |    5  | 2000K |    17 |    63 |    10 |    # ##   #    ####  |   1:sqlplus.exe(10)[2000K],P000(0)[0],P001(0)[0],P002(0)[0],P003(0)[0]            | ####                       | *                   (5.6%)  | resmgr:cpu quantum(10)           |
|   2 |   1 |   PX SEND QC (RANDOM) | :TQ10000 |    4  | 2000K |     5 |    61 |    10 | ## #  ## ## ## #     |   3:P002(5)[544K],P001(4)[487K],P000(1)[535K],P003(0)[434K],sqlplus.exe(0)[0]     | #                          |                     (5.6%)  | ON CPU(7),resmgr:cpu quantum(3)  |
|   3 |   2 |    VIEW               |          |    4  | 2000K |     2 |    82 |    69 | #################### |   4:P003(42)[434K],P001(35)[487K],P000(26)[535K],P002(22)[544K],sqlplus.exe(0)[0] | ############               | @@@@@@@@@@@@@@@@@@@ ( 70%)  | ON CPU(125)                      |
|   4 |   3 |     PX BLOCK ITERATOR |          |    4  | 2000K |       |       |       |                      |   0:P002(0)[544K],P000(0)[535K],P001(0)[487K],P003(0)[434K],sqlplus.exe(0)[0]     |                            |                             |                                  |
|*  5 |   4 |      TABLE ACCESS FULL| T_1      |   52  | 2000K |     3 |    78 |    29 | ###### ####### # ### |   4:P000(11)[535K],P002(8)[544K],P001(8)[487K],P003(7)[434K],sqlplus.exe(0)[0]    | ###                        | *****               ( 19%)  | resmgr:cpu quantum(30),ON CPU(4) |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
At runtime the duration of the query now gets reduced significantly and we can see the Parallel Slaves getting used when the VIEW operator gets evaluated. Although the overall CPU time used is similar to the previous example, the duration of the query execution is less since this CPU time is now spent in parallel in the slaves instead in the Query Coordinator.

Summary


By default Oracle performs evaluation at the latest possible point of the execution plan. Sometimes you can improve runtime by actively influencing when the projection will be evaluated by preventing view merging and introducing a VIEW operator that will be used to evaluate the projection clause.

The optimizer so far doesn't seem to incorporate such possibilities in its evaluations of possible plan shapes, so this is something you need to do manually up to and including Oracle 12c (version 12.1.0.2 as of time of writing this).