Tuesday, February 25, 2020

New Parallel Distribution Method For Direct Path Loads

Starting with version 12c Oracle obviously has introduced another parallel distribution method for direct path loads (applicable to INSERT APPEND and CTAS operations) when dealing with partitioned objects.

As you might already know, starting with version 11.2 Oracle supported a new variation of the PQ_DISTRIBUTE hint allowing more control how data gets distributed for the actual DML load step. In addition to the already documented methods (NONE, RANDOM / RANDOM_LOCAL, PARTITION) there is a new one EQUIPART which obviously only applies to scenarios where both, source and target table are equi partitioned.

In principle it looks like a "full-partition wise load", where the PX partition granule gets used as chunking method and each PX slave reads from the partition to process from source and writes into the corresponding partition of target. Therefore it doesn't require a redistribution of data and uses only a single PX slave set. Depending on the skew (partitions of different data volume) this might not be the best choice, but for massive data loads with evenly sized partitions it might give some advantage over the other distribution methods - the NONE distribution method being the closest, because it doesn't require additional redistribution either - but here all PX slaves read and write from any partition, so potentially there could be more contention.

Of course this new distributed method works only for the special case of equi partitioned source and target tables - and according to my tests only for the simple case of loading from the source table with no further operations like joins etc. involved.

A simple demonstration:

drop table t_part1 purge;
drop table t_part2 purge;

create table t_part1 (id, filler) partition by range (id) (
  partition n10000 values less than (10001),
  partition n20000 values less than (20001),
  partition n30000 values less than (30001),
  partition n40000 values less than (40001),
  partition n50000 values less than (50001),
  partition n60000 values less than (60001),
  partition n70000 values less than (70001),
  partition n80000 values less than (80001),
  partition n90000 values less than (90001),
  partition n100000 values less than (100001)
 )
as
select rownum as id, rpad('x', 200) as filler
from dual
connect by level <= 100000
;

create table t_part2 (id, filler) partition by range (id) (
  partition n10000 values less than (10001),
  partition n20000 values less than (20001),
  partition n30000 values less than (30001),
  partition n40000 values less than (40001),
  partition n50000 values less than (50001),
  partition n60000 values less than (60001),
  partition n70000 values less than (70001),
  partition n80000 values less than (80001),
  partition n90000 values less than (90001),
  partition n100000 values less than (100001)
 )
as
select rownum as id, rpad('x', 200) as filler
from dual
where 1 = 2
;

alter session enable parallel dml;

-- alter session set tracefile_identifier = equipart;

-- alter session set events 'trace [RDBMS.SQL_Optimizer.*] disk=highest';

--explain plan for
insert /*+ append parallel(2) pq_distribute(t_part2 equipart) */ into t_part2 select * from t_part1;

From 12.1.0.2 on the execution plan for the INSERT APPEND operation looks like this:

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                   |          |   100K|    19M|   446   (1)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR                    |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10000 |   100K|    19M|   446   (1)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (EQUI-PARTITION) | T_PART2  |       |       |            |          |       |       |  Q1,00 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |          |   100K|    19M|   446   (1)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|   5 |      PX PARTITION RANGE ALL        |          |   100K|    19M|   446   (1)| 00:00:01 |     1 |    10 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL            | T_PART1  |   100K|    19M|   446   (1)| 00:00:01 |     1 |    10 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
 
   1 - SEL$1
   6 - SEL$1 / T_PART1@SEL$1
 
Outline Data
-------------
 
  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"SEL$1" "T_PART1"@"SEL$1")
      FULL(@"INS$1" "T_PART2"@"INS$1")
      PQ_DISTRIBUTE(@"INS$1" "T_PART2"@"INS$1" EQUIPART)
      OUTLINE_LEAF(@"INS$1")
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('18.1.0')
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Oracle doesn't always automatically choose this distribution method. If you want to enforce it (and it is legal) you can use the PQ_DISTRIBUTE(EQUIPART) hint as outlined.