create table t2
compress
as
select
        (rownum * 2) + 1 as id
      , mod(rownum, 2000) + 1 as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(100000) */ * from dual connect by level <= 100000) a
      , (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't2')

create table t4
compress
as
select
        (rownum * 2) + 1 as id
      , mod(rownum, 2000) + 1 as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(100000) */ * from dual connect by level <= 100000) a
      , (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't4')

create table t6
compress
as
select
        (rownum * 2) + 1 as id
      , mod(rownum, 2000) + 1 as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(100000) */ * from dual connect by level <= 100000) a
      , (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't6')

explain plan for
select /*+ no_merge(x) */
        *
from
        (
          select /*+ parallel(t6 4)
                     --optimizer_features_enable('11.2.0.4')
                 */
                  *
          from
                  (
                    select /*+ parallel(t2 4) */
                            --lag(id) over (order by id) as v1_rn
                            rownum as v1_rn
                          , t2.id as v1_id
                          , t2.filler as v1_filler
                    from
                            t2
                  ) v1
                , (
                    select /*+ parallel(t4 2) */
                            --lag(id) over (order by id) as v2_rn
                            rownum as v2_rn
                          , t4.id as v2_id
                          , t4.filler as v2_filler
                    from
                            t4
                  ) v2
                , t6
          where
                  v1_id = v2_id
          and     v1_id = t6.id
        ) x
where
        rownum > 1
;

-- 11.2.0.4 plan shape

------------------------------------------------------------------------------------
| Id  | Operation                          | Name     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |          |        |      |            |
|   1 |  COUNT                             |          |        |      |            |
|*  2 |   FILTER                           |          |        |      |            |
|   3 |    PX COORDINATOR                  |          |        |      |            |
|   4 |     PX SEND QC (RANDOM)            | :TQ30002 |  Q3,02 | P->S | QC (RAND)  |
|   5 |      VIEW                          |          |  Q3,02 | PCWP |            |
|*  6 |       HASH JOIN                    |          |  Q3,02 | PCWP |            |
|   7 |        PX RECEIVE                  |          |  Q3,02 | PCWP |            |
|   8 |         PX SEND HASH               | :TQ30001 |  Q3,01 | P->P | HASH       |
|   9 |          PX BLOCK ITERATOR         |          |  Q3,01 | PCWC |            |
|  10 |           TABLE ACCESS FULL        | T6       |  Q3,01 | PCWP |            |
|  11 |        BUFFER SORT                 |          |  Q3,02 | PCWC |            |
|  12 |         PX RECEIVE                 |          |  Q3,02 | PCWP |            |
|  13 |          PX SEND HASH              | :TQ30000 |        | S->P | HASH       |
|* 14 |           HASH JOIN                |          |        |      |            |
|  15 |            VIEW                    |          |        |      |            |
|  16 |             COUNT                  |          |        |      |            |
|  17 |              PX COORDINATOR        |          |        |      |            |
|  18 |               PX SEND QC (RANDOM)  | :TQ10000 |  Q1,00 | P->S | QC (RAND)  |
|  19 |                PX BLOCK ITERATOR   |          |  Q1,00 | PCWC |            |
|  20 |                 TABLE ACCESS FULL  | T2       |  Q1,00 | PCWP |            |
|  21 |            VIEW                    |          |        |      |            |
|  22 |             COUNT                  |          |        |      |            |
|  23 |              PX COORDINATOR        |          |        |      |            |
|  24 |               PX SEND QC (RANDOM)  | :TQ20000 |  Q2,00 | P->S | QC (RAND)  |
|  25 |                PX BLOCK ITERATOR   |          |  Q2,00 | PCWC |            |
|  26 |                 TABLE ACCESS FULL  | T4       |  Q2,00 | PCWP |            |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM>1)
   6 - access("V1_ID"="T6"."ID")
  14 - access("V1_ID"="V2_ID")

-- 12.1.0.2 plan shape

------------------------------------------------------------------------------------
| Id  | Operation                          | Name     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |          |        |      |            |
|   1 |  COUNT                             |          |        |      |            |
|*  2 |   FILTER                           |          |        |      |            |
|   3 |    PX COORDINATOR                  |          |        |      |            |
|   4 |     PX SEND QC (RANDOM)            | :TQ10004 |  Q1,04 | P->S | QC (RAND)  |
|   5 |      VIEW                          |          |  Q1,04 | PCWP |            |
|*  6 |       HASH JOIN BUFFERED           |          |  Q1,04 | PCWP |            |
|   7 |        PX RECEIVE                  |          |  Q1,04 | PCWP |            |
|   8 |         PX SEND HASH               | :TQ10002 |  Q1,02 | P->P | HASH       |
|   9 |          PX BLOCK ITERATOR         |          |  Q1,02 | PCWC |            |
|  10 |           TABLE ACCESS FULL        | T6       |  Q1,02 | PCWP |            |
|  11 |        PX RECEIVE                  |          |  Q1,04 | PCWP |            |
|  12 |         PX SEND HASH               | :TQ10003 |  Q1,03 | S->P | HASH       |
|* 13 |          HASH JOIN BUFFERED        |          |  Q1,03 | SCWC |            |
|  14 |           VIEW                     |          |  Q1,03 | SCWC |            |
|  15 |            COUNT                   |          |  Q1,03 | SCWP |            |
|  16 |             PX RECEIVE             |          |  Q1,03 | SCWP |            |
|  17 |              PX SEND 1 SLAVE       | :TQ10000 |  Q1,00 | P->S | 1 SLAVE    |
|  18 |               PX BLOCK ITERATOR    |          |  Q1,00 | PCWC |            |
|  19 |                TABLE ACCESS FULL   | T2       |  Q1,00 | PCWP |            |
|  20 |           VIEW                     |          |  Q1,03 | SCWC |            |
|  21 |            COUNT                   |          |  Q1,03 | SCWP |            |
|  22 |             PX RECEIVE             |          |  Q1,03 | SCWP |            |
|  23 |              PX SEND 1 SLAVE       | :TQ10001 |  Q1,01 | P->S | 1 SLAVE    |
|  24 |               PX BLOCK ITERATOR    |          |  Q1,01 | PCWC |            |
|  25 |                TABLE ACCESS FULL   | T4       |  Q1,01 | PCWP |            |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM>1)
   6 - access("V1_ID"="T6"."ID")
  13 - access("V1_ID"="V2_ID")

Let's start with the 11.2.0.4 plan shape: we can see from the multiple occurrences of the PX COORDINATOR operator, the TQ column and the corresponding TQ entries in the NAME column that this parallel execution plan consists of three DFO trees. When analyzing the actual run time activity it would also become obvious that the different DFO trees run at a different DOP and use different PX slave sets (a query sketch for checking this at run time follows below). In this particular case it would even become obvious that two DFO trees are active at the same time - so this example demonstrates pretty much all the undesirable side effects of having multiple DFO trees that I mentioned above.

The HASH JOIN between V1 and V2 runs serially, whereas the HASH JOIN between T6 and the join result of V1 and V2 runs in parallel. Since we have a Serial->Parallel distribution between these two HASH JOINs, an additional BUFFER SORT operation gets added - as outlined in the PX SELECTOR note.

If we now look at the 12.1.0.2 plan shape we notice that the execution plan consists of a single DFO tree, and hence all the side effects I've outlined are gone. The serial evaluation is done by a single PX slave of one set, again indicated by the SCWC/SCWP decorators, similar to the PX SELECTOR operator.

However, the plan shape also demonstrates one possible disadvantage of the new feature: since everything is now part of a single DFO tree, we end up with more re-distributions per DFO tree, and hence might have an increased demand for the dreaded BUFFERing required to prevent multiple re-distributions from being active at the same time. Here this can be seen at operation ID 13: the HASH JOIN executed serially by one PX slave is turned into a HASH JOIN BUFFERED - the worst case for a (not really) "parallel" hash join, executed only by a single PX slave and in addition having to buffer the right hand row source - and the second HASH JOIN is turned into its BUFFERED variant, too. Compared to the 11.2.0.4 plan shape, which contains no HASH JOIN BUFFERED but one additional BUFFER SORT operation, this means that with this data pattern the 12.1.0.2 plan shape actually requires more PGA / TEMP space due to the double buffering now necessary.
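As a side note, if you want to verify such DFO tree / PX slave set details at run time yourself, something along the following lines can be used - a minimal sketch based on V$PX_SESSION, to be run from a second session while the Parallel Execution is active (SERVER_GROUP corresponds to the DFO tree, SERVER_SET to the PX slave set within it):

-- Show the active PX slaves per DFO tree (SERVER_GROUP) and per slave set
-- (SERVER_SET), along with the requested and actual degree.
-- Run from a separate session while the statement is executing.
select
        px.qcsid
      , px.server_group
      , px.server_set
      , count(*)        as num_slaves
      , px.req_degree
      , px.degree
from
        v$px_session px
where
        px.sid <> px.qcsid   -- exclude the Query Coordinator session itself
group by
        px.qcsid, px.server_group, px.server_set, px.req_degree, px.degree
order by
        px.qcsid, px.server_group, px.server_set;

For the 11.2.0.4 plan above, a query like this should show several SERVER_GROUP values with different degrees being active for the same QCSID, confirming the multiple DFO trees at work; for the 12.1.0.2 plan only a single SERVER_GROUP should show up.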
create table t1
as
select * from dba_objects;

exec dbms_stats.gather_table_stats(null, 't1')

alter table t1 parallel;

create table t2
as
select * from dba_objects;

exec dbms_stats.gather_table_stats(null, 't2')

create index t2_idx on t2 (object_name);

select /*+ optimizer_features_enable('11.2.0.4') */
        *
from
        t1
      , t2
where
        t1.object_id = t2.object_id
and     t2.object_name like 'BLUB%'
;

-- 11.2.0.4 plan shape

--------------------------------------------------------------------------------------
| Id  | Operation                            | Name     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |        |      |            |
|   1 |  PX COORDINATOR                      |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)                | :TQ10001 |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN                         |          |  Q1,01 | PCWP |            |
|   4 |     BUFFER SORT                      |          |  Q1,01 | PCWC |            |
|   5 |      PX RECEIVE                      |          |  Q1,01 | PCWP |            |
|   6 |       PX SEND BROADCAST              | :TQ10000 |        | S->P | BROADCAST  |
|   7 |        TABLE ACCESS BY INDEX ROWID   | T2       |        |      |            |
|*  8 |         INDEX RANGE SCAN             | T2_IDX   |        |      |            |
|   9 |     PX BLOCK ITERATOR                |          |  Q1,01 | PCWC |            |
|* 10 |      TABLE ACCESS FULL               | T1       |  Q1,01 | PCWP |            |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   8 - access("T2"."OBJECT_NAME" LIKE 'BLUB%')
       filter("T2"."OBJECT_NAME" LIKE 'BLUB%')
  10 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"T1"."OBJECT_ID"))

-- 12.1.0.2 plan shape

----------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |          |        |      |            |
|   1 |  PX COORDINATOR                              |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)                        | :TQ10001 |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN                                 |          |  Q1,01 | PCWP |            |
|   4 |     JOIN FILTER CREATE                       | :BF0000  |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE                              |          |  Q1,01 | PCWP |            |
|   6 |       PX SEND BROADCAST                      | :TQ10000 |  Q1,00 | S->P | BROADCAST  |
|   7 |        PX SELECTOR                           |          |  Q1,00 | SCWC |            |
|   8 |         TABLE ACCESS BY INDEX ROWID BATCHED  | T2       |  Q1,00 | SCWC |            |
|*  9 |          INDEX RANGE SCAN                    | T2_IDX   |  Q1,00 | SCWP |            |
|  10 |     JOIN FILTER USE                          | :BF0000  |  Q1,01 | PCWP |            |
|  11 |      PX BLOCK ITERATOR                       |          |  Q1,01 | PCWC |            |
|* 12 |       TABLE ACCESS FULL                      | T1       |  Q1,01 | PCWP |            |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   9 - access("T2"."OBJECT_NAME" LIKE 'BLUB%')
       filter("T2"."OBJECT_NAME" LIKE 'BLUB%')
  12 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"T1"."OBJECT_ID"))

The pre-12c plan shape here shows two significant things that I want to emphasize: First, this plan shape requires only a single PX slave set, since the Query Coordinator takes over the part that needs to be re-distributed - so although we have a plan shape that requires re-distribution, there is only a single PX slave set involved. In case there is at least one operation that gets executed in parallel and requires re-distribution, there will always be two PX slave sets. Who actually produced and consumed the rows of each table queue can be checked with a query like the sketch below.
Second, the plan shape demonstrates that parts of a Parallel Execution plan that get executed serially by the Query Coordinator require an additional BUFFER SORT operation. The HASH JOIN operation itself is blocking while it consumes the left row source for building the hash table, so there is no true requirement to add another BUFFER SORT after the PX RECEIVE operation. It looks like a pretty strict rule, though, that any serial activity involving the Query Coordinator adds a BUFFER SORT operation after re-distribution - I assume the reasoning is that the Query Coordinator isn't available for "coordinating" the PX slaves as long as it is actively involved in executing serial operations, hence the need to block any other parallel activity. This normally shouldn't be too relevant to performance, since only operations that are tiny and not worth running in parallel should be executed serially, so buffering them shouldn't add much overhead - but it is another reason why you see additional BUFFER SORT operations in parallel plans that are not there in serial-only plans.

The 12c plan shape shows the new PX SELECTOR operator that now executes the serial part of the execution plan instead of the Query Coordinator. This also adds new decorators in the IN-OUT column called "SCWC" and "SCWP" respectively, which you won't find in pre-12c plans - they are probably meant to read "Serial Combined With Child/Parent", similar to "PCWC"/"PCWP". The good thing about the new PX SELECTOR is that the need for an additional BUFFER SORT operator is gone. However, one side effect of the new operator for this particular plan shape is that a second PX slave set now gets allocated, although only one PX slave of it will actually be used at run time. For other plan shapes that need two PX slave sets anyway this doesn't matter.

Another good thing about the new PX SELECTOR operator is that it avoids an odd bug that sometimes happens with Serial->Parallel re-distributions when the Query Coordinator is involved. This bug adds some delay to the overall execution - usually only around one to two seconds, so it is rarely noticed with Parallel Executions that typically take several seconds or minutes, but it can occur several times per execution, so those seconds can add up. I might cover this bug in a separate blog post.

Unrelated to the PX SELECTOR operator, the 12c plan shape also demonstrates that the way Bloom filters are shown in the plan has been improved in 12c. The 11.2.0.4 version uses the same Bloom filter, as the "Predicate Information" section of the plan shows, but the plan shape doesn't make its presence nearly as obvious (and sometimes in pre-12c the filter doesn't even show up in the "Predicate Information" section although it is still used).
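Since pre-12c plan shapes don't always make the Bloom filter obvious, two simple checks can help - a minimal sketch, assuming the statement in question was just run in the current session:

-- Show the actual execution plan of the last statement executed in this
-- session, including the "Predicate Information" section, where a Bloom
-- filter shows up as SYS_OP_BLOOM_FILTER even when the plan shape doesn't
-- contain JOIN FILTER CREATE / USE operators
select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL'));

-- While the Parallel Execution is running, Bloom filter activity can be
-- observed here: FILTERED vs. PROBED indicates how effective the filter is
select
        qc_session_id
      , filtered
      , probed
      , active
from
        v$sql_join_filter;

The second query on V$SQL_JOIN_FILTER is also a way to detect the case mentioned above where the filter is used but doesn't show up in the "Predicate Information" section at all.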