A new version 4.2 of the XPLAN_ASH utility is available for download.
As usual the latest version can be downloaded here.
There are no major changes in this release; mainly some new sections related to I/O figures were added.
One thing to note is that some of the sections in recent releases may require a linesize larger than 700, so the script's setting has been changed to 800. If you use corresponding settings for CMD.EXE under Windows, for example, you might have to adjust them accordingly to prevent ugly line wrapping.
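For illustration, the relevant SQL*Plus settings look like this - a minimal sketch, XPLAN_ASH takes care of these itself:

set linesize 800
set trimspool on
set trimout on

-- Under Windows additionally make sure the CMD.EXE screen buffer width
-- is at least 800 characters (console window Properties -> Layout),
-- otherwise the output will still wrap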
Here are the notes from the change log:
- New sections "Concurrent activity I/O Summary based on ASH" and "Concurrent activity I/O Summary per Instance based on ASH" to see the I/O activity summary for concurrent activity
- Many averages and medians now also have accompanying minimum and maximum values shown. This isn't as good as having histograms, but it gives a better idea of the range of values and of how outliers potentially influence the average and deserve further investigation
- Bug fixed: When using MONITOR as the source for searching for the most recent SQL_ID executed by a given SID, no SQL_ID was found due to some filtering on date. This is now fixed
- Bug fixed: In RAC, GV$ASH_INFO should be used to determine the available samples
- The "Parallel Execution Skew ASH" indicator is now weighted - so far any activity level per plan line and sample below the actual DOP counted as one, and the same if the activity level was above
The sum of the "ones" was then set relative to the total number of samples the plan line was active to determine the "skewness" indicator
Now the actual difference between the activity level and the actual DOP is calculated and compared to the number of total samples active times the actual DOP
This should give a better picture of the actual impact the skew has on the overall execution
- Most queries now use a NO_STATEMENT_QUEUING hint for environments where AUTO DOP is enabled and the XPLAN_ASH queries could otherwise get queued (see the sketch below)
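As a sketch, the hint simply gets added to the queries - this is of course not one of the actual XPLAN_ASH queries, just an illustration of the hint placement:

select /*+ NO_STATEMENT_QUEUING */
       count(*)
from   gv$active_session_history;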
- The physical I/O bytes on execution plan line level taken from "Real-Time SQL Monitoring" now have the more appropriate headings "ReadB" and "WriteB" - I never liked the former, misleading "Reads" / "Writes" headings
Sunday, December 21, 2014
Sunday, October 26, 2014
Heuristic TEMP Table Transformation
There are at least three different ways the Oracle optimizer can come up with a so-called TEMP table transformation - that is, materializing an intermediate result set:
- As part of a star transformation the repeated access to dimensions can be materialized
- As part of evaluating GROUPING SETs intermediate result sets can be materialized
- Common Table Expressions (CTE, WITH clause) can be materialized
Probably the most common usage of the materialization is in conjunction with the WITH clause.
This is nothing new, but since I came across the issue several times recently, here's a short demonstration and a reminder that this so-called "TEMP Table Transformation" - at least in the context of the WITH clause - isn't really cost-based, in contrast to most other optimizer transformations nowadays (although the unnest transformation of subqueries also has a "no-brainer" variant where costing isn't considered).
The logic simply seems to be: If the CTE expression is referenced more than once AND the CTE expression contains at least some (filter or join) predicate then it will be materialized.
While in most cases this makes sense, to avoid the otherwise repeated evaluation of the CTE expression, there are cases where additional predicates that could be pushed inside the CTE would lead to different, significantly more efficient access paths than materializing the full CTE expression without applying the filters and filtering on the TEMP table afterwards (see the sketch below).
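Here is a minimal sketch of that rule, using the sample table T1 created below:

-- CTE referenced twice, but no predicate inside: stays inline
with a as (select id, filler from t1)
select a1.id from a a1, a a2 where a1.id = a2.id;

-- CTE referenced twice AND containing a predicate: gets materialized
with a as (select id, filler from t1 where filler != 'x')
select a1.id from a a1, a a2 where a1.id = a2.id;

The undocumented MATERIALIZE and INLINE hints (see the footnote at the end of this post) can be used to override the heuristic in both directions.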
Here are just two very simple examples that demonstrate the point, both based on this sample table setup:
create table t1
as
select
        rownum as id
      , rpad('x', 100) as filler
from
        dual
connect by level <= 1e5;

exec dbms_stats.gather_table_stats(null, 't1')

create index t1_idx on t1 (id);

The index on T1.ID potentially opens up very precise access to the rows. Here is example number one:
with a as (
  select /* inline */
         id
       , filler
  from   t1
  where  filler != 'x'
)
select
       t1.*
     , a1.filler
     , a2.filler
from
       t1
     , a a1
     , a a2
where
       a1.id = t1.id
and    a2.id = t1.id
and    t1.id = 1
and    a1.id = 1
and    a2.id = 1
;

-- 11.2.0.3 Plan without INLINE hint
------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                           |     1 |   236 |  1207   (1)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION      |                           |       |       |            |          |
|   2 |   LOAD AS SELECT                | SYS_TEMP_0FD9D6619_229329 |       |       |            |          |
|*  3 |    TABLE ACCESS FULL            | T1                        | 99999 |    10M|   420   (1)| 00:00:01 |
|*  4 |   HASH JOIN                     |                           |     1 |   236 |   787   (1)| 00:00:01 |
|*  5 |    HASH JOIN                    |                           |     1 |   171 |   394   (1)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID | T1                        |     1 |   106 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN           | T1_IDX                    |     1 |       |     1   (0)| 00:00:01 |
|*  8 |     VIEW                        |                           | 99999 |  6347K|   392   (1)| 00:00:01 |
|   9 |      TABLE ACCESS FULL          | SYS_TEMP_0FD9D6619_229329 | 99999 |    10M|   392   (1)| 00:00:01 |
|* 10 |    VIEW                         |                           | 99999 |  6347K|   392   (1)| 00:00:01 |
|  11 |     TABLE ACCESS FULL           | SYS_TEMP_0FD9D6619_229329 | 99999 |    10M|   392   (1)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

-- 11.2.0.4 Plan without INLINE hint
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                           | 9999M | 2197G |       | 28468  (92)| 00:00:02 |
|   1 |  TEMP TABLE TRANSFORMATION      |                           |       |       |       |            |          |
|   2 |   LOAD AS SELECT                | SYS_TEMP_0FD9D661A_229329 |       |       |       |            |          |
|*  3 |    TABLE ACCESS FULL            | T1                        | 99999 |   10M |       |   420   (1)| 00:00:01 |
|*  4 |   HASH JOIN                     |                           | 9999M | 2197G | 7520K | 28048  (93)| 00:00:02 |
|*  5 |    VIEW                         |                           | 99999 | 6347K |       |   392   (1)| 00:00:01 |
|   6 |     TABLE ACCESS FULL           | SYS_TEMP_0FD9D661A_229329 | 99999 |   10M |       |   392   (1)| 00:00:01 |
|*  7 |    HASH JOIN                    |                           | 99999 |   16M |       |   394   (1)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID | T1                        |     1 |   106 |       |     2   (0)| 00:00:01 |
|*  9 |      INDEX RANGE SCAN           | T1_IDX                    |     1 |       |       |     1   (0)| 00:00:01 |
|* 10 |     VIEW                        |                           | 99999 | 6347K |       |   392   (1)| 00:00:01 |
|  11 |      TABLE ACCESS FULL          | SYS_TEMP_0FD9D661A_229329 | 99999 |   10M |       |   392   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------------

-- 11.2.0.3/11.2.0.4 Plan with INLINE hint
-------------------------------------------------------------------------------------------
| Id  | Operation                        | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |        |     1 |   318 |     6   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                    |        |       |       |            |          |
|   2 |   NESTED LOOPS                   |        |     1 |   318 |     6   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                  |        |     1 |   212 |     4   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS BY INDEX ROWID  | T1     |     1 |   106 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN            | T1_IDX |     1 |       |     1   (0)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID  | T1     |     1 |   106 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN            | T1_IDX |     1 |       |     1   (0)| 00:00:01 |
|*  8 |    INDEX RANGE SCAN              | T1_IDX |     1 |       |     1   (0)| 00:00:01 |
|*  9 |   TABLE ACCESS BY INDEX ROWID    | T1     |     1 |   106 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

The filter in the CTE expression is just there to fulfill the rules stated above - without it the TEMP table transformation wouldn't be considered at all. It could also be a (non-filtering) join condition, for example.

Notice the big difference in cost estimates between the plans with and without materialization. Clearly a cost-based evaluation should have rejected the TEMP table transformation, simply because it is a bad idea to materialize 100K rows and afterwards access this TEMP table twice to filter out exactly a single row, instead of accessing the original, untransformed row source twice via precise index access.

This is, by the way, an example of another anomaly that was only recently introduced (apparently in the 11.2.0.4 patch set / 12.1 release): Notice the bad cardinality estimate in the 11.2.0.4 plan with the TEMP table transformation - the filter on the TEMP table isn't evaluated properly (that was already the case in previous releases), and in addition the join cardinality is way off - 10G rows instead of a single row is not really a good estimate - and as a side effect the HASH JOIN uses a bad choice for the build row sources.

Another possible, perhaps less common variant is this example:
with a as (
  select /* inline */
         id
       , filler
  from   t1
  where  filler != 'x'
)
select
       id
     , (select filler from a where id = x.id) as scal_val1
     , (select filler from a where id = x.id) as scal_val2
from
       t1 x
;

-- 12.1.0.2 Plan without INLINE hint
--------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                           |   100K|   488K|    77M  (1)| 00:50:26 |
|*  1 |  VIEW                      |                           | 99999 |  6347K|   392   (1)| 00:00:01 |
|   2 |   TABLE ACCESS FULL        | SYS_TEMP_0FD9D660F_229329 | 99999 |    10M|   392   (1)| 00:00:01 |
|*  3 |  VIEW                      |                           | 99999 |  6347K|   392   (1)| 00:00:01 |
|   4 |   TABLE ACCESS FULL        | SYS_TEMP_0FD9D660F_229329 | 99999 |    10M|   392   (1)| 00:00:01 |
|   5 |  TEMP TABLE TRANSFORMATION |                           |       |       |            |          |
|   6 |   LOAD AS SELECT           | SYS_TEMP_0FD9D660F_229329 |       |       |            |          |
|*  7 |    TABLE ACCESS FULL       | T1                        | 99999 |    10M|   420   (1)| 00:00:01 |
|   8 |   TABLE ACCESS FULL        | T1                        |   100K|   488K|   420   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

-- 12.1.0.2 Plan with INLINE hint
----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |   100K|   488K|   398K  (1)| 00:00:16 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |     1 |   106 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_IDX |     1 |       |     1   (0)| 00:00:01 |
|*  3 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |     1 |   106 |     2   (0)| 00:00:01 |
|*  4 |   INDEX RANGE SCAN                  | T1_IDX |     1 |       |     1   (0)| 00:00:01 |
|   5 |  TABLE ACCESS FULL                  | T1     |   100K|   488K|   420   (1)| 00:00:01 |
----------------------------------------------------------------------------------------------

This time I've shown plans from 12.1.0.2 - the latest available release as I write this - to demonstrate that this behaviour hasn't changed yet. What has changed in 12c is that the scalar subqueries are now actually represented in the final cost - in pre-12c these costs weren't part of the total cost. So although the cost difference between the two plans is therefore much more significant in 12c than in pre-12c, the optimizer still opts for materializing the CTE expression and running full table scans in the scalar subqueries on that TEMP table, instead of taking advantage of the precise access path available - again very likely a pretty bad idea at runtime.

So whenever you make use of the WITH clause, make sure you've considered the access paths that might be available when not materializing the result set.
Footnote
As of Oracle 12.1 the MATERIALIZE and INLINE hints are still not officially documented.

Friday, August 1, 2014
Parallel Execution Skew - Summary
I've published the final part of my video tutorial as well as the final part of my mini-series "Parallel Execution Skew" at AllThingsOracle.com, concluding what I planned to publish on the topic of Parallel Execution Skew.
Anyone regularly using Parallel Execution and/or relying on Parallel Execution for important, time-critical processing should know this stuff. In my experience, however, almost no one does - and possibly huge opportunities for optimizing Parallel Execution performance are therefore missed.
Since all this was published over a longer period of time, this post is a summary with pointers to the material.
If you want to get an idea what the material is about, the following video summarizes the content:
Parallel Execution Skew in less than four minutes
Video Tutorial "Analysing Parallel Execution Skew":
Part 1: Introduction
Part 2: DFOs and DFO Trees
Part 3: Without Diagnostics / Tuning Pack license
Part 4: Using Diagnostics / Tuning Pack license
"Parallel Execution Skew" series at AllThingsOracle.com:
Part 1: Introduction
Part 2: Demonstrating Skew
Part 3: 12c Hybrid Hash Distribution With Skew Detection
Part 4: Addressing Skew Using Manual Rewrites
Part 5: Skew Caused By Outer Joins
Labels:
AllThingsOracle,
Parallel Execution,
series,
skew,
Video Tutorial,
XPLAN_ASH
Sunday, June 29, 2014
New Version Of XPLAN_ASH Utility
A new version 4.1 of the XPLAN_ASH utility is available for download.
As usual the latest version can be downloaded here.
In particular, this version now supports the new 12c "Adaptive" plan feature - previous versions don't cope very well with such plans if you don't add the "ADAPTIVE" formatting option manually.
Here are the notes from the change log:
- GV$SQL_MONITOR and GV$SQL_PLAN_MONITOR can now be customized in the settings as table names, in case you want to use your own custom monitoring repository that copies data from GV$SQL_MONITOR and GV$SQL_PLAN_MONITOR in order to keep / persist monitoring data. The tables need to have at least those columns that XPLAN_ASH uses from the original views (a sketch of such a repository follows below)
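A minimal sketch of such a custom repository - the table names here are made up, and a real implementation would capture snapshots periodically and take care of duplicates:

-- Hypothetical repository tables, cloned from the GV$ views
create table my_sql_monitor as
select * from gv$sql_monitor where 1 = 2;

create table my_sql_plan_monitor as
select * from gv$sql_plan_monitor where 1 = 2;

-- Simplistic copy of the current monitoring data
insert into my_sql_monitor select * from gv$sql_monitor;
insert into my_sql_plan_monitor select * from gv$sql_plan_monitor;
commit;

The two table names configured in the XPLAN_ASH settings would then be MY_SQL_MONITOR and MY_SQL_PLAN_MONITOR.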
- The "Activity Timeline based on ASH" for RAC Cross Instance Parallel Execution shows an additional line break for the GLOBAL summary
- Fixed various bugs related to some figures when analyzing Cross Instance RAC Parallel Execution
- The new "GLOBAL" aggregation level for Cross Instance RAC Parallel Execution (see version 4.0 change log below) is now also shown in the "Information on Parallel Degree based on ASH" section
- The "Parallel Distribution ASH" column on execution plan line level now can show process information taken from Real-Time SQL Monitoring for those processes that are not found in ASH samples.
This effectively means that with monitoring information available for every plan line every involved process will now be shown along with its ASH sample count and rows produced
So some processes will show up now with a sample count of 0.
The idea behind this is to provide more information about row distribution even for those lines/processes that are not covered by the ASH samples.
Previously the rowcount produced was only shown for those processes covered in ASH samples
The new behaviour is default - if you find the output messy you can return to previous behaviour (show only rowcounts for processes found in ASH samples) by setting the new configuration switch "show_monitor_rowcount" to any other value than the default of "YES"
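Assuming default settings, the change in the script's configuration section would look something like this (a sketch - check the script header for the exact name and syntax):

-- Revert to the pre-4.1 behaviour
define show_monitor_rowcount = "NO"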
- The "Real-Time SQL Monitoring" information on execution plan line level now includes the read and write request information ("ReadReq", "WriteReq")
- The I/O figures based on ASH now include the new "DELTA_READ_MEM_BYTES" information that was added in 12c. This applies to the following sections:
- SQL Statement I/O Summary based on ASH
- Parallel Worker activity overview based on ASH
- Activity Timeline based on ASH
The "Read Mem Bytes" seems to correspond to the "logical read bytes from cache" statistics, so any direct path reads are not covered by this value
- Added some more verbose description in the "Note" sections about how to handle long lines. XPLAN_ASH now does a SET TRIMSPOOL ON in case you want to spool the output to a file
- Whenever the output referred to DFOs this was changed to "DFO TREE", which is the correct term
- The "Parallel Worker activity overview based on ASH" section now shows a blank line between the sections which should make this section more readable
- Adaptive plans are now supported by XPLAN_ASH
Note that they don't work well with previous versions: the formatting of the inactive lines breaks, and the overall information can be misleading if you don't manually add the "ADAPTIVE" formatting option.
If XPLAN_ASH detects an adaptive plan, it will always force the ADAPTIVE formatting option.
This also means that adaptive plans for the time being won't work with SASH, as SASH doesn't collect the OTHER_XML column from GV$SQL_PLAN.
You could manually add that column to SASH_SQLPLANS and to the "sash_pkg.get_sqlplans" procedure - this is a CLOB column, but INSERT / SELECT should work, I think.
The view SASH_PLAN_TABLE also needs to be modified to select the OTHER_XML column instead of a dummy NULL - a sketch of these changes follows below.
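A sketch of those SASH changes - untested:

-- Add the missing column to the repository table
alter table sash_sqlplans add (other_xml clob);

-- In addition "sash_pkg.get_sqlplans" needs to include OTHER_XML in its
-- INSERT / SELECT from GV$SQL_PLAN, and the view SASH_PLAN_TABLE needs to
-- select the new column instead of the dummy NULL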
Although this output is less readable than the "faked" output that shows only the plan operations actually in use, it is the only simple way to relate ASH / MONITOR data to execution plan lines, as these hold the information for the actual plan line - not the one made up by DBMS_XPLAN.DISPLAY* based on the DISPLAY_MAP information in the OTHER_XML column.
Hence I decided for the time being to use the same approach as 12c Real-Time SQL Monitoring and always show the full/adaptive shape of the plan
Another challenge for XPLAN_ASH with adaptive plans is the possibly changing PLAN_HASH_VALUE during execution.
XPLAN_ASH extracts the PLAN_HASH_VALUE from ASH/MONITOR when trying to get the plan from DBA_HIST_SQL_PLAN.
Hence XPLAN_ASH now needs to take care to extract the most recent PLAN_HASH_VALUE; previously this didn't matter, as it wasn't supposed to change during execution. This seems to work based on my tests, but it's something to keep in mind
- The new "gather stats on load" 12c feature implies for INSERT...SELECT statements that the cursor will immediately be invalidated/removed from the Library Cache after (successful) execution. So now such
INSERT...SELECT behave like CTAS which also gets removed immediately. This is a pity as you won't be able to pick up the plan from the Library Cache after the execution completes using XPLAN_ASH (or any other tool using DBMS_XPLAN.DISPLAY*).
Although V$SQL_PLAN_MONITOR might keep plan for some time after the execution, it can't be used as input to DBMS_XPLAN.DISPLAY*, hence this isn't a viable workaround. In principle however this isn't a good thing as the SQL and plan information might be missing from AWR / STATSPACK reports due to the
immediate invalidation/removal.
At the time being the only viable workaround known to me for this is to prevent the "gather stats on load" feature either via parameter "_optimizer_gather_stats_on_load" or hint "no_gather_optimizer_statistics", or via using pre-12c optimizer feature settings which implicitly disables the feature which is of course not
really a good workaround as the feature itself might be rather desirable
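For reference, a sketch of the two workaround variants - the table names are made up, and the underscore parameter is undocumented, so check with Oracle Support before using it:

-- Per statement, via the corresponding hint
insert /*+ no_gather_optimizer_statistics */ into target_t
select * from source_t;

-- Per session, via the undocumented parameter
alter session set "_optimizer_gather_stats_on_load" = false;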
Monday, June 23, 2014
Parallel Execution Skew - Addressing Skew Using Manual Rewrites
This is just a short note that the next part of the mini series about Parallel Execution skew has been published at AllThingsOracle.com.
After having shown in the previous instalment of the series that Oracle 12c added a new feature that can deal with Parallel Execution skew (at present in a limited number of scenarios), I now demonstrate in this part how the problem can be addressed using manual query rewrites - in particular the probably not so commonly known technique of redistributing popular values using an additional re-mapping table.
Labels:
AllThingsOracle,
article,
Parallel Execution,
skew
Sunday, May 18, 2014
12c Hybrid Hash Distribution with Skew Detection / Handling - Failing
This is just an addendum to the previous post demonstrating one example (out of many possible) where the join skew handling feature fails. The test case setup is the same as in the previous post.
As mentioned in the AllThingsOracle.com article and in the introduction of the previous post, the feature at present only applies to a rather limited number of scenarios. To wrap things up and to give an idea of what can happen with the new feature, here's a three-table join that actually makes use of the feature for one join, only to suffer from the skew problem in the next join, which uses the same join expression but doesn't (yet) qualify for the skew handling feature:
-- Here the HYBRID SKEW distribution works for the B->C join
-- But the (B->C)->A join is affected by the same skew
-- So the HASH re-distribution of the (B->C) join result on B.ID is skewed
-- And hence the HASH JOIN/SORT AGGREGATE (operation 4+5) are affected by the skew
-- An interesting question is: Why is there a re-distribution (operation 11+12)?
-- The data is already distributed on B.ID / C.FK_ID
-- In pre-12c no redistribution happens
-- So it looks like a side-effect of the hybrid distribution
-- Which makes sense as it is not really HASH distributed, but hybrid
select count(t_2_filler) from (
  select /*+ monitor
             leading(b c a)
             use_hash(c)
             no_swap_join_inputs(c)
             pq_distribute(c hash hash)
             use_hash(a)
             swap_join_inputs(a)
             pq_distribute(a hash hash)
         */
         a.id as t_1_id
       , a.filler as t_1_filler
       , c.id as t_2_id
       , c.filler as t_2_filler
  from
         t_1 a
       , t_1 b
       , t_2 c
  where
         c.fk_id = b.id
  and    regexp_replace(c.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
  and    a.id = b.id
  and    regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

Here's the execution plan from 12.1:
------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | TQ    |IN-OUT| PQ Distrib  |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |     1 |   318 |       |      |             |
|   1 |  SORT AGGREGATE                      |          |     1 |   318 |       |      |             |
|   2 |   PX COORDINATOR                     |          |       |       |       |      |             |
|   3 |    PX SEND QC (RANDOM)               | :TQ10004 |     1 |   318 | Q1,04 | P->S | QC (RAND)   |
|   4 |     SORT AGGREGATE                   |          |     1 |   318 | Q1,04 | PCWP |             |
|*  5 |      HASH JOIN                       |          |  5016 | 1557K | Q1,04 | PCWP |             |
|   6 |       PX RECEIVE                     |          | 2000K |  202M | Q1,04 | PCWP |             |
|   7 |        PX SEND HYBRID HASH           | :TQ10002 | 2000K |  202M | Q1,02 | P->P | HYBRID HASH |
|   8 |         STATISTICS COLLECTOR         |          |       |       | Q1,02 | PCWC |             |
|   9 |          PX BLOCK ITERATOR           |          | 2000K |  202M | Q1,02 | PCWC |             |
|  10 |           TABLE ACCESS FULL          | T_1      | 2000K |  202M | Q1,02 | PCWP |             |
|  11 |       PX RECEIVE                     |          |  100K |   20M | Q1,04 | PCWP |             |
|  12 |        PX SEND HYBRID HASH           | :TQ10003 |  100K |   20M | Q1,03 | P->P | HYBRID HASH |
|* 13 |         HASH JOIN BUFFERED           |          |  100K |   20M | Q1,03 | PCWP |             |
|  14 |          PX RECEIVE                  |          | 2000K |  202M | Q1,03 | PCWP |             |
|  15 |           PX SEND HYBRID HASH        | :TQ10000 | 2000K |  202M | Q1,00 | P->P | HYBRID HASH |
|  16 |            STATISTICS COLLECTOR      |          |       |       | Q1,00 | PCWC |             |
|  17 |             PX BLOCK ITERATOR        |          | 2000K |  202M | Q1,00 | PCWC |             |
|  18 |              TABLE ACCESS FULL       | T_1      | 2000K |  202M | Q1,00 | PCWP |             |
|  19 |          PX RECEIVE                  |          | 2000K |  202M | Q1,03 | PCWP |             |
|  20 |           PX SEND HYBRID HASH (SKEW) | :TQ10001 | 2000K |  202M | Q1,01 | P->P | HYBRID HASH |
|  21 |            PX BLOCK ITERATOR         |          | 2000K |  202M | Q1,01 | PCWC |             |
|  22 |             TABLE ACCESS FULL        | T_2      | 2000K |  202M | Q1,01 | PCWP |             |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("A"."ID"="B"."ID")
       filter( REGEXP_REPLACE ("A"."FILLER",'^\s+([[:alnum:]]+)\s+$',' \1',1,1,'c')>= REGEXP_REPLACE ("B"."FILLER",'^\s+([[:alnum:]]+)\s+$',' \1',1,1,'c'))
  13 - access("C"."FK_ID"="B"."ID")
       filter( REGEXP_REPLACE ("C"."FILLER",'^\s+([[:alnum:]]+)\s+$',' \1',1,1,'c')>= REGEXP_REPLACE ("B"."FILLER",'^\s+([[:alnum:]]+)\s+$',' \1',1,1,'c'))

And here's the formatted output from V$PQ_TQSTAT (two popular values in T_2.FK_ID, DOP = 4):
     TQ_ID SERVER_TYP   INSTANCE PROCESS    NUM_ROWS          % GRAPH              MB  bytes/row
---------- ---------- ---------- -------- ---------- ---------- ---------- ---------- ----------
         0 Producer            1 P000         508088         25 ##########         53        109
                                 P001         497226         25 ##########         52        109
                                 P002         489964         24 ##########         51        109
                                 P003         504728         25 ##########         52        109
                      ********** ******** ----------
                      Total                  2000006

           Consumer            1 P004         499616         25 ##########         52        109
                                 P005         500736         25 ##########         52        109
                                 P006         499523         25 ##########         52        109
                                 P007         500131         25 ##########         52        109
                      ********** ******** ----------
                      Total                  2000006

         1 Producer            1 P000         486339         24 #########          50        107
                                 P001         482280         24 #########          50        108
                                 P002         518636         26 ##########         53        107
                                 P003         512745         26 ##########         52        107
                      ********** ******** ----------
                      Total                  2000000

           Consumer            1 P004         500246         25 ##########         51        107
                                 P005         500104         25 ##########         51        107
                                 P006         499437         25 ##########         51        107
                                 P007         500213         25 ##########         51        107
                      ********** ******** ----------
                      Total                  2000000

         2 Producer            1 P004         503398         25 ##########         52        108
                                 P005         498008         25 ##########         52        109
                                 P006         507306         25 ##########         53        109
                                 P007         491288         25 ##########         51        109
                      ********** ******** ----------
                      Total                  2000000

           Consumer            1 P000         500363         25 ##########         52        109
                                 P001         500256         25 ##########         52        109
                                 P002         499609         25 ##########         52        109
                                 P003         499772         25 ##########         52        109
                      ********** ******** ----------
                      Total                  2000000

         3 Producer            1 P004         500246         25 ##########        100        210
                                 P005         500104         25 ##########        100        210
                                 P006         499437         25 ##########        100        210
                                 P007         500213         25 ##########        100        210
                      ********** ******** ----------
                      Total                  2000000

           Consumer            1 P000         199870         10 ##                 40        211
                                 P001         200300         10 ###                40        211
                                 P002         799835         40 ##########        159        209
                                 P003         799995         40 ##########        159        209
                      ********** ******** ----------
                      Total                  2000000

         4 Producer            1 P000              1         25 ##########          0         36
                                 P001              1         25 ##########          0         36
                                 P002              1         25 ##########          0         36
                                 P003              1         25 ##########          0         36
                      ********** ******** ----------
                      Total                        4

           Consumer            1 QC                4        100 ##########          0         36
                      ********** ******** ----------
                      Total                        4

There are a couple of interesting things to notice:

1. The execution plan shows another redistribution of the (B->C) join result for joining to (B->C)->A, although both joins use the same join expression (B.ID). So there is an additional table queue / redistribution (operations 11 + 12), and in consequence the HASH JOIN (operation 13) turns into a HASH JOIN BUFFERED. You won't find such a re-distribution (and HASH JOIN BUFFERED) in a pre-12c plan, simply because the optimizer recognizes that the data is already distributed in a suitable way. But in case of the HYBRID HASH distribution the data isn't necessarily exactly distributed by HASH (but by a mixture of BROADCAST / HASH / ROUND-ROBIN), and so the optimizer needs to play safe and introduce another redistribution

2. This additional redistribution isn't skew aware: while the V$PQ_TQSTAT output shows that for table queues 0 and 1 the skew detection / handling worked and ensured an even work distribution (the output above is from the variant running at a DOP of 4 and having two popular values), for table queues 2 and 3 a normal HASH distribution was used, leading to skew, as can be seen in the "Consumer" part of TQ_ID = 3

So for the time being don't count on the new feature to solve parallel join skew problems in general. Sometimes it might work, but at present there are simply too many scenarios where it won't apply.
Labels:
12.1.0.1,
12cR1,
Parallel Execution,
skew
Sunday, May 4, 2014
12c New Optimizer Features
Besides the officially available information about new optimizer features in 12c, it is always a good idea to have a look at the internal optimizer parameters that show which features are enabled when running with OPTIMIZER_FEATURES_ENABLE = 12.1.0.1. Here is the list of internal optimizer parameters and fix controls that are different between 11.2.0.4 and 12.1.0.1:
Optimizer parameters:
_optimizer_partial_join_eval              partial join evaluation parameter
_optimizer_unnest_scalar_sq               enables unnesting of of scalar subquery
_optimizer_ansi_join_lateral_enhance      optimization of left/full ansi-joins and lateral views
_optimizer_multi_table_outerjoin          allows multiple tables on the left of outerjoin
_optimizer_null_accepting_semijoin        enables null-accepting semijoin
_optimizer_ansi_rearchitecture            re-architecture of ANSI left, right, and full outer joins
_optimizer_cube_join_enabled              enable cube join
_optimizer_hybrid_fpwj_enabled            enable hybrid full partition-wise join when TRUE
_px_object_sampling_enabled               use base object sampling when possible for range distribution
_px_concurrent                            enables pq with concurrent execution of serial inputs
_px_replication_enabled                   enables or disables replication of small table scans
_px_filter_parallelized                   enables or disables correlated filter parallelization
_px_filter_skew_handling                  enable correlated filter parallelization to handle skew
_px_groupby_pushdown                      perform group-by pushdown for parallel query
_px_parallelize_expression                enables or disables expression evaluation parallelization
_optimizer_gather_stats_on_load           enable/disable online statistics gathering
_optimizer_batch_table_access_by_rowid    enable table access by ROWID IO batching
_px_wif_dfo_declumping                    NDV-aware DFO clumping of multiple window sorts
_px_wif_extend_distribution_keys          extend TQ data redistribution keys for window functions
_px_join_skew_handling                    enables skew handling for parallel joins
_px_adaptive_dist_method                  determines the behavior of adaptive distribution methods
_px_partial_rollup_pushdown               perform partial rollup pushdown for parallel execution
_optimizer_dsdir_usage_control            controls optimizer usage of dynamic sampling directives
_px_cpu_autodop_enabled                   enables or disables auto dop cpu computation
_px_single_server_enabled                 allow single-slave dfo in parallel query
_optimizer_use_gtt_session_stats          use GTT session private statistics
_optimizer_adaptive_plans                 enable adaptive plans
_optimizer_strans_adaptive_pruning        allow adaptive pruning of star transformation bitmap trees
_optimizer_proc_rate_level                control the level of processing rates
_adaptive_window_consolidator_enabled     enable/disable adaptive window consolidator PX plan
_px_cdb_view_enabled                      parallel cdb view evaluation enabled
_partition_cdb_view_enabled               partitioned cdb view evaluation enabled
_common_data_view_enabled                 common objects returned through dictionary views
_optimizer_cluster_by_rowid               enable/disable the cluster by rowid feature
_pred_push_cdb_view_enabled               predicate pushdown enabled for CDB views
_rowsets_cdb_view_enabled                 rowsets enabled for CDB views
_array_cdb_view_enabled                   array mode enabled for CDB views
Fix controls:
 9898249  initialize col stats for olap earlier in compilation
10004943  enable removal of group by in subquery for create table
 9554026  store index filter selectivity/cost
 9593547  estimate selectivity for unique scan predicates
 9833381  rank predicates before costing
10106423  use base NDV for predicate selectivity in new join order
10175079  increment kafcoref to simulate obsolete fropqop list
10236566  do replacement for expressions involving correlated columns
 9721228  allow pushing of complex predicates to Exadata cell
 9929609  use qksvc to handle descending indexes
10182672  enhance uniquenes detection of a query block
 9832338  disallow outer join oper (+) in CONNECT BY and START WITH clause
11668189  parallelize top-level union all if PDDL or PDML
11940126  fixed groupby partition count method
12390139  enhance qsme to handle more cases
11744016  enhance algorithm to detrimine optimizer duplicate insignificanc
10216738  Toggels subquery coalescing for ANY and ALL subqueries
12563419  add cost of scalar subquery into the cost of outer query
12535474  parallelize nested table access through table function
12561635  cap parallelism if order-by cannot go parallel
12569245  enable bloom filter for partition wise joins
12569300  improve bloom filter costing and heuristics
12569316  show broadcast dist bloom filter rowsource in explain plan
12569321  push bloom filter through other bloom filters
12810427  cardinality feedback for join cardinalities
12914055  use ADS for large parallel tables based on the size
12978495  limit the computed DOP with access path constraints
13110511  allow group-by and distinct placement with in-list predicates
13345888  parallel table lookup access by rowid
13396096  allow expression in connecting cond for semi and regular anti jo
12999577  normalize subquery predicate
12954320  cardinality feedback for bind-aware cursors
13036910  use non-sampling-based freq. histograms for join card. estimatio
12648629  allow common sub-expression elemination after typecheck
13704977  fixes names/hints for multiple-inlined WITH subqueries
11843466  do not force serialize px for serial pl/sql
13909909  Cardinality feedback does not require ACS to be enabled
12856200  Allow partial partition-wise join for range sub-partitioning
 9852856  Enable CBQT for MV refresh
14033181  correct ndv for non-popular values in join cardinality comp.
13836796  enable CBQT on queries with materialized WITH subqueries
13699643  Use cached evaluation context in kkoecp
13735304  relax restrictions on window function replaces subquery
14464068  filter pull up from UNION ALL view
13448445  enable serial check for IO dop
 9114915  Allow predicate compression prior to type check
13109345  Enables cardinality feedback for parallel queries
14605040  Disable cardinality feedback for temp table
14633570  allow non-column correlations in inline-view generation checks
13573073  Resolve conflicting CFB hints
So there are lots of interesting things mentioned, in particular the Fix Control list contains some very interesting changes. I've highlighted those that at first glance looked interesting to me - and some of them, at least according to the description, seem to introduce significant changes to the CBO calculations and transformations. Time to repeat some existing test cases...
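As a side note, one way to generate the fix control part of such a list yourself is to query V$SESSION_FIX_CONTROL, which records for each fix the release it was introduced with - a sketch:

-- Fixes that become active when raising OPTIMIZER_FEATURES_ENABLE to 12.1.0.1
select
        bugno
      , value
      , description
from
        v$session_fix_control
where
        optimizer_feature_enable = '12.1.0.1'
order by
        bugno;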
Labels:
12.1.0.1,
12cR1,
Optimizer,
Parameters
12c Hybrid Hash Distribution with Skew Detection / Handling
Oracle 12c introduces several new features in the area of Parallel Execution. I'll attempt to publish more about them over the next couple of weeks - Jonathan Lewis, for example, has already published a note about the new "PQ Replication" feature that applies to the BROADCAST distribution of small tables.
One important new feature is the automatic skew handling for parallel joins. I've already given an overview of the feature in my mini-series "Parallel Execution Skew" at "AllThingsOracle.com", so if all you want is a high-level overview I recommend reading the article there.
The purpose of this note here is to provide a few more internals and details about that feature.
First, just a short summary of the prerequisites of the feature to work:
1. An inner join - since only inner joins seem to support the HYBRID HASH distribution
2. A histogram on the join expression - although the skew handling can be forced without a histogram by using the PQ_SKEW hint - see below
3. A single join expression, at present joins on multiple predicates don't seem to be supported
4. A parallel HASH JOIN: A parallel MERGE JOIN doesn't seem to trigger the feature - although I don't see why it shouldn't work in principle with a MERGE JOIN
5. The row source with the skewed join expression needs to be the unswapped probe row source of the hash join
6. The row source with the skewed join expression needs to be a simple table - a row source that is a view or a result of another join suppresses the feature
7. If the skew handling isn't forced by using the PQ_SKEW hint but triggered by a histogram on the join expression, values need to "qualify" for skew according to the value distribution in the histogram (see below for more details)
If prerequisites 3-6 are not met but at least the HYBRID HASH distribution gets used, the optimizer trace contains a note like the following:
Skew handling disabled since conditions not satisfied: join:1 dist:2 smap:0 swapped:-1 predicate:1 equi-join:1 view:0

Some of the prerequisites mentioned in that note seem to be superfluous to me, like the distribution method (dist:2 - but maybe this is about "distance"?) and equi-join (otherwise a hash join wouldn't be possible), but in particular the "join" (join method), "predicate" (number of join predicates), "view" and "swapped" conditions seem to be relevant. I don't know what "smap" is supposed to mean; it could be related to the so-called "local" distribution variation (LOCAL / PQ_MAP hint).

As outlined in the other article, the feature is triggered by a histogram on the join expression (and the new internal parameter "_px_join_skew_handling" that defaults to TRUE in 12c). The optimizer checks the histogram for popular values - and there are a few other new parameters that control how "popular" a value needs to be in order to qualify as skewed. By default a value has to either occupy at least 10 buckets of the histogram or represent more than 30 percent of the total population, controlled via the parameters "_px_join_skew_ratio" (defaults to 10) and "_px_join_skew_minfreq" (defaults to 30), to be treated as skewed. You can find the corresponding trace output in the 10053 trace file:
skewRatio:10, skewMinFreq:30, minNDV:16, skewThreshold:0.625000
ind:0, csel:0.531250, skew count:1
ind:1, csel:0.031250, skew count:1
ind:2, csel:0.031250, skew count:1
ind:3, csel:0.031250, skew count:1
ind:4, csel:0.031250, skew count:1
ind:5, csel:0.031250, skew count:1
ind:6, csel:0.031250, skew count:1
ind:7, csel:0.031250, skew count:1
ind:8, csel:0.031250, skew count:1
ind:9, csel:0.031250, skew count:1
ind:10, csel:0.031250, skew count:1
ind:11, csel:0.031250, skew count:1
ind:12, csel:0.031250, skew count:1
ind:13, csel:0.031250, skew count:1
ind:14, csel:0.031250, skew count:1
ind:15, csel:0.031250, skew count:1
Skewed value count:1 scaling:0 degree:8

Note that the "minNDV" value above refers to the number of histogram buckets, not to the actual number of distinct values in the column / expression - so the number of histogram buckets is a crucial input to that calculation. The "skewThreshold" is simply calculated as 1 / minNDV * skewRatio, so here 1 / 16 * 10 = 0.625.

These "skew" thresholds can lead to some interesting scenarios: For example, as you can see from the trace snippet above, for columns with a low number of distinct values (16 in my case here) a value will only be treated as skewed if it exceeds the 30 percent boundary, so having for example two values that represent 25 percent each will not activate the skew-aware distribution code in above scenario. For typical columns that happen to have 254 or more distinct values, a value has to represent at least approx. four percent (1 / 254 * 10) of the population to qualify as skewed - and by increasing the number of histogram buckets to 255 or higher (only possible from 12c on) you can get values qualified that represent as little as 1 / 2048 (the maximum number of histogram buckets in 12c) * 10 of the population - that's just 0.5 percent, not necessarily something you would expect to cause a lot of trouble with skew.

If at least one value is found in the histogram that qualifies as skewed, the optimizer next runs a recursive query as part of the optimization phase to obtain the actual values. This is very likely required because the values in the histogram don't necessarily represent the actual values - there's some rounding / truncation going on, or at least used to go on in the past. I haven't checked yet whether the new 12c histogram code stores the full value in the histogram - checking the corresponding dictionary views there are certainly changes compared to 11.2. The trace shows a query similar to the following:
kkopqSkewInfo: Query:SELECT * FROM (SELECT SYS_OP_COMBINED_HASH("FK_ID"), COUNT(*) CNT, TO_CHAR("FK_ID") FROM "CBO_TEST"."T_2" SAMPLE(0.275000) GROUP BY "FK_ID" ORDER BY CNT DESC) WHERE ROWNUM <= 1

The query uses a SYS_OP_COMBINED_HASH expression, which seems to suggest that it might support multi-column joins in the future; however, a quick test showed that multi-column joins seem to disable the feature at present. The "ROWNUM" restriction depends on the number of skewed values determined previously - in my case a single one. This means that the number of skewed values handled depends on the information extracted from the histogram. Notice the rather low sample size (0.275 percent). Interestingly, in my case, since the underlying table was marked PARALLEL, the recursive query actually ran parallel.

The next line in the trace file shows the result of the query:
skewHashVal:1049436110058863352 count:2906 to_charVal:1

These actual values determined by the query are then "hard-coded" into the cursor - if you update the actual table data (re-map the popular values to different (popular) values) and execute the query without re-optimization, the skew detection doesn't work at runtime: it simply doesn't find the values stored in the cursor.

Note that the optimizer simply takes as actual values whatever is returned by the recursive query - so there is a slight possibility of the query identifying the "wrong" values, but that's very unlikely for popular values that really make a difference for the data distribution. Of course the query could return completely different data if the object statistics do not reflect the actual data in the table.

The actual behaviour at execution time then looks like this: The HYBRID HASH distribution of the build row source (so the other row source of the join) checks the actual values to distribute against the hard-coded values in the cursor. If there is a match, the value will be distributed via BROADCAST to all receiving Parallel Execution Servers; all non-matching values will be distributed by HASH. The HYBRID HASH distribution of the probe row source checks the actual values to distribute against the hard-coded values in the cursor. If there is a match, the values will be distributed using a ROUND-ROBIN / RANDOM distribution; all non-matching values will be distributed by HASH.

We can see this confirmed by using slight variations of a simple test case with different data patterns and degrees of parallelism. Here is the test case setup:
create table t_1
compress
as
select /*+ use_nl(a b) */
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual connect by level <= 1e5) a,
        (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_1')

create table t_2
compress
as
select
        rownum as id
        -- Use either one or two popular values
        --, case when rownum <= 12e5 then 1 else mod(rownum, 2e6) + 1 end as fk_id
      , case when rownum <= 6e5 then 1 when rownum <= 12e5 then 2 else mod(rownum, 2e6) + 1 end as fk_id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual connect by level <= 1e5) a,
        (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

--exec dbms_stats.gather_table_stats(null, 't_2', method_opt=>'for all columns size 1', no_invalidate=>false)
exec dbms_stats.gather_table_stats(null, 't_2', method_opt=>'for all columns size 1 for columns fk_id size 254', no_invalidate=>false)

-- Use either a degree of 4 or 8
--alter table t_1 parallel 4;
alter table t_1 parallel 8;

--alter table t_2 parallel 4;
alter table t_2 parallel 8;

The variations are either one or two popular values in the FK_ID join column of T_2 and different parallel degrees. This is our test query:
select count(t_2_filler) from (
  select /*+ monitor
             leading(t_1 t_2)
             use_hash(t_2)
             no_swap_join_inputs(t_2)
             pq_distribute(t_2 hash hash)
         */
         t_1.id as t_1_id
       , t_1.filler as t_1_filler
       , t_2.id as t_2_id
       , t_2.filler as t_2_filler
  from
         t_1
       , t_2
  where
         t_2.fk_id = t_1.id
  and    regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

Using the following query on V$PQ_TQSTAT (use SQL*Plus to get the formatting shown) after running above test query:
break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup

compute sum label Total of num_rows on server_type

select
        /*dfo_number
      , */tq_id
      , cast(server_type as varchar2(10)) as server_type
      , instance
      , cast(process as varchar2(8)) as process
      , num_rows
      , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
      , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
      , round(bytes / 1024 / 1024) as mb
      , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from
        v$pq_tqstat
order by
        dfo_number
      , tq_id
      , server_type desc
      , instance
      , process
;

we get the following results (one popular value, DOP = 4):
     TQ_ID SERVER_TYP   INSTANCE PROCESS    NUM_ROWS          % GRAPH              MB  bytes/row
---------- ---------- ---------- -------- ---------- ---------- ---------- ---------- ----------
         0 Producer            1 P004         504728         25 ##########         52        109
                                 P005         508088         25 ##########         53        109
                                 P006         497226         25 ##########         52        109
                                 P007         489961         24 ##########         51        109
                      ********** ******** ----------
                      Total                  2000003

           Consumer            1 P000         499615         25 ##########         52        109
                                 P001         500735         25 ##########         52        109
                                 P002         499522         25 ##########         52        109
                                 P003         500131         25 ##########         52        109
                      ********** ******** ----------
                      Total                  2000003

         1 Producer            1 P004         484704         24 #########          50        108
                                 P005         521859         26 ##########         53        107
                                 P006         480692         24 #########          49        108
                                 P007         512745         26 ##########         52        107
                      ********** ******** ----------
                      Total                  2000000

           Consumer            1 P000         500245         25 ##########         51        107
                                 P001         500104         25 ##########         51        107
                                 P002         499437         25 ##########         51        107
                                 P003         500214         25 ##########         51        107
                      ********** ******** ----------
                      Total                  2000000

         2 Producer            1 P000              1         25 ##########          0         36
                                 P001              1         25 ##########          0         36
                                 P002              1         25 ##########          0         36
                                 P003              1         25 ##########          0         36
                      ********** ******** ----------
                      Total                        4

           Consumer            1 QC                4        100 ##########          0         36
                      ********** ******** ----------
                      Total                        4

Two popular values, DOP = 4 (just showing the TQ_ID = 0 Producer part here):
TQ_ID SERVER_TYP INSTANCE PROCESS NUM_ROWS % GRAPH MB bytes/row
---------- ---------- ---------- -------- ---------- ---------- ---------- ---------- ----------
0 Producer 1 P004 497226 25 ########## 52 109
P005 496679 25 ########## 51 109
P006 508088 25 ########## 53 109
P007 498013 25 ########## 52 109
********** ********** ----------
Total 2000006
One popular value, DOP = 8:
TQ_ID SERVER_TYP INSTANCE PROCESS NUM_ROWS % GRAPH MB bytes/row
---------- ---------- ---------- -------- ---------- ---------- ---------- ---------- ----------
0 Producer 1 P008 254050 13 ########## 26 108
P009 255724 13 ########## 26 108
P010 249902 12 ########## 26 109
P011 249004 12 ########## 26 109
P012 250684 13 ########## 26 109
P013 252364 13 ########## 26 109
P014 242635 12 ######### 25 109
P015 245644 12 ########## 26 109
********** ********** ----------
Total 2000007
Two popular values, DOP = 8:
TQ_ID SERVER_TYP INSTANCE PROCESS NUM_ROWS % GRAPH MB bytes/row
---------- ---------- ---------- -------- ---------- ---------- ---------- ---------- ----------
0 Producer 1 P008 253262 13 ########## 26 109
P009 252364 13 ########## 26 109
P010 245644 12 ########## 26 109
P011 255737 13 ########## 26 108
P012 245644 12 ########## 26 109
P013 254044 13 ########## 26 109
P014 254044 13 ########## 26 109
P015 239275 12 ######### 25 109
********** ********** ----------
Total 2000014
So each popular value is duplicated as many times as there are Parallel Execution Servers to distribute to.
The round-robin / random distribution of the popular values from the probe row source then ensures that the data / work distribution isn't affected by the skewed value distribution.
The feature adds another hint to the outline, PQ_SKEW, which uses the alias of the table being joined - so for my test query above the hint would be PQ_SKEW(t_2) (or the more verbose variant using the query block name / alias notation). However, the hint cannot be used to force the skew handling unless all of the above prerequisites - except the histogram on the join expression - are met.
If there is no histogram but the PQ_SKEW hint is used and all other prerequisites are met, then the optimizer "blindly" fires the recursive query to identify skewed values. Interestingly, the ROWNUM predicate that limits the number of skewed values returned by the query is then equal to the parallel degree - so at a degree of 4 the query will be limited with ROWNUM <= 4.
There is an inverse hint NO_PQ_SKEW that can be used to prevent the skew handling.
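For reference, a sketch of how the two hints would be applied to the test query above:

-- Force the skew handling for the probe row source T_2
-- (even without a histogram on T_2.FK_ID)
select /*+ use_hash(t_2) no_swap_join_inputs(t_2)
           pq_distribute(t_2 hash hash) pq_skew(t_2) */
       count(*)
from   t_1, t_2
where  t_2.fk_id = t_1.id;

-- Prevent the skew handling
select /*+ use_hash(t_2) no_swap_join_inputs(t_2)
           pq_distribute(t_2 hash hash) no_pq_skew(t_2) */
       count(*)
from   t_1, t_2
where  t_2.fk_id = t_1.id;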
In principle the same question arises as for the PQ_REPLICATE hint: why was an additional hint added at all? The PQ_DISTRIBUTE hint could have been extended to support, for example, an additional SKEW SKEW distribution method, just like the PQ_REPLICATE hint could be covered by a NONE REPLICATE / REPLICATE NONE distribution method. Maybe both hints are planned to apply to more than just join distributions and that is the reason for the separate hints - I don't know. For skew handling there is another new parameter called "_px_filter_skew_handling", so maybe in the future skew can also be handled by the new parallel FILTER operation, another new feature I hope to cover in an upcoming post.
Labels:
12.1.0.1,
12cR1,
NO_PQ_SKEW,
Parallel Execution,
PQ_SKEW,
skew
Monday, April 14, 2014
Parallel Execution Skew – Demonstrating Skew
This is just a short notice that the next part of the mini-series "Parallel Execution Skew" has been published at AllThingsOracle.com.
Labels:
AllThingsOracle,
Parallel Execution,
series
Sunday, April 6, 2014
Analysing Parallel Execution Skew - Without Diagnostics / Tuning Pack License
This is the third part of the video tutorial "Analysing Parallel Execution Skew". In this part I show how to analyse a parallel SQL execution regarding Parallel Execution Skew.
If you don't have a Diagnostics / Tuning Pack license, the options for doing that are quite limited, and the approach demonstrated in the tutorial has several limitations and shortcomings.
Here is a link to the video on my Youtube channel.
If you want to reproduce or play around with the examples shown in the tutorial, here is the script for creating the tables and running the queries / DML commands used in the tutorial. A shout goes out to Christo Kutrovsky at Pythian, who I think was the one who inspired the beautified version of the V$PQ_TQSTAT query.
---------------------
-- Links for S-ASH --
---------------------
--
-- http://www.perfvision.com/ash.php
-- http://www.pythian.com/blog/trying-out-s-ash/
-- http://sourceforge.net/projects/orasash/files/v2.3/
-- http://sourceforge.net/projects/ashv/
---------------------

-- Table creation
set echo on timing on time on

drop table t_1;
purge table t_1;
drop table t_2;
purge table t_2;
drop table t_1_part;
purge table t_1_part;
drop table t_2_part;
purge table t_2_part;
drop table t1;
purge table t1;
drop table t2;
purge table t2;
drop table t3;
purge table t3;
drop table t4;
purge table t4;
drop table t5;
purge table t5;
drop table x;
purge table x;

create table t1
as
select /*+ use_nl(a b) */
       (rownum * 2) as id
     , rownum as id2
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1000) */ * from dual connect by level <= 1000) a
     , (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't1')

alter table t1 cache;

create table t2
compress
as
select
       (rownum * 2) + 1 as id
     , mod(rownum, 2000) + 1 as id2
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1000000) */ * from dual connect by level <= 1000000) a
     , (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't2')

alter table t2 cache;

create table t3
as
select /*+ use_nl(a b) */
       (rownum * 2) as id
     , rownum as id2
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1000) */ * from dual connect by level <= 1000) a
     , (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't3')

alter table t3 cache;

create table t4
compress
as
select
       (rownum * 2) + 1 as id
     , mod(rownum, 2000) + 1 as id2
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1000000) */ * from dual connect by level <= 1000000) a
     , (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't4')

alter table t4 cache;

create table t5
as
select /*+ use_nl(a b) */
       (rownum * 2) as id
     , rownum as id2
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1000) */ * from dual connect by level <= 1000) a
     , (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't5')

alter table t5 cache;

create table x
compress
as
select * from t2
where 1 = 2;

create unique index x_idx1 on x (id);

alter table t1 parallel 2;
alter table t2 parallel 2;
alter table t3 parallel 15;
alter table t4 parallel 15;
alter table t5 parallel 15;

create table t_1
compress
as
select /*+ use_nl(a b) */
       rownum as id
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1e5) */ * from dual connect by level <= 1e5) a
     , (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_1')

create table t_2
compress
as
select
       rownum as id
     , case when rownum <= 5e5 then mod(rownum, 2e6) + 1 else 1 end as fk_id_skew
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1e5) */ * from dual connect by level <= 1e5) a
     , (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_2', method_opt=>'for all columns size 1', no_invalidate=>false)

alter table t_1 parallel 8 cache;
alter table t_2 parallel 8 cache;

create table t_1_part
partition by hash(id) partitions 8
compress
as
select /*+ use_nl(a b) */
       rownum as id
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1e5) */ * from dual connect by level <= 1e5) a
     , (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_1_part')

create table t_2_part
partition by hash(fk_id_skew) partitions 8
compress
as
select
       rownum as id
     , case when rownum <= 5e5 then mod(rownum, 2e6) + 1 else 1 end as fk_id_skew
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1e5) */ * from dual connect by level <= 1e5) a
     , (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_2_part', method_opt=>'for all columns size 1', no_invalidate=>false)

alter table t_1_part parallel 8 cache;
alter table t_2_part parallel 8 cache;

---------------------------------------------------------------
-- Single DFO tree (with Parallel Execution Skew), many DFOs --
---------------------------------------------------------------

set echo on timing on time on verify on

define num_cpu = "14"

alter session set workarea_size_policy = manual;
alter session set sort_area_size = 200000000;
alter session set sort_area_size = 200000000;
alter session set hash_area_size = 200000000;
alter session set hash_area_size = 200000000;

select max(t1_id)
     , max(t1_filler)
     , max(t2_id)
     , max(t2_filler)
     , max(t3_id)
     , max(t3_filler)
from (
       select /*+ monitor no_merge no_merge(v_1) no_merge(v_5)
                  parallel(t1 &num_cpu)
                  PQ_DISTRIBUTE(T1 HASH HASH) PQ_DISTRIBUTE(V_5 HASH HASH)
                  leading (v_1 v_5 t1) use_hash(v_1 v_5 t1) swap_join_inputs(t1) */
              t1.id as t1_id
            , regexp_replace(v_5.t3_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as t1_filler
            , v_5.*
       from (
              select /*+ parallel(t2 &num_cpu) parallel(t3 &num_cpu) leading(t3 t2) use_hash(t3 t2) swap_join_inputs(t2) PQ_DISTRIBUTE(T2 HASH HASH) */
                     t2.id as t2_id
                   , t2.filler as t2_filler
                   , t2.id2 as t2_id2
                   , t3.id as t3_id
                   , t3.filler as t3_filler
              from t1 t2
                 , t2 t3
              where t3.id2 = t2.id2 (+)
              and   regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
              and   mod(t3.id2, 3) = 0
            ) v_1
          , (
              select /*+ parallel(t2 &num_cpu) parallel(t3 &num_cpu) leading(t3 t2) use_hash(t3 t2) swap_join_inputs(t2) PQ_DISTRIBUTE(T2 HASH HASH) */
                     t2.id as t2_id
                   , t2.filler as t2_filler
                   , t2.id2 as t2_id2
                   , t3.id as t3_id
                   , t3.filler as t3_filler
              from t1 t2
                 , t2 t3
              where t3.id = t2.id (+)
              and   regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
              and   mod(t3.id2, 3) = 0
            ) v_5
          , t1
       where v_1.t3_id = v_5.t3_id
       and   v_5.t2_id2 = t1.id2 (+) + 2001
       and   regexp_replace(v_5.t3_filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t1.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
     )
;

break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup
-- compute sum label Total of num_rows on server_type

select /*dfo_number , */tq_id
     , cast(server_type as varchar2(10)) as server_type
     , instance
     , cast(process as varchar2(8)) as process
     , num_rows
     , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
     , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
     , round(bytes / 1024 / 1024) as mb
     , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from v$pq_tqstat
order by dfo_number
       , tq_id
       , server_type desc
       , instance
       , process
;

---------------------------------------------------------------------------------------------------
-- Same statement with Parallel TEMP TABLE TRANSFORMATION, V$PQ_TQSTAT shows useless information --
---------------------------------------------------------------------------------------------------

set echo on timing on time on verify on

define num_cpu = "14"

alter session set workarea_size_policy = manual;
alter session set sort_area_size = 200000000;
alter session set sort_area_size = 200000000;
alter session set hash_area_size = 200000000;
alter session set hash_area_size = 200000000;

with result as (
       select /*+ materialize monitor no_merge no_merge(v_1) no_merge(v_5)
                  parallel(t1 &num_cpu)
                  PQ_DISTRIBUTE(T1 HASH HASH) PQ_DISTRIBUTE(V_1 HASH HASH) PQ_DISTRIBUTE(V_5 HASH HASH)
                  leading (v_1 v_5 t1) use_hash(v_1 v_5 t1) swap_join_inputs(t1) */
              t1.id as t1_id
            , regexp_replace(v_5.t3_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as t1_filler
            , v_5.*
       from (
              select /*+ parallel(t2 &num_cpu) parallel(t3 &num_cpu) leading(t3 t2) use_hash(t3 t2) swap_join_inputs(t2) PQ_DISTRIBUTE(T2 HASH HASH) */
                     t2.id as t2_id
                   , t2.filler as t2_filler
                   , t2.id2 as t2_id2
                   , t3.id as t3_id
                   , t3.filler as t3_filler
              from t1 t2
                 , t2 t3
              where t3.id2 = t2.id2 (+)
              and   regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
              and   mod(t3.id2, 3) = 0
            ) v_1
          , (
              select /*+ parallel(t2 &num_cpu) parallel(t3 &num_cpu) leading(t3 t2) use_hash(t3 t2) swap_join_inputs(t2) PQ_DISTRIBUTE(T2 HASH HASH) */
                     t2.id as t2_id
                   , t2.filler as t2_filler
                   , t2.id2 as t2_id2
                   , t3.id as t3_id
                   , t3.filler as t3_filler
              from t1 t2
                 , t2 t3
              where t3.id = t2.id (+)
              and   regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
              and   mod(t3.id2, 3) = 0
            ) v_5
          , t1
       where v_1.t3_id = v_5.t3_id
       and   v_5.t2_id2 = t1.id2 (+) + 2001
       and   regexp_replace(v_5.t3_filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t1.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
     )
select max(t1_id), max(t1_filler), max(t2_id), max(t2_filler), max(t3_id), max(t3_filler)
from result;

break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup
-- compute sum label Total of num_rows on server_type

select /*dfo_number , */tq_id
     , cast(server_type as varchar2(10)) as server_type
     , instance
     , cast(process as varchar2(8)) as process
     , num_rows
     , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
     , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
     , round(bytes / 1024 / 1024) as mb
     , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from v$pq_tqstat
order by dfo_number
       , tq_id
       , server_type desc
       , instance
       , process
;

--------------------------------------------------------------------------------------------------
-- This construct results in misleading information from V$PQ_TQSTAT (actually a complete mess) --
--------------------------------------------------------------------------------------------------

set echo on timing on time on

alter session enable parallel dml;

truncate table x;

insert /*+ append parallel(x 4) */ into x
select /*+ leading(v1 v2) optimizer_features_enable('11.2.0.1') */
       v_1.id
     , v_1.id2
     , v_1.filler
from (
       select id, id2, filler
       from (
              select /*+ parallel(t2 4) no_merge */
                     rownum as id
                   , t2.id2
                   , t2.filler
              from t2
              where mod(t2.id2, 3) = 0
              and   regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
            ) v1
     ) v_1
   , (
       select id, id2, filler
       from (
              select /*+ parallel(t2 8) no_merge */
                     rownum as id
                   , t2.id2
                   , t2.filler
              from t2
              where mod(t2.id2, 3) = 0
              and   regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
            ) v2
     ) v_2
where v_1.id = v_2.id
and   v_1.filler = v_2.filler
;

-- Parallel DML requires a COMMIT before querying V$PQ_TQSTAT
commit;

break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup
compute sum label Total of num_rows on server_type

select dfo_number
     , tq_id
     , cast(server_type as varchar2(10)) as server_type
     , instance
     , cast(process as varchar2(8)) as process
     , num_rows
     , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
     , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
     , round(bytes / 1024 / 1024) as mb
     , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from v$pq_tqstat
order by dfo_number
       , tq_id
       , server_type desc
       , instance
       , process
;

----------------------------------------------------------------------
-- Single DFO tree (with Parallel Execution Skew, almost no impact) --
----------------------------------------------------------------------

set echo on timing on time on

alter session set workarea_size_policy = manual;
alter session set sort_area_size = 500000000;
alter session set sort_area_size = 500000000;
alter session set hash_area_size = 500000000;
alter session set hash_area_size = 500000000;

select /*+ leading(v1) use_hash(t_1) no_swap_join_inputs(t_1) pq_distribute(t_1 hash hash) */
       max(t_1.filler)
     , max(v1.t_1_filler)
     , max(v1.t_2_filler)
from t_1
   , (
       select /*+ no_merge leading(t_1 t_2) use_hash(t_2) no_swap_join_inputs(t_2) pq_distribute(t_2 hash hash) */
              t_1.id as t_1_id
            , t_1.filler as t_1_filler
            , t_2.id as t_2_id
            , t_2.filler as t_2_filler
       from t_1
          , t_2
       where t_2.fk_id_skew = t_1.id
     ) v1
where v1.t_2_id = t_1.id
and   regexp_replace(v1.t_2_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
and   regexp_replace(v1.t_2_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
;

break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup
-- compute sum label Total of num_rows on server_type

select /*dfo_number , */tq_id
     , cast(server_type as varchar2(10)) as server_type
     , instance
     , cast(process as varchar2(8)) as process
     , num_rows
     , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
     , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
     , round(bytes / 1024 / 1024) as mb
     , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from v$pq_tqstat
order by dfo_number
       , tq_id
       , server_type desc
       , instance
       , process
;

--------------------------------------------------------------------------------------------------------------------------------
-- Full Partition Wise Join with partition skew - V$PQ_TQSTAT is of no help, since no redistribution takes place (single DFO) --
--------------------------------------------------------------------------------------------------------------------------------

set echo on timing on time on

alter session set workarea_size_policy = manual;
alter session set sort_area_size = 500000000;
alter session set sort_area_size = 500000000;
alter session set hash_area_size = 500000000;
alter session set hash_area_size = 500000000;

select count(t_2_filler)
from (
       select /*+ monitor leading(t_1 t_2) use_hash(t_2) no_swap_join_inputs(t_2) pq_distribute(t_2 none none) */
              t_1.id as t_1_id
            , t_1.filler as t_1_filler
            , t_2.id as t_2_id
            , t_2.filler as t_2_filler
       from t_1_part t_1
          , t_2_part t_2
       where t_2.fk_id_skew = t_1.id
       and   regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
     );

break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup
-- compute sum label Total of num_rows on server_type

select /*dfo_number , */tq_id
     , cast(server_type as varchar2(10)) as server_type
     , instance
     , cast(process as varchar2(8)) as process
     , num_rows
     , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
     , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
     , round(bytes / 1024 / 1024) as mb
     , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from v$pq_tqstat
order by dfo_number
       , tq_id
       , server_type desc
       , instance
       , process
;
Labels:
11g,
11gR2,
12cR1,
Parallel Execution,
series,
Video Tutorial
Saturday, April 5, 2014
Analysing Parallel Execution Skew - Data Flow Operations (DFOs) And DFO Trees
This is the second part of the video tutorial "Analysing Parallel Execution Skew". In this part I introduce the concept of "Data Flow Operations (DFOs)" and "DFO Trees", which is what a Parallel Execution plan is made of. DFOs / DFO Trees are specific to Parallel Execution and don't have any counterpart in a serial execution plan.
Understanding the implications of DFOs / DFO Trees is an important prerequisite for understanding some of the effects shown in the later parts of the video tutorial, hence I covered this as a separate topic.
Note that this tutorial also demonstrates some new 12c features regarding Parallel Execution, in particular how Oracle 12c now lifts many of the previous limitations that led to the generation of multiple DFO Trees.
Here is a link to the video on my Youtube channel.
If you want to reproduce and play around with the DFO Tree variations shown in the tutorial, here is the script for creating the tables and running the queries / DML commands used in the tutorial:
-- Table creation
set echo on timing on time on

drop table t1;
purge table t1;
drop table t2;
purge table t2;
drop table t3;
purge table t3;
drop table t4;
purge table t4;
drop table t5;
purge table t5;
drop table x;
purge table x;

create table t1
as
select /*+ use_nl(a b) */
       (rownum * 2) as id
     , rownum as id2
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1000) */ * from dual connect by level <= 1000) a
     , (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't1')

alter table t1 cache;

create table t2
compress
as
select
       (rownum * 2) + 1 as id
     , mod(rownum, 2000) + 1 as id2
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1000000) */ * from dual connect by level <= 1000000) a
     , (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't2')

alter table t2 cache;

create table t3
as
select /*+ use_nl(a b) */
       (rownum * 2) as id
     , rownum as id2
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1000) */ * from dual connect by level <= 1000) a
     , (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't3')

alter table t3 cache;

create table t4
compress
as
select
       (rownum * 2) + 1 as id
     , mod(rownum, 2000) + 1 as id2
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1000000) */ * from dual connect by level <= 1000000) a
     , (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't4')

alter table t4 cache;

create table t5
as
select /*+ use_nl(a b) */
       (rownum * 2) as id
     , rownum as id2
     , rpad('x', 100) as filler
from
       (select /*+ cardinality(1000) */ * from dual connect by level <= 1000) a
     , (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't5')

alter table t5 cache;

create table x
compress
as
select * from t2
where 1 = 2;

create unique index x_idx1 on x (id);

alter table t1 parallel 2;
alter table t2 parallel 2;
alter table t3 parallel 15;
alter table t4 parallel 15;
alter table t5 parallel 15;

---------------------------------------------------------------
-- Single DFO tree (with Parallel Execution Skew), many DFOs --
---------------------------------------------------------------

set echo on timing on time on verify on

define num_cpu = "15"

select max(t1_id)
     , max(t1_filler)
     , max(t2_id)
     , max(t2_filler)
     , max(t3_id)
     , max(t3_filler)
from (
       select /*+ monitor no_merge no_merge(v_1) no_merge(v_5)
                  parallel(t1 &num_cpu)
                  PQ_DISTRIBUTE(T1 HASH HASH) PQ_DISTRIBUTE(V_5 HASH HASH)
                  leading (v_1 v_5 t1) use_hash(v_1 v_5 t1) swap_join_inputs(t1) */
              t1.id as t1_id
            , regexp_replace(v_5.t3_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as t1_filler
            , v_5.*
       from (
              select /*+ parallel(t2 &num_cpu) parallel(t3 &num_cpu) leading(t3 t2) use_hash(t3 t2) swap_join_inputs(t2) PQ_DISTRIBUTE(T2 HASH HASH) */
                     t2.id as t2_id
                   , t2.filler as t2_filler
                   , t2.id2 as t2_id2
                   , t3.id as t3_id
                   , t3.filler as t3_filler
              from t1 t2
                 , t2 t3
              where t3.id2 = t2.id2 (+)
              and   regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
              and   mod(t3.id2, 3) = 0
            ) v_1
          , (
              select /*+ parallel(t2 &num_cpu) parallel(t3 &num_cpu) leading(t3 t2) use_hash(t3 t2) swap_join_inputs(t2) PQ_DISTRIBUTE(T2 HASH HASH) */
                     t2.id as t2_id
                   , t2.filler as t2_filler
                   , t2.id2 as t2_id2
                   , t3.id as t3_id
                   , t3.filler as t3_filler
              from t1 t2
                 , t2 t3
              where t3.id = t2.id (+)
              and   regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
              and   mod(t3.id2, 3) = 0
            ) v_5
          , t1
       where v_1.t3_id = v_5.t3_id
       and   v_5.t2_id2 = t1.id2 (+) + 2001
       and   regexp_replace(v_5.t3_filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t1.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
     )
;

---------------------------------------------------------------------------------------------------------------------------------------------------
-- Multiple DFO trees no parent / child (with different DOPs), separate slave sets, one active after the other (12.1: Still multiple DFO trees) --
---------------------------------------------------------------------------------------------------------------------------------------------------

set echo on timing on time on verify on

with a as (
       select /*+ materialize monitor no_merge */
              t1.id as t1_id
            , regexp_replace(t3_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as t1_filler
            , v1.*
       from (
              select /*+ no_merge pq_distribute(t3 hash hash) */
                     t2.id as t2_id
                   , t2.filler as t2_filler
                   , t2.id2 as t2_id2
                   , t3.id as t3_id
                   , t3.filler as t3_filler
              from t1 t2
                 , t2 t3
              where t3.id2 = t2.id
              and   regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
              and   mod(t3.id2, 2) = 0
            ) v1
          , t1
       where v1.t2_id2 = t1.id2
     ),
b as (
       select /*+ materialize monitor no_merge */
              t1.id as t1_id
            , regexp_replace(t3_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as t1_filler
            , v1.*
       from (
              select /*+ no_merge pq_distribute(t3 hash hash) */
                     t2.id as t2_id
                   , t2.filler as t2_filler
                   , t2.id2 as t2_id2
                   , t3.id as t3_id
                   , t3.filler as t3_filler
              from t3 t2
                 , t4 t3
              where t3.id2 = t2.id
              and   regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
              and   mod(t3.id2, 2) = 0
            ) v1
          , t5 t1
       where v1.t2_id2 = t1.id2
     )
select max(t1_id), max(t1_filler), max(t2_id), max(t2_filler), max(t3_id), max(t3_filler)
from (
       select /*+ no_merge */
              a.t1_id
            , a.t1_filler
            , a.t2_id
            , a.t2_filler
            , a.t3_id
            , regexp_replace(a.t3_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as t3_filler
       from a
          , b
       where a.t3_id = b.t3_id
       and   regexp_replace(a.t3_filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(b.t3_filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
       and   mod(a.t1_id, 4) = 0
       and   mod(b.t1_id, 4) = 0
     )
;

-------------------------------------------------------------------------------------------------------------------------------
-- Multiple DFO trees parent / child (with different DOPs), separate slave sets, concurrently active (12.1: Single DFO tree) --
-------------------------------------------------------------------------------------------------------------------------------

set echo on timing on time on

alter session enable parallel dml;

truncate table x;

insert /*+ append parallel(x 4) */ into x
select /*+ leading(v1 v2) */
       v_1.id
     , v_1.id2
     , v_1.filler
from (
       select id, id2, filler
       from (
              select /*+ parallel(t2 4) no_merge */
                     rownum as id
                   , t2.id2
                   , t2.filler
              from t2
              where mod(t2.id2, 3) = 0
              and   regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
            ) v1
     ) v_1
   , (
       select id, id2, filler
       from (
              select /*+ parallel(t2 8) no_merge */
                     rownum as id
                   , t2.id2
                   , t2.filler
              from t2
              where mod(t2.id2, 3) = 0
              and   regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
            ) v2
     ) v_2
where v_1.id = v_2.id
and   v_1.filler = v_2.filler
;

commit;

--------------------------------------------------------------------------------------------------------------------------------------
-- Multiple DFO trees parent / child (with different DOPs), separate slave sets, *not* concurrently active (12.1: Single DFO tree) --
--------------------------------------------------------------------------------------------------------------------------------------

set echo on timing on time on

alter session enable parallel dml;

truncate table x;

insert /*+ append parallel(x 4) */ into x
select v1.*
from (
       select /*+ parallel(t2 4) */
              lag(t2.id) over (order by regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')) as id
            , t2.id2
            , regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as filler
       from t2
       where mod(t2.id2, 3) = 0
     ) v1
   , (
       select /*+ parallel(t2 8) */
              lag(id) over (order by regexp_replace(filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')) as id
            , id2
            , regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as filler
       from t2
       where mod(t2.id2, 3) = 0
     ) v2
where v1.id = v2.id
and   v1.filler = v2.filler
;

commit;

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
-- Multiple DFO trees, no parent/child, multiple DFO tree starts, no separate slave sets, concurrently active (12.1: Single DFO tree, new parallel FILTER/SUBQUERY) --
----------------------------------------------------------------------------------------------------------------------------------------------------------------------

set echo on timing on time on

select /*+ no_merge(x) */
       *
from (
       select v1.filler
            , (select /*+ parallel(x 2) */ id from t2 x where x.id = v1.id) as id
            , (select /*+ parallel(x 2) */ id2 from t2 x where x.id = v1.id) as id2
       from (
              select /*+ parallel(t2 4) */ t2.id, t2.id2, t2.filler from t2
            ) v1
          , (
              select /*+ parallel(t2 8) */ t2.id, t2.id2, t2.filler from t2
            ) v2
       where v1.id = v2.id
       and   v1.filler = v2.filler
     ) x
where rownum <= 100
;

----------------------------------------------------------------------------------------------------------------------------------------------------------
-- Multiple DFO trees, no parent/child, multiple DFO tree starts, separate slave sets, concurrently active (12.1: Single DFO tree, new parallel FILTER) --
----------------------------------------------------------------------------------------------------------------------------------------------------------

set echo on timing on time on

select /*+ parallel(t2 8) parallel(t3 8) leading(t3 t2) use_hash(t3 t2) swap_join_inputs(t2) PQ_DISTRIBUTE(T2 HASH HASH)
           --PQ_FILTER(@"SEL$1" NONE)
       */
       t2.id as t2_id
     , t2.filler as t2_filler
     , t2.id2 as t2_id2
     , t3.id as t3_id
     , t3.filler as t3_filler
from t1 t2
   , t2 t3
where t3.id2 = t2.id2 (+)
and   regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
and   mod(t3.id2, 3) = 0
and   not exists (select /*+ no_unnest parallel(t2 2) */ null from t2 x where x.id2 = t2.id2)
;

-------------------------------------------------------------------------------------------------------------------------------------------------------------
-- Multiple DFO trees, no parent/child, multiple DFO tree starts, separate slave sets, concurrently active (12.1: Still multiple DFO trees, serial FILTER) --
-------------------------------------------------------------------------------------------------------------------------------------------------------------

set echo on timing on time on

alter session enable parallel dml;

truncate table x;

insert /*+ append parallel(x 4) */ into x
select /*+ leading(v1 v2) */
       v_1.id
     , v_1.id2
     , v_1.filler
from (
       select id, id2, filler
       from (
              select /*+ parallel(t2 4) no_merge */
                     rownum as id
                   , t2.id2
                   , t2.filler
              from t2
              where mod(t2.id2, 3) = 0
              and   regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
            ) v1
     ) v_1
   , (
       select id, id2, filler
       from (
              select /*+ parallel(t2 8) no_merge */
                     rownum as id
                   , t2.id2
                   , t2.filler
              from t2
              where mod(t2.id2, 3) = 0
              and   regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
            ) v2
     ) v_2
where v_1.id = v_2.id
and   v_1.filler = v_2.filler
and   not exists (select /*+ no_unnest parallel(y 2) */ null from t2 y where y.id2 = v_1.id2)
;

commit;
Labels:
11g,
11gR2,
12cR1,
Parallel Execution,
series,
Video Tutorial