Showing posts with label 11g. Show all posts
Showing posts with label 11g. Show all posts

Wednesday, June 29, 2016

New Version Of XPLAN_ASH Utility

A new version 4.23 of the XPLAN_ASH utility is available for download.

As usual the latest version can be downloaded here.

This version comes only with minor changes, see the change log below.

Here are the notes from the change log:

- Finally corrected the very old and wrong description of "wait times" in the script comments, where it was talking about "in-flight" wait events but that is not correct. ASH performs a "fix-up" of the last 255 samples or so and updates them with the time waited, so these wait events are not "in-flight"

- Removed some of the clean up code added in 4.22 to the beginning of the script, because it doesn't really help much but spooled script output always contained these error messages about non-existent column definitions being cleared

- The "Concurrent I/O" sections will now also be skipped in LIMITED_ASH mode

- Some more fixes to the I/O figures in the "Activity Timeline based on ASH" - the spreading introduced in 4.22 needed some further refinement (see 4.22 change log for more details)

Monday, December 28, 2015

New Version Of XPLAN_ASH Utility

A new version 4.22 of the XPLAN_ASH utility is available for download.

As usual the latest version can be downloaded here.

This version primarily addresses an issue with 12c - if the HIST mode got used to pull ASH information from AWR in 12c it turned out that Oracle forgot to add the new "DELTA_READ_MEM_BYTES" columns to DBA_HIST_ACTIVE_SESS_HISTORY - although it got officially added to V$ACTIVE_SESSION_HISTORY in 12c. So now I had to implement several additional if/then/else constructs to the script to handle this inconsistency. It's the first time that the HIST view doesn't seem to reflect all columns from the V$ view - very likely an oversight rather than by design I assume.

Apart from that the I/O figures (Read Bytes / Write Bytes etc.) in the "Activity Timeline" make more sense for those cases where a process hasn't been sampled for several sample points (see below for more details).

Also in case an execution plan could not be found it is now made more obvious with a corresponding message that you might be able to pull the execution plan from AWR by using different ASH modes (MIXED / HIST).

Here are the notes from the change log:

- Fixed a funny bug that in 12c they have forgotton to add the DELTA_READ_MEM_BYTES to DBA_HIST_ACTIVE_SESS_HISTORY, so in HIST mode with 12c prior XPLAN_ASH versions could error out with invalid column name

- Change the way the I/O figures are treated in the "Activity Timeline based on ASH". Now the I/O per second is spread over the (previous) samples covered by DELTA_TIME. This should give a smoother representation of the I/O performed and much closer to what you see in Real-Time SQL Monitoring reports. The difference to prior versions is only visible in cases where a session wasn't sampled for quite a while and hence has a DELTA_TIME spanning multiple previous sample points. This also means that the I/O related columns in the "Activity Timeline based on ASH" now show only the PER SECOND values, no longer to the totals like prior versions

- Added a SET NULL "" in the configuration and initialization section for SQL*Plus environments that use a non-default SET NULL setting. This screwed up some internal switches so that XPLAN_ASH for example thought it's running in a S-ASH repository

- Added a note to the end of the output if no execution plan could be found and falling back to retrieving plan operation details from ASH. Also added the note to use MIXED or HIST ASH source option if no execution plan could be found in CURR mode, so execution plan has been purged from Shared Pool in the meanwhile

- Cloned the "cleanup" section from the end to the beginning of the script to ensure no current SQL*Plus environment settings influence the script execution. This is particularly relevant if the script execution gets cancelled before the final cleanup section is reached or some other, previous scripts left a mess behind

Sunday, August 16, 2015

Parallel Projection

A recent case at a client reminded me of something that isn't really new but not so well known - Oracle by default performs evaluation at the latest possible point in the execution plan.

So if you happen to have expressions in the projection of a simple SQL statement that runs parallel it might be counter-intuitive that by default Oracle won't evaluate the projection in the Parallel Slaves but in the Query Coordinator - even if it was technically possible - because the latest possible point is the SELECT operation with the ID = 0 of the plan, which is always performed by the Query Coordinator.

Of course, if you make use of expressions that can't be evaluated in parallel or aren't implemented for parallel evaluation, then there is no other choice than doing this in the Query Coordinator.

The specific case in question was a generic export functionality that allowed exporting report results to some CSV or Excel like format, and some of these reports had a lot of rows and complex - in that case CPU intensive - expressions in their projection clause.

When looking at the run time profile of such an export query it became obvious that although it was a (very simple) parallel plan, all of the time was spent in the Query Coordinator, effectively turning this at runtime into a serial execution.

This effect can be reproduced very easily:
create table t_1
compress
as
select  /*+ use_nl(a b) */
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_1', method_opt=>'for all columns size 1')

alter table t_1 parallel cache;

-- Run some CPU intensive expressions in the projection 
-- of a simple parallel Full Table Scan
set echo on timing on time on

set autotrace traceonly statistics

set arraysize 500

select
        regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as some_cpu_intensive_exp1
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') as some_cpu_intensive_exp2
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') as some_cpu_intensive_exp3
from t_1
;

-- The plan is clearly parallel
--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |  2000K|   192M|   221   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| T_1      |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

-- But the runtime profile looks more serial
-- although the Parallel Slaves get used to run the Full Table Scan
-- All time spent in the operation ID = 0
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Operation            | Name     | Execs | A-Rows| ReadB | ReadReq | Start | Dur(T)| Dur(A)| Time Active Graph    | Parallel Distribution ASH                                                     | Parallel Execution Skew ASH      | Activity Graph ASH          | Top 5 Activity ASH  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     | SELECT STATEMENT     |          |    5  | 2000K |       |         |     3 |   136 |   120 | #################### |   1:sqlplus.exe(120)[2000K],P008(0)[0],P009(0)[0],P00A(0)[0],P00B(0)[0]       | ################################ | @@@@@@@@@@@@@@@@@@@ ( 98%)  | ON CPU(120)         |
|   1 |   0 |  PX COORDINATOR      |          |    5  | 2000K |       |         |   119 |     1 |     1 |                 #    |   1:sqlplus.exe(1)[2000K],P008(0)[0],P009(0)[0],P00A(0)[0],P00B(0)[0]         |                                  |                     ( .8%)  | ON CPU(1)           |
|   2 |   1 |   PX SEND QC (RANDOM)| :TQ10000 |    4  | 2000K |       |         |    66 |    11 |     2 |          ##          |   2:P00B(1)[508K],P00A(1)[490K],P008(0)[505K],P009(0)[497K],sqlplus.exe(0)[0] |                                  |                     (1.6%)  | PX qref latch(2)    |
|   3 |   2 |    PX BLOCK ITERATOR |          |    4  | 2000K |       |         |       |       |       |                      |   0:P00B(0)[508K],P008(0)[505K],P009(0)[497K],P00A(0)[490K],sqlplus.exe(0)[0] |                                  |                             |                     |
|*  4 |   3 |     TABLE ACCESS FULL| T_1      |   52  | 2000K |   23M |     74  |       |       |       |                      |   0:P00B(0)[508K],P008(0)[505K],P009(0)[497K],P00A(0)[490K],sqlplus.exe(0)[0] |                                  |                             |                     |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Fortunately there is a simple and straightforward way to make use of the Parallel Slaves for evaluation of projection expressions that can be evaluated in parallel - simply add a suitable NO_MERGE hint for the query block that you want the projection to be evaluated for in the Parallel Slaves.

If you don't want to have side effects on the overall plan shape by not merging views you could always wrap the original query in an outer SELECT and not merging the now inner query block. There seems to be a rule that the projection of a view always get evaluated at the VIEW operator, and if we check the execution plan we can see that the VIEW operator is marked parallel:
set echo on timing on time on

set autotrace traceonly statistics

set arraysize 500

select /*+ no_merge(x) */ * from (
select
        regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as some_cpu_intensive_exp1
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') as some_cpu_intensive_exp2
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') as some_cpu_intensive_exp3
from t_1
) x
;

-- View operator is marked parallel
-- This is were the projection clause of the VIEW will be evaluated
---------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |  2000K|    11G|   221   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM) | :TQ10000 |  2000K|    11G|   221   (1)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    VIEW               |          |  2000K|    11G|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|   4 |     PX BLOCK ITERATOR |          |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS FULL| T_1      |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------

-- Runtime profile now shows effective usage of Parallel Slaves 
-- for doing the CPU intensive work
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Operation             | Name     | Execs | A-Rows| Start | Dur(T)| Dur(A)| Time Active Graph    | Parallel Distribution ASH                                                         | Parallel Execution Skew ASH| Activity Graph ASH          | Top 5 Activity ASH               |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     | SELECT STATEMENT      |          |    5  | 2000K |       |       |       |                      |   0:sqlplus.exe(0)[2000K],P000(0)[0],P001(0)[0],P002(0)[0],P003(0)[0]             |                            |                             |                                  |
|   1 |   0 |  PX COORDINATOR       |          |    5  | 2000K |    17 |    63 |    10 |    # ##   #    ####  |   1:sqlplus.exe(10)[2000K],P000(0)[0],P001(0)[0],P002(0)[0],P003(0)[0]            | ####                       | *                   (5.6%)  | resmgr:cpu quantum(10)           |
|   2 |   1 |   PX SEND QC (RANDOM) | :TQ10000 |    4  | 2000K |     5 |    61 |    10 | ## #  ## ## ## #     |   3:P002(5)[544K],P001(4)[487K],P000(1)[535K],P003(0)[434K],sqlplus.exe(0)[0]     | #                          |                     (5.6%)  | ON CPU(7),resmgr:cpu quantum(3)  |
|   3 |   2 |    VIEW               |          |    4  | 2000K |     2 |    82 |    69 | #################### |   4:P003(42)[434K],P001(35)[487K],P000(26)[535K],P002(22)[544K],sqlplus.exe(0)[0] | ############               | @@@@@@@@@@@@@@@@@@@ ( 70%)  | ON CPU(125)                      |
|   4 |   3 |     PX BLOCK ITERATOR |          |    4  | 2000K |       |       |       |                      |   0:P002(0)[544K],P000(0)[535K],P001(0)[487K],P003(0)[434K],sqlplus.exe(0)[0]     |                            |                             |                                  |
|*  5 |   4 |      TABLE ACCESS FULL| T_1      |   52  | 2000K |     3 |    78 |    29 | ###### ####### # ### |   4:P000(11)[535K],P002(8)[544K],P001(8)[487K],P003(7)[434K],sqlplus.exe(0)[0]    | ###                        | *****               ( 19%)  | resmgr:cpu quantum(30),ON CPU(4) |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
At runtime the duration of the query now gets reduced significantly and we can see the Parallel Slaves getting used when the VIEW operator gets evaluated. Although the overall CPU time used is similar to the previous example, the duration of the query execution is less since this CPU time is now spent in parallel in the slaves instead in the Query Coordinator.

Summary


By default Oracle performs evaluation at the latest possible point of the execution plan. Sometimes you can improve runtime by actively influencing when the projection will be evaluated by preventing view merging and introducing a VIEW operator that will be used to evaluate the projection clause.

The optimizer so far doesn't seem to incorporate such possibilities in its evaluations of possible plan shapes, so this is something you need to do manually up to and including Oracle 12c (version 12.1.0.2 as of time of writing this).

Friday, May 29, 2015

Temp Table Transformation Cardinality Estimates - 2

Continuing from the previous part - which was about the Temp Table Transformation and join cardinality estimates - using the same simple table setup here is a slight variation of the previously used query to demonstrate the potential impact on single table cardinality estimates:
explain plan for
with
cte as (
select /* inline */ id from t1 t
where 1 = 1
)
select /*+
           no_merge(a) no_merge(b)
       */ * from cte a, cte b
where a.id = b.id
and a.id > 990 and b.id > 990
;

-- 11.2.0.x Plan with TEMP transformation
--------------------------------------------------------------------------------
| Id  | Operation                  | Name                      | Rows  | Bytes |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                           |  1000 | 26000 |
|   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D661C_275FD9 |       |       |
|   3 |    TABLE ACCESS FULL       | T1                        |  1000 |  4000 |
|*  4 |   HASH JOIN                |                           |  1000 | 26000 |
|*  5 |    VIEW                    |                           |  1000 | 13000 |
|   6 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D661C_275FD9 |  1000 |  4000 |
|*  7 |    VIEW                    |                           |  1000 | 13000 |
|   8 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D661C_275FD9 |  1000 |  4000 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("A"."ID"="B"."ID")
   5 - filter("A"."ID">990)
   7 - filter("B"."ID">990)

-- 11.2.0.x Plan with INLINE hint
----------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |
----------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |   260 |
|*  1 |  HASH JOIN          |      |    10 |   260 |
|   2 |   VIEW              |      |    10 |   130 |
|*  3 |    TABLE ACCESS FULL| T1   |    10 |    40 |
|   4 |   VIEW              |      |    10 |   130 |
|*  5 |    TABLE ACCESS FULL| T1   |    10 |    40 |
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("A"."ID"="B"."ID")
   3 - filter("ID">990)
   5 - filter("ID">990)
Again it's obvious that the Temp Table Transformation can have significant impact on the single table cardinality estimates.

In particular:

- Although the same filter is applied in both cases to the rowsources A and B, in case of the Temp Table Transformation it doesn't reduce the cardinality. So it's not uncommon to end up with significant cardinality overestimates in case the transformation gets used

- For Exadata environments particularly bad is that the filter isn't pushed into the TABLE ACCESS FULL operator, but only applied in the VIEW operator above, which means that it can't be offloaded - all the data needs to be sent from the Storage Cells to the Compute Nodes and filtered there. Not a very efficient way to operate on Exadata

The behaviour is still the same in 12c.

Thursday, May 7, 2015

Heuristic Temp Table Transformation - 2

Some time ago I've demonstrated the non-cost based decision for applying the temp table transformation when using CTEs (Common Table/Subquery Expressions). In this note I want to highlight another aspect of this behaviour.

Consider the following data creating a table with delibrately wide columns:
create table a
as
  select
          rownum as id
        , rownum as id2
        , rpad('x', 4000) as large_vc1
        , rpad('x', 4000) as large_vc2
        , rpad('x', 4000) as large_vc3
from
          dual
connect by
          level <= 1000
;

exec dbms_stats.gather_table_stats(null, 'a')
and this query and plans with and without the temp table transformation:
with cte
as
(
  select  /* inline */
          id
        , id2
        , large_vc1
        , large_vc2
        , large_vc3
from
          a
where
          1 = 1
)
select
        *
from
        (
          select id, count(*) from cte group by id
        ) a,
        (
          select id2, count(*) from cte group by id2
        ) b
where
        a.id = b.id2
;

-- Plan with TEMP TABLE transformation
--------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                           |  1000 | 52000 |  1341   (1)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6609_26FA32 |       |       |            |          |
|   3 |    TABLE ACCESS FULL       | A                         |  1000 |    11M|   452   (0)| 00:00:01 |
|*  4 |   HASH JOIN                |                           |  1000 | 52000 |   889   (1)| 00:00:01 |
|   5 |    VIEW                    |                           |  1000 | 26000 |   444   (1)| 00:00:01 |
|   6 |     HASH GROUP BY          |                           |  1000 |  4000 |   444   (1)| 00:00:01 |
|   7 |      VIEW                  |                           |  1000 |  4000 |   443   (0)| 00:00:01 |
|   8 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D6609_26FA32 |  1000 |    11M|   443   (0)| 00:00:01 |
|   9 |    VIEW                    |                           |  1000 | 26000 |   444   (1)| 00:00:01 |
|  10 |     HASH GROUP BY          |                           |  1000 |  4000 |   444   (1)| 00:00:01 |
|  11 |      VIEW                  |                           |  1000 |  4000 |   443   (0)| 00:00:01 |
|  12 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D6609_26FA32 |  1000 |    11M|   443   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

-- Plan with CTE inlined (turn INLINE into hint)
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  1000 | 52000 |   907   (1)| 00:00:01 |
|*  1 |  HASH JOIN           |      |  1000 | 52000 |   907   (1)| 00:00:01 |
|   2 |   VIEW               |      |  1000 | 26000 |   453   (1)| 00:00:01 |
|   3 |    HASH GROUP BY     |      |  1000 |  4000 |   453   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| A    |  1000 |  4000 |   452   (0)| 00:00:01 |
|   5 |   VIEW               |      |  1000 | 26000 |   453   (1)| 00:00:01 |
|   6 |    HASH GROUP BY     |      |  1000 |  4000 |   453   (1)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| A    |  1000 |  4000 |   452   (0)| 00:00:01 |
-----------------------------------------------------------------------------
Looking at the query and plan output the following becomes obvious:

- The mere existence of a WHERE clause, even if it is just "WHERE 1 = 1" and referencing the CTE more than once triggers the transformation (nothing new, already demonstrated in the mentioned previous note, as well as the fact that the inlined CTE variant is cheaper in cost)

- There is a huge difference between the estimated size of the TEMP TABLE and the size of the row sources when using the CTE inline

The latter is particular noteworthy: Usually Oracle is pretty clever in optimizing the projection and uses only those columns required (doesn't apply to the target expression of MERGE statements, by the way), which is reflected in the plan output for the inline CTEs - the wide columns don't matter here because they aren't referenced, although being mentioned in the CTE. But in case of the temp table transformation obviously all columns / expressions mentioned in the CTE become materialized, although not necessarily being referenced when the CTE gets used.

So it would be nice if Oracle only materialized those columns / expressions actually used.

Now you might raise the question why mention columns and expressions in the CTE that don't get used afterwards: Well, generic approaches sometimes lead to such constructs - imagine the CTE part was static, including all possible attributes, but the actual usage of the CTE can be customized by a client. In such cases where only a small part of the available attributes get actually used a temp table transformation can lead to a huge overhead in size of the generated temp table. Preventing the transformation addresses this issue, but then the inlined CTE will have to be evaluated as many times as referenced - which might not be desirable either.

Monday, April 20, 2015

Function-Based Indexes And CURSOR_SHARING = FORCE

In general it is known that Function-Based Indexes (FBIs) can no longer be used by the optimizer if the expression contains literals and CURSOR_SHARING = FORCE / SIMILAR (deprecated) turns those literals into bind variables. Jonathan Lewis described the issue quite a while ago here in detail.

In a recent OTN thread this issue was raised again, but to my surprise when I played around with a test case that mimicked the OP's problem query I found that (certain) Oracle versions have some built-in logic that enable FBI usage for certain cases where you would expect them to be not usable.

If you test the following code on versions from 10.2.0.4 (possibly earlier) up to and including version 11.2.0.3 then you'll notice some interesting details:

create table t
as
select * from all_objects;

create index t_idx on t (owner || ' ' || object_name);

exec dbms_stats.gather_table_stats(null, 't')

set echo on linesize 200 pagesize 0

alter session set cursor_sharing = force;

select /*+ full(t) */ * from t where owner || ' ' || object_name = 'BLA';

select * from table(dbms_xplan.display_cursor);

select /*+ index(t) */ * from t where owner || ' ' || object_name = 'BLA';

select * from table(dbms_xplan.display_cursor);

select /*+ index(t) */ * from t where owner || 'A' || object_name = 'BLA';

select * from table(dbms_xplan.display_cursor);
Here is the relevant output I got from 11.2.0.1 for example:
SQL> alter session set cursor_sharing = force;

Session altered.

SQL>
SQL> select /*+ full(t) */ * from t where owner || ' ' || object_name = 'BLA';

no rows selected

SQL>
SQL> select * from table(dbms_xplan.display_cursor);
SQL_ID  ar3tw7r1rvawk, child number 0
-------------------------------------
select /*+ full(t) */ * from t where owner || :"SYS_B_0" || object_name
= :"SYS_B_1"

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   284 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |     1 |   117 |   284   (2)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OWNER"||' '||"OBJECT_NAME"=:SYS_B_1)


19 rows selected.

SQL>
SQL> select /*+ index(t) */ * from t where owner || ' ' || object_name = 'BLA';

no rows selected

SQL>
SQL> select * from table(dbms_xplan.display_cursor);
SQL_ID  6kzz3vw5x8x3b, child number 0
-------------------------------------
select /*+ index(t) */ * from t where owner || :"SYS_B_0" ||
object_name = :"SYS_B_1"

Plan hash value: 470836197

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |   117 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T"."SYS_NC00016$"=:SYS_B_1)


20 rows selected.

SQL>
SQL> select /*+ index(t) */ * from t where owner || 'A' || object_name = 'BLA';

no rows selected

SQL>
SQL> select * from table(dbms_xplan.display_cursor);
SQL_ID  6kzz3vw5x8x3b, child number 1
-------------------------------------
select /*+ index(t) */ * from t where owner || :"SYS_B_0" ||
object_name = :"SYS_B_1"

Plan hash value: 3778778741

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |       |       | 52472 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T     |   724 | 84708 | 52472   (1)| 00:10:30 |
|   2 |   INDEX FULL SCAN           | T_IDX | 72351 |       |   420   (1)| 00:00:06 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OWNER"||:SYS_B_0||"OBJECT_NAME"=:SYS_B_1)


20 rows selected.
Looking at the statement text that results from "CURSOR_SHARING = force" we can spot the expected bind variables instead of the literals, and this should result in a corresponding predicate that doesn't match the FBI expression. However, when looking at the filter expression in the predicate section (when forcing a full table scan) we can spot something interesting: It still shows the literal, which doesn't correspond to the predicate of the rewritten query text.

The next execution shows that the FBI really can be used despite the bind variable replacement taking place, and the final execution shows that the cursor sharing works correctly in that sense that a new child cursor got created for the same SQL text with a different plan and different predicate section when using a different literal in the original SQL text. V$SQL_SHARED_CURSOR shows "HASH_MATCH_FAILED" which is described as "No existing child cursors have the unsafe literal bind hash values required by the current cursor", which makes sense and probably means that the corresponding bind variable is marked as "unsafe" internally.

This optimisation shows only up if there is a suitable FBI - if there's no corresponding expression the SQL text and predicate section match. Furthermore it only supports certain expressions - Jonathan's example shows that in general it's true that these rewrites prevent FBI usage. And obviously it ceases to work in 11.2.0.4 and 12c. Whether this is a bug or a feature I don't know, but since it only seems to apply to certain expressions it's probably not that relevant anyway.

As Jonathan points out in his note you can always work around the general problem by hiding the expression in a view, and since 11g of course a proper virtual column definition is the better approach, which doesn't expose this problem either.

Even better would be the proper usage of bind variables and not using forced cursor sharing, but there are still many installations out there that rely on that feature.

Monday, April 13, 2015

Combined ACCESS And FILTER Predicates - Excessive Throw-Away

Catchy title... Let's assume the following data setup:
create table t1
as
select
        rownum as id
      , 1 as id2
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1e4
;

create table t2
as
select
        rownum as id
      , 1 as id2
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1e4
;

create table t3
as
select
        rownum as id
      , 1 as id2
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1e4
;

exec dbms_stats.gather_table_stats(null, 't1')

exec dbms_stats.gather_table_stats(null, 't2')

exec dbms_stats.gather_table_stats(null, 't3')

-- Deliberately wrong order (FBI after gather stats) - the virtual columns created for this FBI don't have statistics, see below
create index t2_idx on t2 (case when id2 = 1 then id2 else 1 end, case when id2 = 2 then id2 else 1 end, filler, id);

create index t3_idx on t3 (id, filler, id2);
And the following execution plan (all results are from 12.1.0.2 but should be applicable to other versions, too):
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |  1416K|   132   (0)| 00:00:01 |
|*  1 |  HASH JOIN          |      | 10000 |  1416K|   132   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | T2   | 10000 |   292K|    44   (0)| 00:00:01 |
|*  3 |   HASH JOIN         |      | 10000 |  1123K|    88   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T3   | 10000 | 70000 |    44   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| T1   | 10000 |  1054K|    44   (0)| 00:00:01 |
----------------------------------------------------------------------------
How long would you expect it to run to return all rows (no tricks like expensive regular expressions or user-defined PL/SQL functions)?

Probably should take just a blink, given the tiny tables with just 10000 rows each.

However, these are the runtime statistics for a corresponding execution:
|            |            |            |
|            |DATABASE    |CPU         |
|DURATION    |TIME        |TIME        |
|------------|------------|------------|
|+0 00:00:23 |+0 00:00:23 |+0 00:00:23 |
|            |            |            |

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Ord | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | Execs | A-Rows| PGA   | Start | Dur(T)| Dur(A)| Time Active Graph    | Activity Graph ASH          | Top 5 Activity ASH  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     |   6 | SELECT STATEMENT    |      |       |       |   132 (100)|          |    1  |    1  |    0  |       |       |       |                      |                             |                     |
|*  1 |   0 |   5 |  HASH JOIN          |      | 10000 |  1191K|   132   (0)| 00:00:01 |    1  |    1  | 1401K |     2 |    23 |    22 | ##### ############## | @@@@@@@@@@@@@@@@@@@ (100%)  | ON CPU(22)          |
|   2 |   1 |   1 |   TABLE ACCESS FULL | T2   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|*  3 |   1 |   4 |   HASH JOIN         |      | 10000 |  1123K|    88   (0)| 00:00:01 |    1  |   10K | 1930K |       |       |       |                      |                             |                     |
|   4 |   3 |   2 |    TABLE ACCESS FULL| T3   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|   5 |   3 |   3 |    TABLE ACCESS FULL| T1   | 10000 |  1054K|    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
How is it possible to burn more than 20 seconds of CPU time with that execution plan?

The actual rows produced correspond pretty much to the estimated cardinalities (except for the final hash join), so that doesn't look suspect at first glance.
What becomes obvious from the SQL Monitoring output is that all the time is spent on the hash join operation ID = 1.

Of course at that point (at the latest) you should tell me off for not having you shown the predicate section of the plan and the corresponding query in first place.

So here is the predicate section and the corresponding query:
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END =CASE
              "T2"."ID2" WHEN 1 THEN "T2"."ID2" ELSE 1 END  AND CASE "T1"."ID2" WHEN
              2 THEN "T1"."ID2" ELSE 1 END =CASE "T2"."ID2" WHEN 2 THEN "T2"."ID2"
              ELSE 1 END )
       filter("T3"."ID2"=CASE  WHEN ("T1"."ID">"T2"."ID") THEN
              "T1"."ID" ELSE "T2"."ID" END )
   3 - access("T3"."ID"="T1"."ID")

   
select /*+
           leading(t1 t3 t2)
           full(t1)
           full(t3)
           use_hash(t3)
           swap_join_inputs(t3)
           full(t2)
           use_hash(t2)
           swap_join_inputs(t2)
       */
       t1.*
     --, t3.id2
     --, case when t1.id > t2.id then t1.id else t2.id end
from
       t1
     , t2
     , t3
where
       1 = 1
--
and    case when t1.id2 = 1 then t1.id2 else 1 end = case when t2.id2 = 1 then t2.id2 else 1 end
and    case when t1.id2 = 2 then t1.id2 else 1 end = case when t2.id2 = 2 then t2.id2 else 1 end
--
and    t3.id  = t1.id
and    t3.id2 = case when t1.id > t2.id then t1.id else t2.id end
;
There are two important aspects to this query and the plan: First, the join expression (without corresponding expression statistics) between T1 and T2 is sufficiently deceptive to hide from the optimizer that in fact this produces a cartesian product (mimicking real life multi table join expressions that lead to bad estimates) and second, the table T3 is joined to both T1 and an expression based on T1 and T2, which means that this expression can only be evaluated after the join to T1 and T2.
With the execution plan shape enforced via my hints (but could be a real life execution plan shape preferred by the optimizer) T3 and T1 are joined first, producing an innocent 10K rows row source, which is then joined to T2. And here the accident happens inside the hash join operation:

If you look closely at the predicate section you'll notice that the hash join operation has both, an ACCESS operation and a FILTER operation. The ACCESS operation performs based on the join between T1 and T2 a lookup into the hash table, which happens to be a cartesian product, so produces 10K times 10K rows, and only afterwards the FILTER (representing the T3 to T1/T2 join expression) is applied to these 100M rows, but matching only a single row in my example here, which is what the A-Rows shows for this operation.

So the point is that this excessive work and FILTER throwaway isn't very well represented in the row source statistics. Ideally you would need one of the following two modifications to get a better picture of what is going on:

Either the FILTER operator should be a separate step in the plan, which in theory would then look like this:
---------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     | Execs | A-Rows|
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   132 (100)|          |    1  |    1  |
|*  1a|  FILTER              |      | 10000 |  1191K|   132   (0)| 00:00:01 |    1  |    1  |
|*  1b|   HASH JOIN          |      | 10000 |  1191K|   132   (0)| 00:00:01 |    1  |  100M |
|   2 |    TABLE ACCESS FULL | T2   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |
|*  3 |    HASH JOIN         |      | 10000 |  1123K|    88   (0)| 00:00:01 |    1  |   10K |
|   4 |     TABLE ACCESS FULL| T3   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |
|   5 |     TABLE ACCESS FULL| T1   | 10000 |  1054K|    44   (0)| 00:00:01 |    1  |   10K |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1a- filter("T3"."ID2"=CASE  WHEN ("T1"."ID">"T2"."ID") THEN
              "T1"."ID" ELSE "T2"."ID" END )
   1b- access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END =CASE
              "T2"."ID2" WHEN 1 THEN "T2"."ID2" ELSE 1 END  AND CASE "T1"."ID2" WHEN
              2 THEN "T1"."ID2" ELSE 1 END =CASE "T2"."ID2" WHEN 2 THEN "T2"."ID2"
              ELSE 1 END )
   3 - access("T3"."ID"="T1"."ID")
Which would make the excess rows produced by the ACCESS part of the hash join very obvious, but is probably for performance reasons not a good solution, because then the data would have to flow from one operation to another one rather than being processed within the HASH JOIN operator, which means increased overhead.

Or an additional rowsource statistics should be made available:
----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | Execs | A-Rows|AE-Rows|
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |   132 (100)|          |    1  |    1  |    1  |
|*  1 |  HASH JOIN          |      | 10000 |  1191K|   132   (0)| 00:00:01 |    1  |    1  |  100M |
|   2 |   TABLE ACCESS FULL | T2   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |   10K |
|*  3 |   HASH JOIN         |      | 10000 |  1123K|    88   (0)| 00:00:01 |    1  |   10K |   10K |
|   4 |    TABLE ACCESS FULL| T3   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |   10K |
|   5 |    TABLE ACCESS FULL| T1   | 10000 |  1054K|    44   (0)| 00:00:01 |    1  |   10K |   10K |
----------------------------------------------------------------------------------------------------
Which I called here "Actually evaluated rows" and in addition to this case here of combined ACCESS and FILTER operations could also be helpful for other FILTER cases, for example even for simple full table scan to see how many rows were evaluated, and not only how many rows matched a possible filter (what A-Rows currently shows).

In a recent OTN thread this topic came up again, and since I also came across this phenomenon a couple of times recently I thought to put this note together. Note that Martin Preiss has submitted a corresponding database idea on the OTN forum.

Expanding on this idea a bit further, it could be useful to have an additional "Estimated evaluated rows (EE-Rows)" calculated by the optimizer and shown in the plan. This could also be used to improve the optimizer's cost model for such cases, because at present it looks like the optimizer doesn't consider additional FILTER predicates on top of ACCESS predicates when calculating the CPU cost of operations like HASH JOINs.

Note that this problem isn't specific to HASH JOIN operations, you can get similar effects with other join methods, like NESTED LOOP joins, or even simple INDEX lookup operations, where again the ACCESS part isn't very selective but only the FILTER applied afterwards filters matching rows.

Here are some examples with the given setup:
select /*+
           leading(t1 t3 t2)
           full(t1)
           full(t3)
           use_hash(t3)
           swap_join_inputs(t3)
           index(t2)
           use_nl(t2)
       */
       t1.*
     --, t3.id2
     --, case when t1.id > t2.id then t1.id else t2.id end
from
       t1
     , t2
     , t3
where
       1 = 1
--
and    case when t1.id2 = 1 then t1.id2 else 1 end = case when t2.id2 = 1 then t2.id2 else 1 end
and    case when t1.id2 = 2 then t1.id2 else 1 end = case when t2.id2 = 2 then t2.id2 else 1 end
--
and    t3.id  = t1.id
and    t3.id2 = case when t1.id > t2.id then t1.id else t2.id end
;

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Ord | Operation           | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Execs | A-Rows| PGA   | Start | Dur(T)| Dur(A)| Time Active Graph    | Activity Graph ASH          | Top 5 Activity ASH  |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     |   6 | SELECT STATEMENT    |        |       |       | 10090 (100)|          |    1  |    1  |       |       |       |       |                      |                             |                     |
|   1 |   0 |   5 |  NESTED LOOPS       |        | 10000 |  1416K| 10090   (1)| 00:00:01 |    1  |    1  |       |       |       |       |                      |                             |                     |
|*  2 |   1 |   3 |   HASH JOIN         |        | 10000 |  1123K|    88   (0)| 00:00:01 |    1  |   10K | 1890K |       |       |       |                      |                             |                     |
|   3 |   2 |   1 |    TABLE ACCESS FULL| T3     | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|   4 |   2 |   2 |    TABLE ACCESS FULL| T1     | 10000 |  1054K|    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|*  5 |   1 |   4 |   INDEX RANGE SCAN  | T2_IDX |     1 |    30 |     1   (0)| 00:00:01 |   10K |    1  |       |     3 |    33 |    32 |  ################### | @@@@@@@@@@@@@@@@@@@ (100%)  | ON CPU(32)          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T3"."ID"="T1"."ID")
   5 - access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END
              ="T2"."SYS_NC00004$" AND CASE "T1"."ID2" WHEN 2 THEN "T1"."ID2" ELSE 1
              END ="T2"."SYS_NC00005$")
       filter("T3"."ID2"=CASE  WHEN ("T1"."ID">"T2"."ID") THEN
              "T1"."ID" ELSE "T2"."ID" END )

select /*+
           leading(t1 t3 t2)
           full(t1)
           full(t3)
           use_hash(t3)
           swap_join_inputs(t3)
           index(t2)
           use_nl(t2)
       */
       max(t1.filler)
     --, t3.id2
     --, case when t1.id > t2.id then t1.id else t2.id end
from
       t1
     , t2
     , t3
where
       1 = 1
--
and    case when t1.id2 = 1 then t1.id2 else 1 end = case when t2.id2 = 1 then t2.id2 else 1 end
and    case when t1.id2 = 2 then t1.id2 else 1 end = case when t2.id2 = 2 then t2.id2 else 1 end
--
and    t3.id  = t1.id
and    t2.filler >= t1.filler
and    t2.id = case when t1.id2 > t3.id2 then t1.id2 else t3.id2 end
;

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Ord | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Execs | A-Rows| PGA   | Start | Dur(T)| Dur(A)| Time Active Graph    | Activity Graph ASH          | Top 5 Activity ASH  |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     |   7 | SELECT STATEMENT     |        |       |       | 20092 (100)|          |    1  |    1  |       |       |       |       |                      |                             |                     |
|   1 |   0 |   6 |  SORT AGGREGATE      |        |     1 |   223 |            |          |    1  |    1  |       |       |       |       |                      |                             |                     |
|   2 |   1 |   5 |   NESTED LOOPS       |        |     1 |   223 | 20092   (1)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|*  3 |   2 |   3 |    HASH JOIN         |        | 10000 |  1123K|    88   (0)| 00:00:01 |    1  |   10K | 1900K |       |       |       |                      |                             |                     |
|   4 |   3 |   1 |     TABLE ACCESS FULL| T3     | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|   5 |   3 |   2 |     TABLE ACCESS FULL| T1     | 10000 |  1054K|    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|*  6 |   2 |   4 |    INDEX RANGE SCAN  | T2_IDX |     1 |   108 |     2   (0)| 00:00:01 |   10K |   10K |       |     2 |    34 |    34 | #################### | @@@@@@@@@@@@@@@@@@@ (100%)  | ON CPU(34)          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T3"."ID"="T1"."ID")
   6 - access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END
              ="T2"."SYS_NC00004$" AND CASE "T1"."ID2" WHEN 2 THEN "T1"."ID2" ELSE 1
              END ="T2"."SYS_NC00005$" AND "T2"."FILLER">="T1"."FILLER" AND
              "T2"."ID"=CASE  WHEN ("T1"."ID2">"T3"."ID2") THEN "T1"."ID2" ELSE
              "T3"."ID2" END  AND "T2"."FILLER" IS NOT NULL)
       filter("T2"."ID"=CASE  WHEN ("T1"."ID2">"T3"."ID2") THEN
              "T1"."ID2" ELSE "T3"."ID2" END )
The former one exhibits exactly the same problem as the HASH JOIN example, only that the FILTER is evaluated in the inner row source of a NESTED LOOP join after the index access operation.

The latter one shows as variation the classic partial "index access" due to a range comparison in between - although the entire expression can be evaluated on index level, the access part matches every index entry, so the range scan actually needs to walk the entire index at each loop iteration and the FILTER is then applied to all the index values evaluated.

Thursday, January 22, 2015

New Version Of XPLAN_ASH Utility - In-Memory Support

A new version 4.21 of the XPLAN_ASH utility is available for download. I publish this version because it will be used in the recent video tutorials explaining the Active Session History functionality of the script.

As usual the latest version can be downloaded here.

This is mainly a maintenance release that fixes some incompatibilities of the 4.2 version with less recent versions (10.2 and 11.2.0.1).

As an extra however, this version now differentiates between general CPU usage and in-memory CPU usage (similar to 12.1.0.2 Real-Time SQL Monitoring). This is not done in all possible sections of the output yet, but the most important ones are already covered.

So if you already use the 12.1.0.2 in-memory option this might be helpful to understand how much of your CPU time is spent on in-memory operations vs. non in-memory. Depending on your query profile you might be surprised by the results.

Here are the notes from the change log:

 - Forgot to address a minor issue where the SET_COUNT determined per DFO_TREE (either one or two slave sets) is incorrect in the special case of DFO trees having only S->P distributions (pre-12c style). Previous versions used a SET_COUNT of 2 in such a case which is incorrect, since there is only one slave set. 12c changes this behaviour with the new PX SELECTOR operator and requires again two sets.

- For RAC Cross Instance Parallel Execution specific output some formatting and readability was improved (more linebreaks etc.)

- Minor SQL issue fixed in "SQL statement execution ASH Summary" that prevented execution in 10.2 (ORA-32035)

- The NO_STATEMENT_QUEUING hint prevented the "OPTIMIZER_FEATURES_ENABLE" hint from being recognized, therefore some queries failed in 11.2.0.1 again with ORA-03113. Fixed

- "ON CPU" now distinguishes between "ON CPU INMEMORY" and "ON CPU" for in-memory scans

Sunday, January 11, 2015

Video Tutorial: XPLAN_ASH Active Session History - Introduction

I finally got around preparing another part of the XPLAN_ASH video tutorial.

This part is about the main funcationality of XPLAN_ASH: SQL statement execution analysis using Active Session History and Real-Time SQL Monitoring.

In this video tutorial I'll explain what the output of XPLAN_ASH is supposed to mean when using the Active Session History functionality of the script. Before diving into the details of the script output using sample reports I provide some overview and introduction in this part that hopefully makes it simpler to understand how the output is organized and what it is supposed to mean.

This is the initial, general introduction part. More parts to follow.

Monday, January 5, 2015

Unnecessary BUFFER SORT Operations - Parallel Concatenation Transformation

When using Parallel Execution, depending on the plan shape and the operations used, Oracle sometimes needs to turn non-blocking operations into blocking operations, which means in this case that the row source no longer passes its output data directly to the parent operation but buffers some data temporarily in PGA memory / TEMP. This is either accomplished via the special HASH JOIN BUFFERED operation, or simply by adding BUFFER SORT operations to the plan.

The reason for such a behaviour in parallel plans is the limitation of Oracle Parallel Execution that allows only a single data redistribution to be active concurrently. You can read more about that here.

However, sometimes the optimizer adds unnecessary BUFFER SORT operations to parallel execution plans, and one of the most obvious examples is when the so called "concatenation" query transformation is applied by the optimizer and Parallel Execution is involved.

UPDATE Please note: As mentioned below by Martin (thanks) what I call here "concatenation transformation" typically is called "OR expansion transformation" in CBO speak, and this term probably much better describes what the transformation is about. So whenever I wrote here "concatenation transformation" this can be substituted with "OR expansion transformation".

To understand the issue, first of all, what is the concatenation transformation about?

Whenever there are predicates combined with OR there is the possibility to rewrite the different conditions as separate queries unioned together.

In order to ensure that the result of the rewritten query doesn't contain any unwanted duplicates, the different branches of the UNIONed statement need to filter out any data fulfillinh conditions of previous branches - this is probably where originally the (at first sight) odd (and in the meanwhile documented) LNNVL function came into existence.

The predicates can be either single-table filters, where the concatenation might open up different access paths to the same table (like different indexes), or it might be predicates combining multiple tables, like joins or subqueries.

Here is a short example of the latter (the parallel hints are commented out but are used in the further examples to demonstrate the issue with Parallel Execution) - using version 12.1.0.2:

select 
       max(id) 
from 
(
  select /* parallel(t1 8) parallel(t2 8) */
         t2.* 
  from 
         t1
       , t2
  where 
         (t1.id = t2.id or t1.id = t2.id2)
);
In this example the join condition using an OR prevents any efficient join method between T1 and T2 when not re-writing the statement - Oracle can only resort to a NESTED LOOP join with a repeated full table scan of one of the tables, which is reflected in a rather high estimated cost:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    16 |  2177M  (2)| 23:37:34 |
|   1 |  SORT AGGREGATE     |      |     1 |    16 |            |          |
|   2 |   NESTED LOOPS      |      |  3999K|    61M|  2177M  (2)| 23:37:34 |
|   3 |    TABLE ACCESS FULL| T2   |  2000K|    19M|  1087   (1)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T1   |     2 |    12 |  1089   (2)| 00:00:01 |
----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter("T1"."ID"="T2"."ID" OR "T1"."ID"="T2"."ID2")
The same statement could be expressed by the following manual rewrite:

select max(id) from (
  select /* parallel(t1 8) parallel(t2 8) */
         t2.* 
  from 
         t1
       , t2
  where 
         t1.id = t2.id2
  ---------
  union all
  ---------
  select /* parallel(t1 8) parallel(t2 8) */
         t2.* 
  from 
         t1
       , t2
  where 
         t1.id = t2.id
  and    lnnvl(t1.id = t2.id2)
);
Notice the LNNVL function in the second branch of the UNION ALL that filters out any rows fulfilling the condition used in the first branch.

Also note that using UNION instead of UNION ALL plus LNNVL(s) to filter out any duplicate rows is also potentially incorrect as each query branch might produce duplicate rows that need to be retained as they are also part of the original query result.

At the expense of visiting the tables multiple times we now get at least efficient join methods in each branch (and hence a significantly lower cost estimate):

--------------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    13 |       | 11945   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE       |      |     1 |    13 |       |            |          |
|   2 |   VIEW                |      |  2100K|    26M|       | 11945   (1)| 00:00:01 |
|   3 |    UNION-ALL          |      |       |       |       |            |          |
|*  4 |     HASH JOIN         |      |  2000K|    30M|    34M|  5972   (1)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| T1   |  2000K|    11M|       |  1086   (1)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| T2   |  2000K|    19M|       |  1087   (1)| 00:00:01 |
|*  7 |     HASH JOIN         |      |   100K|  1562K|    34M|  5972   (1)| 00:00:01 |
|   8 |      TABLE ACCESS FULL| T1   |  2000K|    11M|       |  1086   (1)| 00:00:01 |
|   9 |      TABLE ACCESS FULL| T2   |  2000K|    19M|       |  1087   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("T1"."ID"="T2"."ID2")
   7 - access("T1"."ID"="T2"."ID")
       filter(LNNVL("T1"."ID"="T2"."ID2"))
And in fact, when not preventing the concatenation transformation (NO_EXPAND hint), the optimizer comes up with the following execution plan for the original statement:

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    16 |       | 11945   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE      |      |     1 |    16 |       |            |          |
|   2 |   CONCATENATION      |      |       |       |       |            |          |
|*  3 |    HASH JOIN         |      |  2000K|    30M|    34M|  5972   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |  2000K|    11M|       |  1086   (1)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| T2   |  2000K|    19M|       |  1087   (1)| 00:00:01 |
|*  6 |    HASH JOIN         |      |   100K|  1562K|    34M|  5972   (1)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| T1   |  2000K|    11M|       |  1086   (1)| 00:00:01 |
|   8 |     TABLE ACCESS FULL| T2   |  2000K|    19M|       |  1087   (1)| 00:00:01 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("T1"."ID"="T2"."ID2")
   6 - access("T1"."ID"="T2"."ID")
       filter(LNNVL("T1"."ID"="T2"."ID2"))
The only difference between those two plans for the manual and automatic rewrite is the CONCATENATION operator instead of UNION ALL, and that the subquery isn't merged in case of the UNION ALL (additional VIEW operator).

So far everything works as expected and you have seen the effect and rationale of the concatenation transformation.

If we run now the original statement using Parallel Execution (turn comments into hints), depending on the exact version used the resulting execution plans show various inefficiencies.

For reference, this is the parallel execution plan I get from 12.1.0.2 when using above manual rewrite:

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     1 |    13 |   606   (2)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE                |          |     1 |    13 |            |          |        |      |            |
|   2 |   PX COORDINATOR               |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)         | :TQ10004 |     1 |    13 |            |          |  Q1,04 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE             |          |     1 |    13 |            |          |  Q1,04 | PCWP |            |
|   5 |      VIEW                      |          |  2100K|    26M|   606   (2)| 00:00:01 |  Q1,04 | PCWP |            |
|   6 |       UNION-ALL                |          |       |       |            |          |  Q1,04 | PCWP |            |
|*  7 |        HASH JOIN               |          |  2000K|    30M|   303   (2)| 00:00:01 |  Q1,04 | PCWP |            |
|   8 |         PX RECEIVE             |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,04 | PCWP |            |
|   9 |          PX SEND HYBRID HASH   | :TQ10000 |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | P->P | HYBRID HASH|
|  10 |           STATISTICS COLLECTOR |          |       |       |            |          |  Q1,00 | PCWC |            |
|  11 |            PX BLOCK ITERATOR   |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|  12 |             TABLE ACCESS FULL  | T1       |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|  13 |         PX RECEIVE             |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,04 | PCWP |            |
|  14 |          PX SEND HYBRID HASH   | :TQ10001 |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | P->P | HYBRID HASH|
|  15 |           PX BLOCK ITERATOR    |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | PCWC |            |
|  16 |            TABLE ACCESS FULL   | T2       |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | PCWP |            |
|* 17 |        HASH JOIN               |          |   100K|  1562K|   303   (2)| 00:00:01 |  Q1,04 | PCWP |            |
|  18 |         PX RECEIVE             |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,04 | PCWP |            |
|  19 |          PX SEND HYBRID HASH   | :TQ10002 |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,02 | P->P | HYBRID HASH|
|  20 |           STATISTICS COLLECTOR |          |       |       |            |          |  Q1,02 | PCWC |            |
|  21 |            PX BLOCK ITERATOR   |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,02 | PCWC |            |
|  22 |             TABLE ACCESS FULL  | T1       |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,02 | PCWP |            |
|  23 |         PX RECEIVE             |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,04 | PCWP |            |
|  24 |          PX SEND HYBRID HASH   | :TQ10003 |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,03 | P->P | HYBRID HASH|
|  25 |           PX BLOCK ITERATOR    |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,03 | PCWC |            |
|  26 |            TABLE ACCESS FULL   | T2       |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,03 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   7 - access("T1"."ID"="T2"."ID2")
  17 - access("T1"."ID"="T2"."ID")
       filter(LNNVL("T1"."ID"="T2"."ID2"))
This is a pretty straightforward parallel plan, with the only possibly noteable exception of the new 12c "HYBRID HASH" distribution feature being used.

Now let's have a look at the resulting execution plan when the concatenation transformation gets used:

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |     1 |    16 |   606   (2)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE                      |          |     1 |    16 |            |          |        |      |            |
|   2 |   PX COORDINATOR                     |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)               | :TQ20003 |     1 |    16 |            |          |  Q2,03 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE                   |          |     1 |    16 |            |          |  Q2,03 | PCWP |            |
|   5 |      CONCATENATION                   |          |       |       |            |          |  Q2,03 | PCWP |            |
|*  6 |       HASH JOIN                      |          |  2000K|    30M|   303   (2)| 00:00:01 |  Q2,03 | PCWP |            |
|   7 |        PX RECEIVE                    |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q2,03 | PCWP |            |
|   8 |         PX SEND HYBRID HASH          | :TQ20001 |  2000K|    11M|   151   (1)| 00:00:01 |  Q2,01 | P->P | HYBRID HASH|
|   9 |          STATISTICS COLLECTOR        |          |       |       |            |          |  Q2,01 | PCWC |            |
|  10 |           BUFFER SORT                |          |     1 |    16 |            |          |  Q2,01 | PCWP |            |
|  11 |            PX BLOCK ITERATOR         |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q2,01 | PCWC |            |
|  12 |             TABLE ACCESS FULL        | T1       |  2000K|    11M|   151   (1)| 00:00:01 |  Q2,01 | PCWP |            |
|  13 |        PX RECEIVE                    |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q2,03 | PCWP |            |
|  14 |         PX SEND HYBRID HASH          | :TQ20002 |  2000K|    19M|   151   (1)| 00:00:01 |  Q2,02 | P->P | HYBRID HASH|
|  15 |          BUFFER SORT                 |          |     1 |    16 |            |          |  Q2,02 | PCWP |            |
|  16 |           PX BLOCK ITERATOR          |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q2,02 | PCWC |            |
|  17 |            TABLE ACCESS FULL         | T2       |  2000K|    19M|   151   (1)| 00:00:01 |  Q2,02 | PCWP |            |
|  18 |       BUFFER SORT                    |          |       |       |            |          |  Q2,03 | PCWC |            |
|  19 |        PX RECEIVE                    |          |   100K|  1562K|   303   (2)| 00:00:01 |  Q2,03 | PCWP |            |
|  20 |         PX SEND ROUND-ROBIN          | :TQ20000 |   100K|  1562K|   303   (2)| 00:00:01 |        | S->P | RND-ROBIN  |
|  21 |          BUFFER SORT                 |          |     1 |    16 |            |          |        |      |            |
|  22 |           PX COORDINATOR             |          |       |       |            |          |        |      |            |
|  23 |            PX SEND QC (RANDOM)       | :TQ10002 |   100K|  1562K|   303   (2)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |
|  24 |             BUFFER SORT              |          |     1 |    16 |            |          |  Q1,02 | PCWP |            |
|* 25 |              HASH JOIN BUFFERED      |          |   100K|  1562K|   303   (2)| 00:00:01 |  Q1,02 | PCWP |            |
|  26 |               PX RECEIVE             |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,02 | PCWP |            |
|  27 |                PX SEND HYBRID HASH   | :TQ10000 |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | P->P | HYBRID HASH|
|  28 |                 STATISTICS COLLECTOR |          |       |       |            |          |  Q1,00 | PCWC |            |
|  29 |                  BUFFER SORT         |          |     1 |    16 |            |          |  Q1,00 | PCWP |            |
|  30 |                   PX BLOCK ITERATOR  |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|  31 |                    TABLE ACCESS FULL | T1       |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|  32 |               PX RECEIVE             |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,02 | PCWP |            |
|  33 |                PX SEND HYBRID HASH   | :TQ10001 |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | P->P | HYBRID HASH|
|  34 |                 BUFFER SORT          |          |     1 |    16 |            |          |  Q1,01 | PCWP |            |
|  35 |                  PX BLOCK ITERATOR   |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | PCWC |            |
|  36 |                   TABLE ACCESS FULL  | T2       |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   6 - access("T1"."ID"="T2"."ID2")
  25 - access("T1"."ID"="T2"."ID")
       filter(LNNVL("T1"."ID"="T2"."ID2"))
This looks a bit weird, and when comparing it to the plan gotten from the manual rewrite, it shows the following unnecessary differences:

- There are various BUFFER SORT operations that don't make a lot of sense, for example each parallel table scan is followed by a BUFFER SORT operation, and even the HASH JOIN BUFFERED in the lower part of the plan is followed by a BUFFER SORT (double buffering?)

- The plan is decomposed into two so called DFO trees, which you can see for example from the two PX COORDINATOR operators (operation id 2 and 22), which adds another unnecessary serial execution part to the plan and can have additional side effects I explain in one of my video tutorials.

This means that such execution plan shapes possibly will have a much higher demand for PGA memory than necessary (the BUFFER SORT operation will attempt to keep the data produced by the child row source in PGA), and also might cause additional I/O to and from TEMP. Since PGA memory consumed by one session influences also the Auto PGA allocation of other sessions this means that such executions not only affect the particular SQL execution in question but also any other concurrent executions allocating PGA memory.

Depending on the amount of data to be buffered BUFFER SORT operations closer to the root of the execution plan are more likely to have significant impact performance-wise, as they might have to buffer large amounts of data.

One very obvious sign of inefficiency are double BUFFERing operations, like a HASH JOIN BUFFERED followed by a BUFFER SORT as parent operation, which you can spot in the sample plan shown above.

Another interesting point is that the parallel plans differ from point release to point release and show different levels of inefficiencies, for example, 10.2.0.5, 11.1.0.7 and 11.2.0.1 produce different plans than 11.2.0.2, which is again different from what 11.2.0.3 & 11.2.0.4 produce - and using OPTIMIZER_FEATURES_ENABLE in newer versions to emulate older versions doesn't always reproduce the exact plans produced by the actual, older versions. So all in all this looks like a pretty messy part of the optimizer.

Furthermore the problem doesn't always show up - it seems to depend largely on the exact version and the plan shape used. For example, replacing the SELECT MAX(ID) FROM () outermost query in above example with a simple SELECT ID FROM () results in a plan where the concatenation transformation doesn't produce all those strange BUFFER SORTS - although it still produces a plan decomposed into two DFO trees in some versions.

It also interesting to note that depending on version and plan shape sometimes the manual rewrite using UNION ALL is also affected by either unluckily placed or unnecessary BUFFER SORT operations, but not to the same extent as the plans resulting from the CONCATENATION transformation.

In the next post I'll show how this inefficiency can have some interesting side effects when being triggered by / combined with other transformations.

Footnote


Table structures used in the test cases:

create table t1
compress
as
select
        (rownum * 2) + 1 as id
      , mod(rownum, 2000) + 1 as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(100000) */ * from dual
connect by
        level <= 100000) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't1')

create table t2
compress
as
select * from t1;

exec dbms_stats.gather_table_stats(null, 't2')

Sunday, December 21, 2014

New Version Of XPLAN_ASH Utility

A new version 4.2 of the XPLAN_ASH utility is available for download.

As usual the latest version can be downloaded here.

There were no too significant changes in this release, mainly some new sections related to I/O figures were added.

One thing to note is that some of the sections in recent releases may require a linesize larger than 700, so the script's settings have been changed to 800. If you use corresponding settings for CMD.EXE under Windows for example you might have to adjust accordingly to prevent ugly line wrapping.

Here are the notes from the change log:

- New sections "Concurrent activity I/O Summary based on ASH" and "Concurrent activity I/O Summary per Instance based on ASH" to see the I/O activity summary for concurrent activity

- Many averages and medians now also have accompanying minimum and maximum values shown. This isn't as good as having histograms but gives a better idea of the range of values, and how potentially outliers influence the average and deserve further investigations

- Bug fixed: When using MONITOR as source for searching for the most recent SQL_ID executed by a given SID due to some filtering on date no SQL_ID was found. This is now fixed

- Bug fixed: In RAC GV$ASH_INFO should be used to determine available samples

- The "Parallel Execution Skew ASH" indicator is now weighted - so far any activity level per plan line and sample below the actual DOP counted as one, and the same if the activity level was above
The sum of the "ones" was then set relative to the total number of samples the plan line was active to determine the "skewness" indicator

Now the actual difference between the activity level and the actual DOP is calculated and compared to the number of total samples active times the actual DOP
This should give a better picture of the actual impact the skew has on the overall execution

- Most queries now use a NO_STATEMENT_QUEUING hint for environments where AUTO DOP is enabled and the XPLAN_ASH queries could get queued otherwise

- The physical I/O bytes on execution plan line level taken from "Real-Time SQL Monitoring" has now the more appropriate heading "ReadB" and "WriteB", I never liked the former misleading "Reads"/"Writes" heading

Sunday, June 29, 2014

New Version Of XPLAN_ASH Utility

A new version 4.1 of the XPLAN_ASH utility is available for download.

As usual the latest version can be downloaded here.

This version in particular supports now the new 12c "Adaptive" plan feature - previous versions don't cope very well with those if you don't add the "ADAPTIVE" formatting option manually.

Here are the notes from the change log:

- GV$SQL_MONITOR and GV$SQL_PLAN_MONITOR can now be customized in the
settings as table names in case you want to use your own custom monitoring repository that copies data from GV$SQL_MONITOR and GV$SQL_PLAN_MONITOR in order to keep/persist monitoring data. The tables need to have at least those columns that are used by XPLAN_ASH from the original views

- The "Activity Timeline based on ASH" for RAC Cross Instance Parallel Execution shows an additional line break for the GLOBAL summary

- Fixed various bugs related to some figures when analyzing Cross Instance RAC Parallel Execution

- The new "GLOBAL" aggregation level for Cross Instance RAC Parallel Execution (see version 4.0 change log below) is now also shown in the "Information on Parallel Degree based on ASH" section

- The "Parallel Distribution ASH" column on execution plan line level now can show process information taken from Real-Time SQL Monitoring for those processes that are not found in ASH samples.
This effectively means that with monitoring information available for every plan line every involved process will now be shown along with its ASH sample count and rows produced

So some processes will show up now with a sample count of 0.

The idea behind this is to provide more information about row distribution even for those lines/processes that are not covered by the ASH samples.
Previously the rowcount produced was only shown for those processes covered in ASH samples

The new behaviour is default - if you find the output messy you can return to previous behaviour (show only rowcounts for processes found in ASH samples) by setting the new configuration switch "show_monitor_rowcount" to any other value than the default of "YES"

- The "Real-Time SQL Monitoring" information on execution plan line level now includes the read and write request information ("ReadReq", "WriteReq")

- The I/O figures based on ASH now include the new "DELTA_READ_MEM_BYTES" information that was added in 12c. This applies to the following sections:
  - SQL Statement I/O Summary based on ASH
  - Parallel Worker activity overview based on ASH
  - Activity Timeline based on ASH

The "Read Mem Bytes" seems to correspond to the "logical read bytes from cache" statistics, so any direct path reads are not covered by this value

- Added some more verbose description in the "Note" sections how to handle long lines. XPLAN_ASH now does a SET TRIMSPOOL ON if you want to spool the output to a file

- Whenever the output referred to DFOs this was changed to "DFO TREE", which is the correct term

- The "Parallel Worker activity overview based on ASH" section now shows a blank line between the sections which should make this section more readable

- Adaptive plans are now supported by XPLAN_ASH

Note they don't work well with previous versions, the formatting of the inactive lines breaks and the overall information can be misleading if you don't add manually the "ADAPTIVE" formatting option

If XPLAN_ASH detects an adaptive plan, it will always force the ADAPTIVE formatting option.
This also means that Adaptive plans for the time being won't work with SASH as SASH doesn't collect the OTHER_XML column from GV$SQL_PLAN
You could manually add that column to SASH_SQLPLANS and add the column to the "sash_pkg.get_sqlplans" procedure - this is a CLOB column, but INSERT / SELECT should work I think
The view SASH_PLAN_TABLE needs also to be modified to select the OTHER_XML column instead of a dummy NULL

Although this output is less readable than the "faked" output that shows only the plan operations that are actually in use, it is the only simple way how ASH/MONITOR data can be related to execution plan lines, as these hold the information with the actual plan line, not the one that is made up by DBMS_XPLAN.DISPLAY* based on the DISPLAY_MAP information in the OTHER_XML column

Hence I decided for the time being to use the same approach as 12c Real-Time SQL Monitoring and always show the full/adaptive shape of the plan

Another challenge for XPLAN_ASH with adaptive plans is the possibly changing PLAN_HASH_VALUE during execution.

XPLAN_ASH extracts the PLAN_HASH_VALUE from ASH/MONITOR when trying to get the plan from DBA_HIST_SQL_PLAN.

Hence XPLAN_ASH now needs to take care to extract the most recent PLAN_HASH_VALUE, previously it didn't matter as it wasn't supposed to change during execution. This seems to work based on my tests, but it's something to keep in mind

- The new "gather stats on load" 12c feature implies for INSERT...SELECT statements that the cursor will immediately be invalidated/removed from the Library Cache after (successful) execution. So now such
INSERT...SELECT behave like CTAS which also gets removed immediately. This is a pity as you won't be able to pick up the plan from the Library Cache after the execution completes using XPLAN_ASH (or any other tool using DBMS_XPLAN.DISPLAY*).

Although V$SQL_PLAN_MONITOR might keep plan for some time after the execution, it can't be used as input to DBMS_XPLAN.DISPLAY*, hence this isn't a viable workaround. In principle however this isn't a good thing as the SQL and plan information might be missing from AWR / STATSPACK reports due to the
immediate invalidation/removal.

At the time being the only viable workaround known to me for this is to prevent the "gather stats on load" feature either via parameter "_optimizer_gather_stats_on_load" or hint "no_gather_optimizer_statistics", or via using pre-12c optimizer feature settings which implicitly disables the feature which is of course not
really a good workaround as the feature itself might be rather desirable

Sunday, April 6, 2014

Analysing Parallel Execution Skew - Without Diagnostics / Tuning Pack License

This is the third part of the video tutorial "Analysing Parallel Execution Skew". In this part I show how to analyse a parallel SQL execution regarding Parallel Execution Skew.

If you don't have a Diagnostics / Tuning Pack license the options you have for doing that are quite limited, and the approach, as demonstrated in the tutorial, has several limitations and shortcomings.

Here is a link to the video on my Youtube channel.

If you want to reproduce or play around with the examples shown in the tutorial here is the script for creating the tables and running the queries / DML commands used in the tutorial. A shout goes out to Christo Kutrovsky at Pythian who I think was the one who inspired the beautified version on V$PQ_TQSTAT.

---------------------
-- Links for S-ASH --
---------------------
--
-- http://www.perfvision.com/ash.php
-- http://www.pythian.com/blog/trying-out-s-ash/
-- http://sourceforge.net/projects/orasash/files/v2.3/
-- http://sourceforge.net/projects/ashv/
---------------------

-- Table creation
set echo on timing on time on

drop table t_1;

purge table t_1;

drop table t_2;

purge table t_2;

drop table t_1_part;

purge table t_1_part;

drop table t_2_part;

purge table t_2_part;

drop table t1;

purge table t1;

drop table t2;

purge table t2;

drop table t3;

purge table t3;

drop table t4;

purge table t4;

drop table t5;

purge table t5;

drop table x;

purge table x;

create table t1
as
select  /*+ use_nl(a b) */
        (rownum * 2) as id
      , rownum as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1000) */ * from dual
connect by
        level <= 1000) a, (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't1')

alter table t1 cache;

create table t2
compress
as
select
        (rownum * 2) + 1 as id
      , mod(rownum, 2000) + 1 as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1000000) */ * from dual
connect by
        level <= 1000000) a, (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't2')

alter table t2 cache;

create table t3
as
select  /*+ use_nl(a b) */
        (rownum * 2) as id
      , rownum as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1000) */ * from dual
connect by
        level <= 1000) a, (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't3')

alter table t3 cache;

create table t4
compress
as
select
        (rownum * 2) + 1 as id
      , mod(rownum, 2000) + 1 as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1000000) */ * from dual
connect by
        level <= 1000000) a, (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't4')

alter table t4 cache;

create table t5
as
select  /*+ use_nl(a b) */
        (rownum * 2) as id
      , rownum as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1000) */ * from dual
connect by
        level <= 1000) a, (select /*+ cardinality(2) */ * from dual connect by level <= 2) b
;

exec dbms_stats.gather_table_stats(null, 't5')

alter table t5 cache;

create table x
compress
as
select * from t2
where 1 = 2;

create unique index x_idx1 on x (id);

alter table t1 parallel 2;

alter table t2 parallel 2;

alter table t3 parallel 15;

alter table t4 parallel 15;

alter table t5 parallel 15;

create table t_1
compress
as
select  /*+ use_nl(a b) */
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_1')

create table t_2
compress
as
select
        rownum as id
      , case when rownum <= 5e5 then mod(rownum, 2e6) + 1 else 1 end as fk_id_skew
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_2', method_opt=>'for all columns size 1', no_invalidate=>false)

alter table t_1 parallel 8 cache;

alter table t_2 parallel 8 cache;

create table t_1_part
partition by hash(id) partitions 8
compress
as
select  /*+ use_nl(a b) */
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_1_part')

create table t_2_part
partition by hash(fk_id_skew) partitions 8
compress
as
select
        rownum as id
      , case when rownum <= 5e5 then mod(rownum, 2e6) + 1 else 1 end as fk_id_skew
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_2_part', method_opt=>'for all columns size 1', no_invalidate=>false)

alter table t_1_part parallel 8 cache;

alter table t_2_part parallel 8 cache;

---------------------------------------------------------------
-- Single DFO tree (with Parallel Execution Skew), many DFOs --
---------------------------------------------------------------

set echo on timing on time on verify on

define num_cpu = "14"

alter session set workarea_size_policy = manual;

alter session set sort_area_size = 200000000;

alter session set sort_area_size = 200000000;

alter session set hash_area_size = 200000000;

alter session set hash_area_size = 200000000;

select
        max(t1_id)
      , max(t1_filler)
      , max(t2_id)
      , max(t2_filler)
      , max(t3_id)
      , max(t3_filler)
from    (
          select  /*+ monitor
                     no_merge
                     no_merge(v_1)
                     no_merge(v_5)
                     parallel(t1 &num_cpu)
                     PQ_DISTRIBUTE(T1 HASH HASH)
                     PQ_DISTRIBUTE(V_5 HASH HASH)
                     leading (v_1 v_5 t1)
                     use_hash(v_1 v_5 t1)
                     swap_join_inputs(t1)
                  */
                  t1.id     as t1_id
                , regexp_replace(v_5.t3_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as t1_filler
                , v_5.*
          from    (
                    select  /*+ parallel(t2 &num_cpu)
                                parallel(t3 &num_cpu)
                                leading(t3 t2)
                                use_hash(t3 t2)
                                swap_join_inputs(t2)
                                PQ_DISTRIBUTE(T2 HASH HASH)
                            */
                            t2.id     as t2_id
                          , t2.filler as t2_filler
                          , t2.id2    as t2_id2
                          , t3.id     as t3_id
                          , t3.filler as t3_filler
                    from
                            t1 t2
                          , t2 t3
                    where
                            t3.id2 = t2.id2 (+)
                    and     regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
                    and     mod(t3.id2, 3) = 0
                  ) v_1
                , (
                    select  /*+ parallel(t2 &num_cpu)
                                parallel(t3 &num_cpu)
                                leading(t3 t2)
                                use_hash(t3 t2)
                                swap_join_inputs(t2)
                                PQ_DISTRIBUTE(T2 HASH HASH)
                            */
                            t2.id     as t2_id
                          , t2.filler as t2_filler
                          , t2.id2    as t2_id2
                          , t3.id     as t3_id
                          , t3.filler as t3_filler
                    from
                            t1 t2
                          , t2 t3
                    where
                            t3.id = t2.id (+)
                    and     regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
                    and     mod(t3.id2, 3) = 0
                  ) v_5
                , t1
          where
                  v_1.t3_id = v_5.t3_id
          and     v_5.t2_id2 = t1.id2 (+) + 2001
          and     regexp_replace(v_5.t3_filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t1.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
        )
;

break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup

-- compute sum label Total of num_rows on server_type

select
        /*dfo_number
      , */tq_id
      , cast(server_type as varchar2(10)) as server_type
      , instance
      , cast(process as varchar2(8)) as process
      , num_rows
      , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
      , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
      , round(bytes / 1024 / 1024) as mb
      , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from
        v$pq_tqstat
order by
        dfo_number
      , tq_id
      , server_type desc
      , instance
      , process
;

---------------------------------------------------------------------------------------------------
-- Same statement with Parallel TEMP TABLE TRANSFORMATION, V$PQ_TQSTAT shows useless information --
---------------------------------------------------------------------------------------------------

set echo on timing on time on verify on

define num_cpu = "14"

alter session set workarea_size_policy = manual;

alter session set sort_area_size = 200000000;

alter session set sort_area_size = 200000000;

alter session set hash_area_size = 200000000;

alter session set hash_area_size = 200000000;

with result as
(
select /*+ materialize
           monitor
           no_merge
           no_merge(v_1)
           no_merge(v_5)
           parallel(t1 &num_cpu)
           PQ_DISTRIBUTE(T1 HASH HASH)
           PQ_DISTRIBUTE(V_1 HASH HASH)
           PQ_DISTRIBUTE(V_5 HASH HASH)
           leading (v_1 v_5 t1)
           use_hash(v_1 v_5 t1)
           swap_join_inputs(t1)
       */
       t1.id     as t1_id
     , regexp_replace(v_5.t3_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as t1_filler
     , v_5.*
from   (
  select /*+ parallel(t2 &num_cpu) parallel(t3 &num_cpu) leading(t3 t2) use_hash(t3 t2) swap_join_inputs(t2) PQ_DISTRIBUTE(T2 HASH HASH) */
        t2.id     as t2_id
      , t2.filler as t2_filler
      , t2.id2    as t2_id2
      , t3.id     as t3_id
      , t3.filler as t3_filler
  from
        t1 t2
      , t2 t3
  where
        t3.id2 = t2.id2 (+)
  and   regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
  and   mod(t3.id2, 3) = 0
)
 v_1
     , (
  select /*+ parallel(t2 &num_cpu) parallel(t3 &num_cpu) leading(t3 t2) use_hash(t3 t2) swap_join_inputs(t2) PQ_DISTRIBUTE(T2 HASH HASH) */
        t2.id     as t2_id
      , t2.filler as t2_filler
      , t2.id2    as t2_id2
      , t3.id     as t3_id
      , t3.filler as t3_filler
  from
        t1 t2
      , t2 t3
  where
        t3.id = t2.id (+)
  and   regexp_replace(t3.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
  and   mod(t3.id2, 3) = 0
) v_5
     , t1
where
       v_1.t3_id = v_5.t3_id
and    v_5.t2_id2 = t1.id2 (+) + 2001
and    regexp_replace(v_5.t3_filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t1.filler (+), '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
)
select max(t1_id), max(t1_filler), max(t2_id), max(t2_filler), max(t3_id), max(t3_filler) from
result;

break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup

-- compute sum label Total of num_rows on server_type

select
        /*dfo_number
      , */tq_id
      , cast(server_type as varchar2(10)) as server_type
      , instance
      , cast(process as varchar2(8)) as process
      , num_rows
      , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
      , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
      , round(bytes / 1024 / 1024) as mb
      , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from
        v$pq_tqstat
order by
        dfo_number
      , tq_id
      , server_type desc
      , instance
      , process
;

--------------------------------------------------------------------------------------------------
-- This construct results in misleading information from V$PQ_TQSTAT (actually a complete mess) --
--------------------------------------------------------------------------------------------------

set echo on timing on time on

alter session enable parallel dml;

truncate table x;

insert  /*+ append parallel(x 4) */ into x
select  /*+ leading(v1 v2) optimizer_features_enable('11.2.0.1') */
        v_1.id
      , v_1.id2
      , v_1.filler
from    (
           select
                   id
                 , id2
                 , filler
           from    (
                      select  /*+ parallel(t2 4) no_merge */
                              rownum as id
                            , t2.id2
                            , t2.filler
                      from
                              t2
                      where
                              mod(t2.id2, 3) = 0
                      and     regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
                   ) v1
        ) v_1
      , (
          select
                   id
                 , id2
                 , filler
          from     (
                     select  /*+ parallel(t2 8) no_merge */
                             rownum as id
                           , t2.id2
                           , t2.filler
                     from
                             t2
                     where
                             mod(t2.id2, 3) = 0
                     and     regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') = regexp_replace(t2.filler, '^([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
                   ) v2
        ) v_2
where
        v_1.id = v_2.id
and     v_1.filler = v_2.filler
;

-- Parallel DML requires a COMMIT before querying V$PQ_TQSTAT
commit;

break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup

compute sum label Total of num_rows on server_type

select
        dfo_number
      , tq_id
      , cast(server_type as varchar2(10)) as server_type
      , instance
      , cast(process as varchar2(8)) as process
      , num_rows
      , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
      , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
      , round(bytes / 1024 / 1024) as mb
      , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from
        v$pq_tqstat
order by
        dfo_number
      , tq_id
      , server_type desc
      , instance
      , process
;

----------------------------------------------------------------------
-- Single DFO tree (with Parallel Execution Skew, almost no impact) --
----------------------------------------------------------------------

set echo on timing on time on

alter session set workarea_size_policy = manual;

alter session set sort_area_size = 500000000;

alter session set sort_area_size = 500000000;

alter session set hash_area_size = 500000000;

alter session set hash_area_size = 500000000;

select  /*+ leading(v1)
            use_hash(t_1)
            no_swap_join_inputs(t_1)
            pq_distribute(t_1 hash hash)
        */
        max(t_1.filler)
      , max(v1.t_1_filler)
      , max(v1.t_2_filler)
from
        t_1
      , (
          select  /*+ no_merge
                      leading(t_1 t_2)
                      use_hash(t_2)
                      no_swap_join_inputs(t_2)
                      pq_distribute(t_2 hash hash) */
                  t_1.id as t_1_id
                , t_1.filler as t_1_filler
                , t_2.id as t_2_id
                , t_2.filler as t_2_filler
          from    t_1
                , t_2
          where
                  t_2.fk_id_skew = t_1.id
        ) v1
where
        v1.t_2_id = t_1.id
and     regexp_replace(v1.t_2_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
and     regexp_replace(v1.t_2_filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
;

break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup

-- compute sum label Total of num_rows on server_type

select
        /*dfo_number
      , */tq_id
      , cast(server_type as varchar2(10)) as server_type
      , instance
      , cast(process as varchar2(8)) as process
      , num_rows
      , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
      , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
      , round(bytes / 1024 / 1024) as mb
      , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from
        v$pq_tqstat
order by
        dfo_number
      , tq_id
      , server_type desc
      , instance
      , process
;

--------------------------------------------------------------------------------------------------------------------------------
-- Full Partition Wise Join with partition skew - V$PQ_TQSTAT is of no help, since no redistribution takes place (single DFO) --
--------------------------------------------------------------------------------------------------------------------------------

set echo on timing on time on

alter session set workarea_size_policy = manual;

alter session set sort_area_size = 500000000;

alter session set sort_area_size = 500000000;

alter session set hash_area_size = 500000000;

alter session set hash_area_size = 500000000;

select count(t_2_filler) from (
select  /*+ monitor
            leading(t_1 t_2)
            use_hash(t_2)
            no_swap_join_inputs(t_2)
            pq_distribute(t_2 none none)
        */
        t_1.id as t_1_id
      , t_1.filler as t_1_filler
      , t_2.id as t_2_id
      , t_2.filler as t_2_filler
from    t_1_part t_1
      , t_2_part t_2
where
        t_2.fk_id_skew = t_1.id
and     regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup

-- compute sum label Total of num_rows on server_type

select
        /*dfo_number
      , */tq_id
      , cast(server_type as varchar2(10)) as server_type
      , instance
      , cast(process as varchar2(8)) as process
      , num_rows
      , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
      , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
      , round(bytes / 1024 / 1024) as mb
      , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from
        v$pq_tqstat
order by
        dfo_number
      , tq_id
      , server_type desc
      , instance
      , process
;