
Sunday, July 10, 2016

Comparing Columns Containing NULL Values

Prompted by a (not really that) recent discussion on the OTN forum I've decided to publish this note.

Sometimes you have the task of comparing column values, and handling the NULL cases correctly makes this rather cumbersome for columns that are allowed to be NULL.

The "official" SQL way of comparing two column values and to find out whether they are equal or not - under the assumption that having NULL in both columns should be treated as equal (a point that can be argued) would read in SQL something like the following for the "unequal" case:
column1 != column2 or (column1 is null and column2 is not null) or (column1 is not null and column2 is null)
and
column1 = column2 or (column1 is null and column2 is null)
for the "equal" case.

Imagine this being done for a lot of columns and it becomes obvious why people might be looking for more concise alternatives to express this.

Sometimes you might have come across code that uses NVL or a similar function with some "special" value that is not supposed to be part of the column data to be compared - like -1 for (positive) integer values, for example - but this has the inherent risk of producing wrong results if the "special" value is (or might become in the future) not that special after all.
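To illustrate the risk, here is a minimal sketch of that approach (assuming -1 was picked as the "special" value for a numeric column):

nvl(column1, -1) != nvl(column2, -1)

-- Looks harmless, but a row with column1 = -1 and column2 NULL evaluates
-- to "equal" - a wrong result as soon as -1 actually shows up in the data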

The not really documented SYS_OP_MAP_NONNULL function could come in handy if it were official, because it covers the NULL cases automatically.

Note: I would still call it undocumented even if it shows up in some "Materialized View" related part of the official 12c documentation, because it is not part of the official SQL language manual.
SYS_OP_MAP_NONNULL(column1) != SYS_OP_MAP_NONNULL(column2)
is all that is needed. More concise than the above, but undocumented.

And there is one obvious drawback to SYS_OP_MAP_NONNULL: since it adds one byte to the input value to ensure the mapped NULL value differs from any other value, it doesn't work properly with maxed-out columns. For example, this:
SQL> create table t
  2  as
  3  select rpad('x', 4000) as col1, rpad('x', 4000) as col2 from dual
  4  union all
  5  select rpad('x', 4000) as col1, rpad('y', 4000) as col2 from dual
  6  union all
  7  select null as col1, rpad('y', 4000) as col2 from dual
  8  union all
  9  select rpad('x', 4000) as col1, null from dual
 10  union all
 11  select null as col1, null as col2 from dual
 12  ;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't')

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from t where sys_op_map_nonnull(col1) = sys_op_map_nonnull(col2);
select count(*) from t where sys_op_map_nonnull(col1) = sys_op_map_nonnull(col2)
                     *
ERROR at line 1:
ORA-01706: user function result value was too large
errors out with "ORA-01706: user function result value was too large".
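The extra byte can also be inspected directly - a quick sketch (the values returned are an undocumented implementation detail, so don't rely on them):

select sys_op_map_nonnull('x') as mapped_x
     , sys_op_map_nonnull(null) as mapped_null
from dual;

-- The non-NULL input comes back with one extra byte appended, while NULL
-- gets mapped to a distinct non-NULL value - hence the ORA-01706 above
-- for inputs already at the maximum length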

DECODE


DECODE is an exception to the rule of how Oracle treats NULLs in SQL - it returns the "equal" result if both operands being compared are NULL, so it allows the following construct to achieve the above comparison (see the above OTN forum discussion for credits where this expression appeared first, and more recently here on Stew Ashton's blog):

For checking for inequality:
decode(column1, column2, 0, 1) = 1
For equality:
decode(column1, column2, 0, 1) = 0
Very concise, and officially documented.
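The NULL handling is easy to verify with a quick sanity check on DUAL:

select decode(null, null, 'equal', 'not equal') as null_cmp from dual;

-- returns 'equal' - DECODE treats two NULLs as a match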

Interestingly, starting with 11.2.0.2 there is an implicit rewrite, enabled by default, that transforms the above expression (and only this variation, not the one testing for inequality):
decode(column1, column2, 0, 1) = 0
into
sys_op_map_nonnull(column1) = sys_op_map_nonnull(column2)
This is controlled via fix control 8551880 ("Rewrite decode predicate to join").
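If you want to check or change that setting yourself, something along these lines should do (V$SESSION_FIX_CONTROL reflects the session level value):

select bugno, value, description
from v$session_fix_control
where bugno = 8551880;

-- Disable the rewrite for the current session
alter session set "_fix_control" = '8551880:0';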

Guess what happens when running the following query in 11.2.0.2 or later against the above data set (the fix control is enabled by default in these versions):
SQL> select /*+ opt_param('_fix_control', '8551880:0') */ count(*) from t where decode(col1, col2, 0, 1) = 0;

  COUNT(*)
----------
         2

SQL>
SQL> select /*+ opt_param('_fix_control', '8551880:1') */ count(*) from t where decode(col1, col2, 0, 1) = 0;
select /*+ opt_param('_fix_control', '8551880:1') */ count(*) from t where decode(col1, col2, 0, 1) = 0
                                                                   *
ERROR at line 1:
ORA-01706: user function result value was too large
Finally, what about performance? For that purpose I created the following compressed table with 50M rows, covering different cases:
create table t1 compress
as
select case mod(rownum, 5) + 1
       when 1 then cast('C1' as varchar2(20))
       when 2 then cast(null as varchar2(20))
       when 3 then cast('C3' as varchar2(20))
       when 4 then cast(null as varchar2(20))
       when 5 then cast('C5' as varchar2(20))
       end as col1,
       case mod(rownum, 5) + 1
       when 1 then cast('C2' as varchar2(20))
       when 2 then cast(null as varchar2(20))
       when 3 then cast('C3' as varchar2(20))
       when 4 then cast('C4' as varchar2(20))
       when 5 then cast(null as varchar2(20))
       end as col2
from
     (select /*+ cardinality(1e4) */ null from dual connect by level <= 1e4)
   , (select /*+ cardinality(5e3) */ null from dual connect by level <= 5e3)
;
and got the following results from 11.2.0.4 (ignoring the different cardinality estimates for the moment, since I'm only interested in the elapsed time here):
SQL> select count(*) from t1;

Elapsed: 00:00:00.94

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 | 18743   (2)| 00:03:45 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T1   |    50M| 18743   (2)| 00:03:45 |
-------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      68456  consistent gets
          0  physical reads
          0  redo size
        346  bytes sent via SQL*Net to client
        364  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL> select count(*) from t1 where decode(col1, col2, 0, 1) = 1;

Elapsed: 00:00:02.55

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 | 19176   (4)| 00:03:51 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   500K|  1953K| 19176   (4)| 00:03:51 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(DECODE("COL1","COL2",0,1)=1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      68456  consistent gets
          0  physical reads
          0  redo size
        346  bytes sent via SQL*Net to client
        364  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL> select count(*) from t1 where SYS_OP_MAP_NONNULL("COL1")!=SYS_OP_MAP_NONNULL("COL2");

Elapsed: 00:00:03.04

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 | 19515   (6)| 00:03:55 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |    15M|    60M| 19515   (6)| 00:03:55 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SYS_OP_MAP_NONNULL("COL1")<>SYS_OP_MAP_NONNULL("COL2"))


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      68456  consistent gets
          0  physical reads
          0  redo size
        346  bytes sent via SQL*Net to client
        364  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL> select count(*) from t1 where (col1 != col2 or (col1 is null and col2 is not null) or (col1 is not null and col2 is null));

Elapsed: 00:00:03.51

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 | 18876   (2)| 00:03:47 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |    30M|   115M| 18876   (2)| 00:03:47 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("COL1" IS NULL AND "COL2" IS NOT NULL OR "COL2" IS NULL
              AND "COL1" IS NOT NULL OR "COL1"<>"COL2")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      68456  consistent gets
          0  physical reads
          0  redo size
        346  bytes sent via SQL*Net to client
        364  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL> select /*+ opt_param('_fix_control', '8551880:0') */ count(*) from t1 where decode(col1, col2, 0, 1) = 0;

Elapsed: 00:00:02.49

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 | 19176   (4)| 00:03:51 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   500K|  1953K| 19176   (4)| 00:03:51 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(DECODE("COL1","COL2",0,1)=0)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      68456  consistent gets
          0  physical reads
          0  redo size
        346  bytes sent via SQL*Net to client
        364  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL> select /*+ opt_param('_fix_control', '8551880:1') */ count(*) from t1 where decode(col1, col2, 0, 1) = 0;

Elapsed: 00:00:03.04

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 | 19515   (6)| 00:03:55 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  6000K|    22M| 19515   (6)| 00:03:55 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SYS_OP_MAP_NONNULL("COL1")=SYS_OP_MAP_NONNULL("COL2"))


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      68456  consistent gets
          0  physical reads
          0  redo size
        346  bytes sent via SQL*Net to client
        364  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL> select count(*) from t1 where col1 = col2 or (col1 is null and col2 is null);

Elapsed: 00:00:02.30

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 | 18865   (2)| 00:03:47 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |    13M|    49M| 18865   (2)| 00:03:47 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("COL1" IS NULL AND "COL2" IS NULL OR "COL1"="COL2")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      68456  consistent gets
          0  physical reads
          0  redo size
        346  bytes sent via SQL*Net to client
        364  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
Surprisingly, in my tests SYS_OP_MAP_NONNULL always performed worst, while the DECODE expression came pretty close to the more verbose SQL expression - depending on the data set it was sometimes even faster.

So using SYS_OP_MAP_NONNULL, and in particular the implicit rewrite, might not be such a good idea after all. And now you know how the automatic rewrite can be disabled if desired.

Wednesday, June 29, 2016

New Version Of XPLAN_ASH Utility

A new version 4.23 of the XPLAN_ASH utility is available for download.

As usual the latest version can be downloaded here.

This version comes only with minor changes, see the change log below.

Here are the notes from the change log:

- Finally corrected the very old and wrong description of "wait times" in the script comments, which talked about "in-flight" wait events - that is not correct. ASH performs a "fix-up" of the last 255 samples or so and updates them with the time waited, so these wait events are not "in-flight"

- Removed some of the clean-up code added in 4.22 to the beginning of the script, because it didn't really help much, and the spooled script output always contained error messages about non-existent column definitions being cleared

- The "Concurrent I/O" sections will now also be skipped in LIMITED_ASH mode

- Some more fixes to the I/O figures in the "Activity Timeline based on ASH" - the spreading introduced in 4.22 needed some further refinement (see 4.22 change log for more details)

Sunday, January 17, 2016

DML Operations On Partitioned Tables Can Restart On Invalidation

It's probably not that well known that Oracle can actually rollback / re-start the execution of a DML statement should the cursor become invalidated. By rollback / re-start I mean that Oracle actually performs a statement level rollback (so any modification already performed by that statement until that point gets rolled back), performs another optimization phase of the statement on re-start (due to the invalidation) and begins the execution of the statement from scratch. Note that this can happen multiple times - actually it's possible to end up in a kind of infinite loop when this happens, leading to statements that can run for very, very long (I've seen statements on Production environments executing for several days although a single execution would only take minutes).

The pre-requisites to meet for this to happen are not that complex or exotic:

- The target table to manipulate needs to be partitioned

- The cursor currently executing gets invalidated - either by running DDL (typically think of partition related operations) - or simply by gathering statistics on one of the objects involved in the statement

- The DML statement hasn't yet touched one of the partitions of the target table but attempts to do so after the cursor got invalidated

When the last condition is met, the statement performs a rollback, and since it got invalidated - which is one of the conditions to be met - another optimization phase happens, meaning that it's also possible to get different execution plans for the different execution attempts. When the execution plan is ready the execution begins from scratch.

According to my tests the issue described here applies to both conventional and direct-path inserts, merge statements (insert / update / delete) as well as serial and parallel execution. I haven't explicitly tested UPDATE and DELETE statements, but the assumption is that they are affected, too.

The behaviour is documented in the following note on MOS: "Insert Statement On Partitioned Tables Is RE-Started After Invalidation (Doc ID 1462003.1)" which links to Bug "14102209 : INSERT STATEMENT' IS RESTARTING BY ITSELF AFTER INVALIDATION" where you can also find some more comments on this behaviour. The issue seems to be that Oracle at that point is no longer sure if the partition information compiled into the cursor for the partitioned target table is still correct or not (and internally raises and catches a corresponding error, like "ORA-14403: Cursor invalidation detected after getting DML partition lock", leading to the re-try), so it needs to refresh that information, hence the re-optimization and re-start of the cursor.

Note that this also means the DML statement might already have performed modifications to other partitions when, after being invalidated, it attempts to modify another partition it hasn't touched yet - all it takes is an attempt to modify a partition not yet touched by that statement.

It's also kind of nasty that the statement keeps running the potentially lengthy query part after being invalidated, only to find out that it needs to re-start once the first row is about to be applied to a target table partition not touched yet.

Note that applications typically run into this problem, when they behave like the following:

- There are longer running DML statements that typically take several seconds / minutes until they actually attempt to perform a modification to a partitioned target table

- They either use DBMS_STATS to gather stats on one of the involved tables, typically using NO_INVALIDATE=>FALSE, which leads to an immediate invalidation of all affected cursors

- And/or they perform partition related operations on one of the tables involved, like truncating, creating or exchanging partitions. It is important to point out that it doesn't matter which object gets DDL / stats applied, so it's not limited to activity on the partitioned target table being modified - any object involved in the query can cause the cursor invalidation

In principle this is another variation of the general theme "Don't mix concurrent DDL with DML/queries on the same objects". Doing so is something that leads to all kinds of side effects, and the way the Oracle engine is designed means that it doesn't cope very well with doing so.
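On the statistics side, one simple mitigation is not to force immediate invalidation when gathering stats - a sketch using the AUTO_INVALIDATE option (which is the default anyway), letting affected cursors get invalidated gradually instead of immediately:

exec dbms_stats.gather_table_stats(null, 't_target', no_invalidate => dbms_stats.auto_invalidate)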

Here is a simple test case for reproducing the issue, using INSERTs in this case here (either via INSERT or MERGE statement):
create table t_target (
id number(*, 0) not null,
pkey number(*, 0) not null,
filler varchar2(500)
)
--segment creation immediate
partition by range (pkey) --interval (1)
(
  partition pkey_0 values less than (1)
, partition pkey_1 values less than (2)
, partition pkey_2 values less than (3)
, partition pkey_3 values less than (4)
);

create table t_source
compress
as
select 1 as id, rpad('x', 100) as filler
from
(select /*+ cardinality(1e3) */ null from dual connect by level <= 1e3),
(select /*+ cardinality(1e0) */ null from dual connect by level <= 1e0)
union all
select 1 as id, rpad('y', 100) as filler from dual;

-- Run this again once the DML statement below got started
exec dbms_stats.gather_table_stats(null, 't_source', no_invalidate=>false)

exec dbms_stats.gather_table_stats(null, 't_target', no_invalidate=>false)

----------------------------------------------------------------------------------------------------------------------------------
-- INSERT example                                                                                                               --
-- Run above DBMS_STATS calls or any other command that invalidates the cursor during execution to force re-start of the cursor --
----------------------------------------------------------------------------------------------------------------------------------

set echo on timing on time on

-- alter session set tracefile_identifier = 'insert_restart';

-- alter session set events '10046 trace name context forever, level 12';

-- exec sys.dbms_monitor.session_trace_enable(waits => true, binds => true/*, plan_stat => 'all_executions'*/)

insert /* append */ into t_target (id, pkey, filler)
select * from (
select /*+
           use_hash(a b)
           no_eliminate_oby
       */
       a.id, 1 as pkey, a.filler
from t_source a, t_source b
where a.id = b.id
and (
regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm')
or (b.filler = rpad('y', 100) and a.filler = rpad('y', 100))
)
order by a.id
)
union all
select * from (
select /*+
           use_hash(a b)
           no_eliminate_oby
       */
       a.id, 2 as pkey, a.filler
from t_source a, t_source b
where a.id = b.id
and (
regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm')
or (b.filler = rpad('y', 100) and a.filler = rpad('y', 100))
)
order by a.id
)
union all
select * from (
select /*+
           use_hash(a b)
           no_eliminate_oby
       */
       a.id, 3 as pkey, a.filler
from t_source a, t_source b
where a.id = b.id
and (
regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm')
or (b.filler = rpad('y', 100) and a.filler = rpad('y', 100))
)
order by a.id
)
;

-- exec sys.dbms_monitor.session_trace_disable

----------------------------------------------------------------------------------------------------------------------------------
-- MERGE example                                                                                                                --
-- Run above DBMS_STATS calls or any other command that invalidates the cursor during execution to force re-start of the cursor --
----------------------------------------------------------------------------------------------------------------------------------

set echo on timing on time on

merge /* append */ into t_target t
using (
select * from (
select /*+
           use_hash(a b)
           no_eliminate_oby
       */
       a.id, 1 as pkey, a.filler
from t_source a, t_source b
where a.id = b.id
and (
regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm')
or (b.filler = rpad('y', 100) and a.filler = rpad('y', 100))
)
order by a.id
)
union all
select * from (
select /*+
           use_hash(a b)
           no_eliminate_oby
       */
       a.id, 2 as pkey, a.filler
from t_source a, t_source b
where a.id = b.id
and (
regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm')
or (b.filler = rpad('y', 100) and a.filler = rpad('y', 100))
)
order by a.id
)
union all
select * from (
select /*+
           use_hash(a b)
           no_eliminate_oby
       */
       a.id, 3 as pkey, a.filler
from t_source a, t_source b
where a.id = b.id
and (
regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i')
--or regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') != regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm')
or (b.filler = rpad('y', 100) and a.filler = rpad('y', 100))
)
order by a.id
)
) s
on (s.id = t.id)
when not matched then
insert (id, pkey, filler) values (s.id, s.pkey, s.filler)
;
The idea of the test case is to maximise the time until each UNION ALL branch produces data to insert by performing an inefficient HASH JOIN (that in fact generates a Cartesian product and needs to apply a costly REGEXP filter on that huge intermediate result) and forcing a sort on the join result, so rows will only be handed over to the parent operations once all rows have been processed in the join operation - and each branch generates data for a different partition of the target table. Typically it should take several seconds per branch to execute (if you need more time just un-comment the additional REGEXP_REPLACE filters), so you should have plenty of time to cause the invalidation from another session.

This means during the execution of each branch invalidating the cursor (for example by executing either of the two DBMS_STATS calls on the source or target table using NO_INVALIDATE=>FALSE) will lead to a re-start of the statement at the next attempt to write into a new target partition, possibly rolling back rows already inserted into other partitions.

Diagnostics


If you run the provided INSERT or MERGE statement on newer versions of Oracle that include SQL_EXEC_START and SQL_EXEC_ID in V$ACTIVE_SESSION_HISTORY (or V$SESSION for that matter), and you invalidate the cursor during execution and before a partition of the target table gets inserted into for the first time, then you can see that these entries change as the statement restarts.

In such cases the INVALIDATIONS and LOADS in V$SQL increase accordingly, and the OBJECT_STATUS changes from INVALID_UNAUTH back to VALID with each re-start attempt. In newer versions, where you can configure the "plan_stat" information for SQL trace to "all_executions", you'll find STAT lines for each execution attempt dumped to the trace file, but only a single final EXEC line, whose elapsed time covers all execution attempts.
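A hedged sketch of how to watch this from a second session, assuming you know the SQL_ID of the running DML statement:

-- SQL_EXEC_ID / SQL_EXEC_START change with each re-start attempt
select sid, sql_id, sql_exec_id, sql_exec_start
from v$session
where sql_id = '&sql_id';

-- INVALIDATIONS / LOADS increase and OBJECT_STATUS toggles on each re-start
select sql_id, child_number, loads, invalidations, object_status
from v$sql
where sql_id = '&sql_id';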

The oldest version I've tested was 10.2.0.4, and that one already showed the re-start behaviour, although I would be inclined to think that this wasn't the case with older versions. So if anybody still runs versions older than 10.2.0.4 I would be interested to hear whether the behaviour reproduces or not.

Monday, December 28, 2015

New Version Of XPLAN_ASH Utility

A new version 4.22 of the XPLAN_ASH utility is available for download.

As usual the latest version can be downloaded here.

This version primarily addresses an issue with 12c - if the HIST mode got used to pull ASH information from AWR in 12c, it turned out that Oracle forgot to add the new "DELTA_READ_MEM_BYTES" column to DBA_HIST_ACTIVE_SESS_HISTORY - although it got officially added to V$ACTIVE_SESSION_HISTORY in 12c. So I had to implement several additional if/then/else constructs in the script to handle this inconsistency. It's the first time that the HIST view doesn't seem to reflect all columns from the V$ view - very likely an oversight rather than by design, I assume.

Apart from that the I/O figures (Read Bytes / Write Bytes etc.) in the "Activity Timeline" make more sense for those cases where a process hasn't been sampled for several sample points (see below for more details).

Also, in case an execution plan could not be found, a corresponding message now makes it more obvious that you might be able to pull the execution plan from AWR by using different ASH modes (MIXED / HIST).

Here are the notes from the change log:

- Fixed a funny bug: in 12c the DELTA_READ_MEM_BYTES column was forgotten in DBA_HIST_ACTIVE_SESS_HISTORY, so in HIST mode with 12c prior XPLAN_ASH versions could error out with an invalid column name

- Changed the way the I/O figures are treated in the "Activity Timeline based on ASH". Now the I/O per second is spread over the (previous) samples covered by DELTA_TIME. This should give a smoother representation of the I/O performed, much closer to what you see in Real-Time SQL Monitoring reports. The difference to prior versions is only visible in cases where a session wasn't sampled for quite a while and hence has a DELTA_TIME spanning multiple previous sample points. This also means that the I/O related columns in the "Activity Timeline based on ASH" now show only the PER SECOND values, no longer the totals like prior versions

- Added a SET NULL "" in the configuration and initialization section for SQL*Plus environments that use a non-default SET NULL setting. Such a setting screwed up some internal switches so that XPLAN_ASH, for example, thought it was running in an S-ASH repository

- Added a note to the end of the output if no execution plan could be found and the script falls back to retrieving plan operation details from ASH. Also added a note to use the MIXED or HIST ASH source option if no execution plan could be found in CURR mode, in case the execution plan has been purged from the Shared Pool in the meantime

- Cloned the "cleanup" section from the end to the beginning of the script to ensure no current SQL*Plus environment settings influence the script execution. This is particularly relevant if the script execution gets cancelled before the final cleanup section is reached or some other, previous scripts left a mess behind

Sunday, August 16, 2015

Parallel Projection

A recent case at a client reminded me of something that isn't really new but not so well known - by default Oracle performs expression evaluation at the latest possible point in the execution plan.

So if you happen to have expressions in the projection of a simple SQL statement that runs parallel, it might be counter-intuitive that by default Oracle won't evaluate the projection in the Parallel Slaves but in the Query Coordinator - even if it were technically possible - because the latest possible point is the SELECT operation with ID = 0 of the plan, which is always performed by the Query Coordinator.

Of course, if you make use of expressions that can't be evaluated in parallel or aren't implemented for parallel evaluation, then there is no other choice than doing this in the Query Coordinator.
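For user-defined PL/SQL functions the PARALLEL_ENABLE declaration is relevant in this context - a minimal sketch with a hypothetical function, just to show the clause:

create or replace function f_expensive(p_input in varchar2)
return varchar2
parallel_enable -- without this declaration the function isn't eligible for evaluation in the Parallel Slaves
is
begin
  return upper(p_input);
end;
/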

The specific case in question was a generic export functionality that allowed exporting report results to some CSV or Excel like format, and some of these reports had a lot of rows and complex - in that case CPU intensive - expressions in their projection clause.

When looking at the run time profile of such an export query it became obvious that although it was a (very simple) parallel plan, all of the time was spent in the Query Coordinator, effectively turning this at runtime into a serial execution.

This effect can be reproduced very easily:
create table t_1
compress
as
select  /*+ use_nl(a b) */
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual connect by level <= 1e5) a
      , (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't_1', method_opt=>'for all columns size 1')

alter table t_1 parallel cache;

-- Run some CPU intensive expressions in the projection 
-- of a simple parallel Full Table Scan
set echo on timing on time on

set autotrace traceonly statistics

set arraysize 500

select
        regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as some_cpu_intensive_exp1
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') as some_cpu_intensive_exp2
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') as some_cpu_intensive_exp3
from t_1
;

-- The plan is clearly parallel
--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |  2000K|   192M|   221   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| T_1      |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

-- But the runtime profile looks more serial
-- although the Parallel Slaves get used to run the Full Table Scan
-- All time spent in the operation ID = 0
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Operation            | Name     | Execs | A-Rows| ReadB | ReadReq | Start | Dur(T)| Dur(A)| Time Active Graph    | Parallel Distribution ASH                                                     | Parallel Execution Skew ASH      | Activity Graph ASH          | Top 5 Activity ASH  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     | SELECT STATEMENT     |          |    5  | 2000K |       |         |     3 |   136 |   120 | #################### |   1:sqlplus.exe(120)[2000K],P008(0)[0],P009(0)[0],P00A(0)[0],P00B(0)[0]       | ################################ | @@@@@@@@@@@@@@@@@@@ ( 98%)  | ON CPU(120)         |
|   1 |   0 |  PX COORDINATOR      |          |    5  | 2000K |       |         |   119 |     1 |     1 |                 #    |   1:sqlplus.exe(1)[2000K],P008(0)[0],P009(0)[0],P00A(0)[0],P00B(0)[0]         |                                  |                     ( .8%)  | ON CPU(1)           |
|   2 |   1 |   PX SEND QC (RANDOM)| :TQ10000 |    4  | 2000K |       |         |    66 |    11 |     2 |          ##          |   2:P00B(1)[508K],P00A(1)[490K],P008(0)[505K],P009(0)[497K],sqlplus.exe(0)[0] |                                  |                     (1.6%)  | PX qref latch(2)    |
|   3 |   2 |    PX BLOCK ITERATOR |          |    4  | 2000K |       |         |       |       |       |                      |   0:P00B(0)[508K],P008(0)[505K],P009(0)[497K],P00A(0)[490K],sqlplus.exe(0)[0] |                                  |                             |                     |
|*  4 |   3 |     TABLE ACCESS FULL| T_1      |   52  | 2000K |   23M |     74  |       |       |       |                      |   0:P00B(0)[508K],P008(0)[505K],P009(0)[497K],P00A(0)[490K],sqlplus.exe(0)[0] |                                  |                             |                     |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Fortunately there is a simple and straightforward way to make use of the Parallel Slaves for evaluation of projection expressions that can be evaluated in parallel - simply add a suitable NO_MERGE hint for the query block that you want the projection to be evaluated for in the Parallel Slaves.

If you don't want side effects on the overall plan shape from not merging views, you can always wrap the original query in an outer SELECT and prevent merging of the now inner query block. There seems to be a rule that the projection of a view always gets evaluated at the VIEW operator, and if we check the execution plan we can see that the VIEW operator is marked parallel:
set echo on timing on time on

set autotrace traceonly statistics

set arraysize 500

select /*+ no_merge(x) */ * from (
select
        regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') as some_cpu_intensive_exp1
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') as some_cpu_intensive_exp2
      , regexp_replace(filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'm') as some_cpu_intensive_exp3
from t_1
) x
;

-- View operator is marked parallel
-- This is where the projection clause of the VIEW will be evaluated
---------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |  2000K|    11G|   221   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM) | :TQ10000 |  2000K|    11G|   221   (1)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    VIEW               |          |  2000K|    11G|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|   4 |     PX BLOCK ITERATOR |          |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS FULL| T_1      |  2000K|   192M|   221   (1)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------

-- Runtime profile now shows effective usage of Parallel Slaves 
-- for doing the CPU intensive work
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Operation             | Name     | Execs | A-Rows| Start | Dur(T)| Dur(A)| Time Active Graph    | Parallel Distribution ASH                                                         | Parallel Execution Skew ASH| Activity Graph ASH          | Top 5 Activity ASH               |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     | SELECT STATEMENT      |          |    5  | 2000K |       |       |       |                      |   0:sqlplus.exe(0)[2000K],P000(0)[0],P001(0)[0],P002(0)[0],P003(0)[0]             |                            |                             |                                  |
|   1 |   0 |  PX COORDINATOR       |          |    5  | 2000K |    17 |    63 |    10 |    # ##   #    ####  |   1:sqlplus.exe(10)[2000K],P000(0)[0],P001(0)[0],P002(0)[0],P003(0)[0]            | ####                       | *                   (5.6%)  | resmgr:cpu quantum(10)           |
|   2 |   1 |   PX SEND QC (RANDOM) | :TQ10000 |    4  | 2000K |     5 |    61 |    10 | ## #  ## ## ## #     |   3:P002(5)[544K],P001(4)[487K],P000(1)[535K],P003(0)[434K],sqlplus.exe(0)[0]     | #                          |                     (5.6%)  | ON CPU(7),resmgr:cpu quantum(3)  |
|   3 |   2 |    VIEW               |          |    4  | 2000K |     2 |    82 |    69 | #################### |   4:P003(42)[434K],P001(35)[487K],P000(26)[535K],P002(22)[544K],sqlplus.exe(0)[0] | ############               | @@@@@@@@@@@@@@@@@@@ ( 70%)  | ON CPU(125)                      |
|   4 |   3 |     PX BLOCK ITERATOR |          |    4  | 2000K |       |       |       |                      |   0:P002(0)[544K],P000(0)[535K],P001(0)[487K],P003(0)[434K],sqlplus.exe(0)[0]     |                            |                             |                                  |
|*  5 |   4 |      TABLE ACCESS FULL| T_1      |   52  | 2000K |     3 |    78 |    29 | ###### ####### # ### |   4:P000(11)[535K],P002(8)[544K],P001(8)[487K],P003(7)[434K],sqlplus.exe(0)[0]    | ###                        | *****               ( 19%)  | resmgr:cpu quantum(30),ON CPU(4) |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
At runtime the duration of the query now gets reduced significantly, and we can see the Parallel Slaves being used when the VIEW operator gets evaluated. Although the overall CPU time used is similar to the previous example, the duration of the query execution is shorter since this CPU time is now spent in parallel in the slaves instead of in the Query Coordinator.

Summary


By default Oracle performs evaluation at the latest possible point of the execution plan. Sometimes you can improve runtime by actively influencing when the projection will be evaluated by preventing view merging and introducing a VIEW operator that will be used to evaluate the projection clause.

The optimizer so far doesn't seem to incorporate such possibilities in its evaluation of possible plan shapes, so this is something you need to do manually, up to and including Oracle 12c (version 12.1.0.2 as of the time of writing).

Friday, May 29, 2015

Temp Table Transformation Cardinality Estimates - 2

Continuing from the previous part - which was about the Temp Table Transformation and join cardinality estimates - and using the same simple table setup, here is a slight variation of the previously used query to demonstrate the potential impact on single table cardinality estimates:
explain plan for
with
cte as (
select /* inline */ id from t1 t
where 1 = 1
)
select /*+
           no_merge(a) no_merge(b)
       */ * from cte a, cte b
where a.id = b.id
and a.id > 990 and b.id > 990
;

-- 11.2.0.x Plan with TEMP transformation
--------------------------------------------------------------------------------
| Id  | Operation                  | Name                      | Rows  | Bytes |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                           |  1000 | 26000 |
|   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D661C_275FD9 |       |       |
|   3 |    TABLE ACCESS FULL       | T1                        |  1000 |  4000 |
|*  4 |   HASH JOIN                |                           |  1000 | 26000 |
|*  5 |    VIEW                    |                           |  1000 | 13000 |
|   6 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D661C_275FD9 |  1000 |  4000 |
|*  7 |    VIEW                    |                           |  1000 | 13000 |
|   8 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D661C_275FD9 |  1000 |  4000 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("A"."ID"="B"."ID")
   5 - filter("A"."ID">990)
   7 - filter("B"."ID">990)

-- 11.2.0.x Plan with INLINE hint
----------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |
----------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |   260 |
|*  1 |  HASH JOIN          |      |    10 |   260 |
|   2 |   VIEW              |      |    10 |   130 |
|*  3 |    TABLE ACCESS FULL| T1   |    10 |    40 |
|   4 |   VIEW              |      |    10 |   130 |
|*  5 |    TABLE ACCESS FULL| T1   |    10 |    40 |
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("A"."ID"="B"."ID")
   3 - filter("ID">990)
   5 - filter("ID">990)
Again it's obvious that the Temp Table Transformation can have significant impact on the single table cardinality estimates.

In particular:

- Although the same filter is applied in both cases to the rowsources A and B, in case of the Temp Table Transformation it doesn't reduce the cardinality estimate. So it's not uncommon to end up with significant cardinality overestimates when the transformation gets used

- Particularly bad for Exadata environments is that the filter isn't pushed into the TABLE ACCESS FULL operator, but only applied at the VIEW operator above, which means it can't be offloaded - all the data needs to be sent from the Storage Cells to the Compute Nodes and filtered there. Not a very efficient way to operate on Exadata

The behaviour is still the same in 12c.

Monday, May 25, 2015

Temp Table Transformation Cardinality Estimates - 1

Having recently published two notes about the Temp Table Transformation - highlighting the heuristics based decision and other weaknesses, for example regarding the projection of columns - it's time to publish some more notes about it.

The transformation can also have significant impact on cardinality estimates, both join and single table cardinality.

Looking at the difference in the join cardinality estimates of the following simple example:
create table t1
as
select
        rownum as id
      , mod(rownum, 10) + 1 as id2
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1000
;

exec dbms_stats.gather_table_stats(null, 't1')

alter session set tracefile_identifier = 'temp_trans_join_card';

alter session set events '10053 trace name context forever, level 1';

explain plan for
with
cte as (
select /* inline */ id + 1 as id from t1 t
where 1 = 1
)
select /*+
           --opt_estimate(@"SEL$2" join("A"@"SEL$2" "B"@"SEL$2") rows=1000)
           no_merge(a) no_merge(b)
       */ * from cte a, cte b
where a.id = b.id
;

alter session set events '10053 trace name context off';

-- 11.2.0.x Plan with TEMP transformation
--------------------------------------------------------------------------------
| Id  | Operation                  | Name                      | Rows  | Bytes |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                           |     1 |    26 |
|   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D660C_27269C |       |       |
|   3 |    TABLE ACCESS FULL       | T1                        |  1000 |  4000 |
|*  4 |   HASH JOIN                |                           |     1 |    26 |
|   5 |    VIEW                    |                           |  1000 | 13000 |
|   6 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D660C_27269C |  1000 |  4000 |
|   7 |    VIEW                    |                           |  1000 | 13000 |
|   8 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D660C_27269C |  1000 |  4000 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("A"."ID"="B"."ID")

-- 11.2.0.x Plan with INLINE hint
----------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |
----------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   253K|
|*  1 |  HASH JOIN          |      | 10000 |   253K|
|   2 |   VIEW              |      |  1000 | 13000 |
|   3 |    TABLE ACCESS FULL| T1   |  1000 |  4000 |
|   4 |   VIEW              |      |  1000 | 13000 |
|   5 |    TABLE ACCESS FULL| T1   |  1000 |  4000 |
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("A"."ID"="B"."ID")
the following becomes obvious:

- There are vastly different cardinality estimates possible depending on whether the transformation gets used or not. In principle due to the NO_MERGE hints used, the transformation shouldn't have any impact on the estimates, but it does

- Looking at the optimizer trace file some information seems to get lost when the transformation gets used, in particular column related statistics

- This information loss, like in the example here, can lead to join cardinality estimates of 0 (rounded to 1 in the plan output)

- And even worse, at least in pre-12c versions, such a 0 cardinality estimate can't be corrected via OPT_ESTIMATE hints, since somehow the correction gets ignored/lost although being mentioned in the optimizer trace:
11.2.0.1:
  Column (#1): ID(
    AvgLen: 22 NDV: 1 Nulls: 0 Density: 0.000000
  Column (#1): ID(
    AvgLen: 22 NDV: 1 Nulls: 0 Density: 0.000000
Join Card:  0.000000 = = outer (1000.000000) * inner (1000.000000) * sel (0.000000)
>> Join Card adjusted from 0.000000  to: 1000.000000, prelen=2
Adjusted Join Cards: adjRatio=1.00 cardHjSmj=0.000000 cardHjSmjNPF=0.000000 cardNlj=0.000000 cardNSQ=1000.000000 cardNSQ_na=0.000000
Join Card - Rounded: 1 Computed: 0.00
The behaviour regarding the OPT_ESTIMATE hint changes in 12c, but then there are other oddities introduced in 12c that are not there in pre-12c versions - have a look at the "Query Block" section when using the INLINE variant of the query: there are two identical fully qualified object names, clearly a bug, making hinting via global hint syntax impossible for that query block.

Although my simple example here can be corrected via extended statistics on the join column expression used in the CTE query, my point is that depending on whether the transformation gets used or not, vastly different and extreme cardinality estimates are possible - and those extreme cases can't even be corrected in pre-12c versions.
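For reference, a sketch of the extended statistics workaround just mentioned, matching the "id + 1" expression used in the CTE here:

select dbms_stats.create_extended_stats(null, 't1', '(id + 1)') from dual;

-- Re-gather so the new extension gets statistics
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size auto')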

For example, I recently had a real life case where two columns with a significant number of NULL values were joined, one of them coming from a temp table transformation row source. Without the transformation the join cardinality estimates were reasonable, but the transformation again led to such a 0 cardinality estimate (that couldn't be corrected via a (correctly specified) OPT_ESTIMATE hint), ruining the whole plan.

Thursday, May 7, 2015

Heuristic Temp Table Transformation - 2

Some time ago I demonstrated the non-cost based decision for applying the temp table transformation when using CTEs (Common Table/Subquery Expressions). In this note I want to highlight another aspect of this behaviour.

Consider the following data, creating a table with deliberately wide columns:
create table a
as
  select
          rownum as id
        , rownum as id2
        , rpad('x', 4000) as large_vc1
        , rpad('x', 4000) as large_vc2
        , rpad('x', 4000) as large_vc3
from
          dual
connect by
          level <= 1000
;

exec dbms_stats.gather_table_stats(null, 'a')
and this query and plans with and without the temp table transformation:
with cte
as
(
  select  /* inline */
          id
        , id2
        , large_vc1
        , large_vc2
        , large_vc3
from
          a
where
          1 = 1
)
select
        *
from
        (
          select id, count(*) from cte group by id
        ) a,
        (
          select id2, count(*) from cte group by id2
        ) b
where
        a.id = b.id2
;

-- Plan with TEMP TABLE transformation
--------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                           |  1000 | 52000 |  1341   (1)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6609_26FA32 |       |       |            |          |
|   3 |    TABLE ACCESS FULL       | A                         |  1000 |    11M|   452   (0)| 00:00:01 |
|*  4 |   HASH JOIN                |                           |  1000 | 52000 |   889   (1)| 00:00:01 |
|   5 |    VIEW                    |                           |  1000 | 26000 |   444   (1)| 00:00:01 |
|   6 |     HASH GROUP BY          |                           |  1000 |  4000 |   444   (1)| 00:00:01 |
|   7 |      VIEW                  |                           |  1000 |  4000 |   443   (0)| 00:00:01 |
|   8 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D6609_26FA32 |  1000 |    11M|   443   (0)| 00:00:01 |
|   9 |    VIEW                    |                           |  1000 | 26000 |   444   (1)| 00:00:01 |
|  10 |     HASH GROUP BY          |                           |  1000 |  4000 |   444   (1)| 00:00:01 |
|  11 |      VIEW                  |                           |  1000 |  4000 |   443   (0)| 00:00:01 |
|  12 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D6609_26FA32 |  1000 |    11M|   443   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

-- Plan with CTE inlined (turn INLINE into hint)
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  1000 | 52000 |   907   (1)| 00:00:01 |
|*  1 |  HASH JOIN           |      |  1000 | 52000 |   907   (1)| 00:00:01 |
|   2 |   VIEW               |      |  1000 | 26000 |   453   (1)| 00:00:01 |
|   3 |    HASH GROUP BY     |      |  1000 |  4000 |   453   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| A    |  1000 |  4000 |   452   (0)| 00:00:01 |
|   5 |   VIEW               |      |  1000 | 26000 |   453   (1)| 00:00:01 |
|   6 |    HASH GROUP BY     |      |  1000 |  4000 |   453   (1)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| A    |  1000 |  4000 |   452   (0)| 00:00:01 |
-----------------------------------------------------------------------------
Looking at the query and plan output the following becomes obvious:

- The mere existence of a WHERE clause (even if it is just "WHERE 1 = 1"), combined with referencing the CTE more than once, triggers the transformation (nothing new, already demonstrated in the mentioned previous note, as well as the fact that the inlined CTE variant is cheaper in cost)

- There is a huge difference between the estimated size of the TEMP TABLE and the size of the row sources when using the CTE inline

The latter is particularly noteworthy: Usually Oracle is pretty clever in optimizing the projection and uses only those columns required (this doesn't apply to the target expression of MERGE statements, by the way), which is reflected in the plan output for the inline CTEs - the wide columns don't matter here because they aren't referenced, although they are mentioned in the CTE. But in case of the temp table transformation obviously all columns / expressions mentioned in the CTE get materialized, although not necessarily referenced when the CTE is used.
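
As a side note, the column projection per operation can be verified using the PROJECTION format option of DBMS_XPLAN, which adds a "Column Projection Information" section to the plan output - a minimal sketch, again assuming the query was just executed in the current session:

-- The "+PROJECTION" format modifier shows which columns each operation
-- actually projects - for the LOAD AS SELECT of the temp table this
-- should list all CTE columns, for the inline variant only ID / ID2
select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL +PROJECTION'));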

So it would be nice if Oracle only materialized those columns / expressions actually used.

Now you might raise the question why one would mention columns and expressions in the CTE that don't get used afterwards: Well, generic approaches sometimes lead to such constructs - imagine the CTE part was static, including all possible attributes, while the actual usage of the CTE can be customized by a client. In such cases, where only a small part of the available attributes actually gets used, a temp table transformation can lead to a huge overhead in the size of the generated temp table. Preventing the transformation addresses this issue, but then the inlined CTE will have to be evaluated as many times as it is referenced - which might not be desirable either.
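
For reference, this is how the inlining was enforced in the test above - turning the comment in the CTE into the (undocumented) INLINE hint; a minimal sketch, keeping in mind the trade-off just mentioned that the CTE then gets evaluated once per reference:

with cte
as
(
  select  /*+ inline */  -- undocumented hint preventing the temp table transformation
          id
        , id2
        , large_vc1
        , large_vc2
        , large_vc3
  from
          a
  where
          1 = 1
)
select
        *
from
        (select id, count(*) from cte group by id) a,
        (select id2, count(*) from cte group by id2) b
where
        a.id = b.id2
;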

Monday, April 20, 2015

Function-Based Indexes And CURSOR_SHARING = FORCE

In general it is known that Function-Based Indexes (FBIs) can no longer be used by the optimizer if the expression contains literals and CURSOR_SHARING = FORCE / SIMILAR (deprecated) turns those literals into bind variables. Jonathan Lewis described the issue quite a while ago here in detail.

In a recent OTN thread this issue was raised again, but to my surprise, when I played around with a test case that mimicked the OP's problem query, I found that (certain) Oracle versions have some built-in logic that enables FBI usage for certain cases where you would expect them to be unusable.

If you test the following code on versions from 10.2.0.4 (possibly earlier) up to and including version 11.2.0.3 then you'll notice some interesting details:

create table t
as
select * from all_objects;

create index t_idx on t (owner || ' ' || object_name);

exec dbms_stats.gather_table_stats(null, 't')

set echo on linesize 200 pagesize 0

alter session set cursor_sharing = force;

select /*+ full(t) */ * from t where owner || ' ' || object_name = 'BLA';

select * from table(dbms_xplan.display_cursor);

select /*+ index(t) */ * from t where owner || ' ' || object_name = 'BLA';

select * from table(dbms_xplan.display_cursor);

select /*+ index(t) */ * from t where owner || 'A' || object_name = 'BLA';

select * from table(dbms_xplan.display_cursor);
Here is the relevant output I got from 11.2.0.1 for example:
SQL> alter session set cursor_sharing = force;

Session altered.

SQL>
SQL> select /*+ full(t) */ * from t where owner || ' ' || object_name = 'BLA';

no rows selected

SQL>
SQL> select * from table(dbms_xplan.display_cursor);
SQL_ID  ar3tw7r1rvawk, child number 0
-------------------------------------
select /*+ full(t) */ * from t where owner || :"SYS_B_0" || object_name
= :"SYS_B_1"

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   284 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |     1 |   117 |   284   (2)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OWNER"||' '||"OBJECT_NAME"=:SYS_B_1)


19 rows selected.

SQL>
SQL> select /*+ index(t) */ * from t where owner || ' ' || object_name = 'BLA';

no rows selected

SQL>
SQL> select * from table(dbms_xplan.display_cursor);
SQL_ID  6kzz3vw5x8x3b, child number 0
-------------------------------------
select /*+ index(t) */ * from t where owner || :"SYS_B_0" ||
object_name = :"SYS_B_1"

Plan hash value: 470836197

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |   117 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T"."SYS_NC00016$"=:SYS_B_1)


20 rows selected.

SQL>
SQL> select /*+ index(t) */ * from t where owner || 'A' || object_name = 'BLA';

no rows selected

SQL>
SQL> select * from table(dbms_xplan.display_cursor);
SQL_ID  6kzz3vw5x8x3b, child number 1
-------------------------------------
select /*+ index(t) */ * from t where owner || :"SYS_B_0" ||
object_name = :"SYS_B_1"

Plan hash value: 3778778741

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |       |       | 52472 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T     |   724 | 84708 | 52472   (1)| 00:10:30 |
|   2 |   INDEX FULL SCAN           | T_IDX | 72351 |       |   420   (1)| 00:00:06 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OWNER"||:SYS_B_0||"OBJECT_NAME"=:SYS_B_1)


20 rows selected.
Looking at the statement text that results from "CURSOR_SHARING = force" we can spot the expected bind variables instead of the literals, which should result in a corresponding predicate that doesn't match the FBI expression. However, the filter expression in the predicate section (when forcing a full table scan) reveals something interesting: It still shows the literal, which doesn't correspond to the predicate of the rewritten query text.

The next execution shows that the FBI really can be used despite the bind variable replacement taking place, and the final execution shows that cursor sharing works correctly in the sense that a new child cursor got created for the same SQL text, with a different plan and different predicate section, when using a different literal in the original SQL text. V$SQL_SHARED_CURSOR shows "HASH_MATCH_FAILED", which is described as "No existing child cursors have the unsafe literal bind hash values required by the current cursor" - this makes sense and probably means that the corresponding bind variable is marked as "unsafe" internally.
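
The mismatch reason can be checked with a query like the following - a sketch, using the SQL_ID from the example above; note that the exact set of columns in V$SQL_SHARED_CURSOR varies between versions:

select child_number, hash_match_failed
from   v$sql_shared_cursor
where  sql_id = '6kzz3vw5x8x3b';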

This optimisation only shows up if there is a suitable FBI - if there's no corresponding expression the SQL text and predicate section match. Furthermore it only supports certain expressions - Jonathan's example shows that in general it's true that these rewrites prevent FBI usage. And obviously it ceases to work in 11.2.0.4 and 12c. Whether this is a bug or a feature I don't know, but since it only seems to apply to certain expressions it's probably not that relevant anyway.

As Jonathan points out in his note you can always work around the general problem by hiding the expression in a view, and since 11g of course a proper virtual column definition is the better approach, which doesn't expose this problem either.
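
A minimal sketch of the virtual column variant (11g and later; the column and index names here are made up for illustration):

-- Instead of a function-based index: declare the expression as a virtual column...
alter table t add (owner_object_name as (owner || ' ' || object_name));

-- ...and index the virtual column rather than the raw expression
create index t_vc_idx on t (owner_object_name);

-- Queries reference the column, so literal replacement by CURSOR_SHARING
-- no longer touches the indexed expression
select * from t where owner_object_name = 'BLA';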

Even better would be the proper usage of bind variables and not using forced cursor sharing, but there are still many installations out there that rely on that feature.

Monday, April 13, 2015

Combined ACCESS And FILTER Predicates - Excessive Throw-Away

Catchy title... Let's assume the following data setup:
create table t1
as
select
        rownum as id
      , 1 as id2
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1e4
;

create table t2
as
select
        rownum as id
      , 1 as id2
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1e4
;

create table t3
as
select
        rownum as id
      , 1 as id2
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1e4
;

exec dbms_stats.gather_table_stats(null, 't1')

exec dbms_stats.gather_table_stats(null, 't2')

exec dbms_stats.gather_table_stats(null, 't3')

-- Deliberately wrong order (FBI after gather stats) - the virtual columns created for this FBI don't have statistics, see below
create index t2_idx on t2 (case when id2 = 1 then id2 else 1 end, case when id2 = 2 then id2 else 1 end, filler, id);

create index t3_idx on t3 (id, filler, id2);
And the following execution plan (all results are from 12.1.0.2 but should be applicable to other versions, too):
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |  1416K|   132   (0)| 00:00:01 |
|*  1 |  HASH JOIN          |      | 10000 |  1416K|   132   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | T2   | 10000 |   292K|    44   (0)| 00:00:01 |
|*  3 |   HASH JOIN         |      | 10000 |  1123K|    88   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T3   | 10000 | 70000 |    44   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| T1   | 10000 |  1054K|    44   (0)| 00:00:01 |
----------------------------------------------------------------------------
How long would you expect it to run to return all rows (no tricks like expensive regular expressions or user-defined PL/SQL functions)?

It should probably take just a blink, given the tiny tables with just 10000 rows each.

However, these are the runtime statistics for a corresponding execution:
|            |            |            |
|            |DATABASE    |CPU         |
|DURATION    |TIME        |TIME        |
|------------|------------|------------|
|+0 00:00:23 |+0 00:00:23 |+0 00:00:23 |
|            |            |            |

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Ord | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | Execs | A-Rows| PGA   | Start | Dur(T)| Dur(A)| Time Active Graph    | Activity Graph ASH          | Top 5 Activity ASH  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     |   6 | SELECT STATEMENT    |      |       |       |   132 (100)|          |    1  |    1  |    0  |       |       |       |                      |                             |                     |
|*  1 |   0 |   5 |  HASH JOIN          |      | 10000 |  1191K|   132   (0)| 00:00:01 |    1  |    1  | 1401K |     2 |    23 |    22 | ##### ############## | @@@@@@@@@@@@@@@@@@@ (100%)  | ON CPU(22)          |
|   2 |   1 |   1 |   TABLE ACCESS FULL | T2   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|*  3 |   1 |   4 |   HASH JOIN         |      | 10000 |  1123K|    88   (0)| 00:00:01 |    1  |   10K | 1930K |       |       |       |                      |                             |                     |
|   4 |   3 |   2 |    TABLE ACCESS FULL| T3   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|   5 |   3 |   3 |    TABLE ACCESS FULL| T1   | 10000 |  1054K|    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
How is it possible to burn more than 20 seconds of CPU time with that execution plan?

The actual rows produced correspond pretty much to the estimated cardinalities (except for the final hash join), so that doesn't look suspect at first glance.
What becomes obvious from the SQL Monitoring output is that all the time is spent on the hash join operation ID = 1.

Of course at that point (at the latest) you should tell me off for not having shown you the predicate section of the plan and the corresponding query in the first place.

So here is the predicate section and the corresponding query:
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END =CASE
              "T2"."ID2" WHEN 1 THEN "T2"."ID2" ELSE 1 END  AND CASE "T1"."ID2" WHEN
              2 THEN "T1"."ID2" ELSE 1 END =CASE "T2"."ID2" WHEN 2 THEN "T2"."ID2"
              ELSE 1 END )
       filter("T3"."ID2"=CASE  WHEN ("T1"."ID">"T2"."ID") THEN
              "T1"."ID" ELSE "T2"."ID" END )
   3 - access("T3"."ID"="T1"."ID")

   
select /*+
           leading(t1 t3 t2)
           full(t1)
           full(t3)
           use_hash(t3)
           swap_join_inputs(t3)
           full(t2)
           use_hash(t2)
           swap_join_inputs(t2)
       */
       t1.*
     --, t3.id2
     --, case when t1.id > t2.id then t1.id else t2.id end
from
       t1
     , t2
     , t3
where
       1 = 1
--
and    case when t1.id2 = 1 then t1.id2 else 1 end = case when t2.id2 = 1 then t2.id2 else 1 end
and    case when t1.id2 = 2 then t1.id2 else 1 end = case when t2.id2 = 2 then t2.id2 else 1 end
--
and    t3.id  = t1.id
and    t3.id2 = case when t1.id > t2.id then t1.id else t2.id end
;
There are two important aspects to this query and the plan: First, the join expression between T1 and T2 (without corresponding expression statistics) is sufficiently deceptive to hide from the optimizer that it in fact produces a cartesian product (mimicking real life multi table join expressions that lead to bad estimates). Second, the table T3 is joined to both T1 and an expression based on T1 and T2, which means that this expression can only be evaluated after T1 and T2 have been joined.
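As an aside, the first aspect could be addressed by providing expression statistics - a sketch, assuming extended statistics on expressions (11g and later) are supported for these CASE expressions; for T2 the FBI already maintains corresponding hidden virtual columns, so simply re-gathering statistics after the index creation covers that side:

-- T1 has no FBI, so create extended statistics on the join expressions...
select dbms_stats.create_extended_stats(null, 't1',
         '(case when id2 = 1 then id2 else 1 end)') from dual;
select dbms_stats.create_extended_stats(null, 't1',
         '(case when id2 = 2 then id2 else 1 end)') from dual;

-- ...and re-gather statistics so the (hidden) virtual columns get populated
exec dbms_stats.gather_table_stats(null, 't1')
exec dbms_stats.gather_table_stats(null, 't2')
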
With the execution plan shape enforced via my hints (but could be a real life execution plan shape preferred by the optimizer) T3 and T1 are joined first, producing an innocent 10K rows row source, which is then joined to T2. And here the accident happens inside the hash join operation:

If you look closely at the predicate section you'll notice that the hash join operation has both an ACCESS and a FILTER predicate. Based on the join between T1 and T2, the ACCESS part performs a lookup into the hash table, which happens to be a cartesian product, so it produces 10K times 10K = 100M rows; only afterwards is the FILTER (representing the T3 to T1/T2 join expression) applied to these 100M rows, matching only a single row in my example here - which is what A-Rows shows for this operation.

So the point is that this excessive work and FILTER throwaway isn't very well represented in the row source statistics. Ideally you would need one of the following two modifications to get a better picture of what is going on:

Either the FILTER operator should be a separate step in the plan, which in theory would then look like this:
---------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     | Execs | A-Rows|
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   132 (100)|          |    1  |    1  |
|*  1a|  FILTER              |      | 10000 |  1191K|   132   (0)| 00:00:01 |    1  |    1  |
|*  1b|   HASH JOIN          |      | 10000 |  1191K|   132   (0)| 00:00:01 |    1  |  100M |
|   2 |    TABLE ACCESS FULL | T2   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |
|*  3 |    HASH JOIN         |      | 10000 |  1123K|    88   (0)| 00:00:01 |    1  |   10K |
|   4 |     TABLE ACCESS FULL| T3   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |
|   5 |     TABLE ACCESS FULL| T1   | 10000 |  1054K|    44   (0)| 00:00:01 |    1  |   10K |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1a- filter("T3"."ID2"=CASE  WHEN ("T1"."ID">"T2"."ID") THEN
              "T1"."ID" ELSE "T2"."ID" END )
   1b- access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END =CASE
              "T2"."ID2" WHEN 1 THEN "T2"."ID2" ELSE 1 END  AND CASE "T1"."ID2" WHEN
              2 THEN "T1"."ID2" ELSE 1 END =CASE "T2"."ID2" WHEN 2 THEN "T2"."ID2"
              ELSE 1 END )
   3 - access("T3"."ID"="T1"."ID")
This would make the excess rows produced by the ACCESS part of the hash join very obvious, but is probably not a good solution for performance reasons, because the data would then have to flow from one operation to another rather than being processed within the HASH JOIN operator, which means increased overhead.

Or an additional rowsource statistics should be made available:
----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | Execs | A-Rows|AE-Rows|
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |   132 (100)|          |    1  |    1  |    1  |
|*  1 |  HASH JOIN          |      | 10000 |  1191K|   132   (0)| 00:00:01 |    1  |    1  |  100M |
|   2 |   TABLE ACCESS FULL | T2   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |   10K |
|*  3 |   HASH JOIN         |      | 10000 |  1123K|    88   (0)| 00:00:01 |    1  |   10K |   10K |
|   4 |    TABLE ACCESS FULL| T3   | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |   10K |
|   5 |    TABLE ACCESS FULL| T1   | 10000 |  1054K|    44   (0)| 00:00:01 |    1  |   10K |   10K |
----------------------------------------------------------------------------------------------------
I called this additional statistic "Actually evaluated rows" (AE-Rows) here. In addition to this case of combined ACCESS and FILTER operations it could also be helpful for other FILTER cases - for example even for a simple full table scan, to see how many rows were evaluated and not only how many rows matched a possible filter (which is what A-Rows currently shows).

In a recent OTN thread this topic came up again, and since I also came across this phenomenon a couple of times recently I thought I'd put this note together. Note that Martin Preiss has submitted a corresponding database idea on the OTN forum.

Expanding on this idea a bit further, it could be useful to have an additional "Estimated evaluated rows (EE-Rows)" calculated by the optimizer and shown in the plan. This could also be used to improve the optimizer's cost model for such cases, because at present it looks like the optimizer doesn't consider additional FILTER predicates on top of ACCESS predicates when calculating the CPU cost of operations like HASH JOINs.

Note that this problem isn't specific to HASH JOIN operations - you can get similar effects with other join methods, like NESTED LOOP joins, or even simple INDEX lookup operations, where again the ACCESS part isn't very selective and only the FILTER applied afterwards reduces the result to the actually matching rows.

Here are some examples with the given setup:
select /*+
           leading(t1 t3 t2)
           full(t1)
           full(t3)
           use_hash(t3)
           swap_join_inputs(t3)
           index(t2)
           use_nl(t2)
       */
       t1.*
     --, t3.id2
     --, case when t1.id > t2.id then t1.id else t2.id end
from
       t1
     , t2
     , t3
where
       1 = 1
--
and    case when t1.id2 = 1 then t1.id2 else 1 end = case when t2.id2 = 1 then t2.id2 else 1 end
and    case when t1.id2 = 2 then t1.id2 else 1 end = case when t2.id2 = 2 then t2.id2 else 1 end
--
and    t3.id  = t1.id
and    t3.id2 = case when t1.id > t2.id then t1.id else t2.id end
;

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Ord | Operation           | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Execs | A-Rows| PGA   | Start | Dur(T)| Dur(A)| Time Active Graph    | Activity Graph ASH          | Top 5 Activity ASH  |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     |   6 | SELECT STATEMENT    |        |       |       | 10090 (100)|          |    1  |    1  |       |       |       |       |                      |                             |                     |
|   1 |   0 |   5 |  NESTED LOOPS       |        | 10000 |  1416K| 10090   (1)| 00:00:01 |    1  |    1  |       |       |       |       |                      |                             |                     |
|*  2 |   1 |   3 |   HASH JOIN         |        | 10000 |  1123K|    88   (0)| 00:00:01 |    1  |   10K | 1890K |       |       |       |                      |                             |                     |
|   3 |   2 |   1 |    TABLE ACCESS FULL| T3     | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|   4 |   2 |   2 |    TABLE ACCESS FULL| T1     | 10000 |  1054K|    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|*  5 |   1 |   4 |   INDEX RANGE SCAN  | T2_IDX |     1 |    30 |     1   (0)| 00:00:01 |   10K |    1  |       |     3 |    33 |    32 |  ################### | @@@@@@@@@@@@@@@@@@@ (100%)  | ON CPU(32)          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T3"."ID"="T1"."ID")
   5 - access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END
              ="T2"."SYS_NC00004$" AND CASE "T1"."ID2" WHEN 2 THEN "T1"."ID2" ELSE 1
              END ="T2"."SYS_NC00005$")
       filter("T3"."ID2"=CASE  WHEN ("T1"."ID">"T2"."ID") THEN
              "T1"."ID" ELSE "T2"."ID" END )

select /*+
           leading(t1 t3 t2)
           full(t1)
           full(t3)
           use_hash(t3)
           swap_join_inputs(t3)
           index(t2)
           use_nl(t2)
       */
       max(t1.filler)
     --, t3.id2
     --, case when t1.id > t2.id then t1.id else t2.id end
from
       t1
     , t2
     , t3
where
       1 = 1
--
and    case when t1.id2 = 1 then t1.id2 else 1 end = case when t2.id2 = 1 then t2.id2 else 1 end
and    case when t1.id2 = 2 then t1.id2 else 1 end = case when t2.id2 = 2 then t2.id2 else 1 end
--
and    t3.id  = t1.id
and    t2.filler >= t1.filler
and    t2.id = case when t1.id2 > t3.id2 then t1.id2 else t3.id2 end
;

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Pid | Ord | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Execs | A-Rows| PGA   | Start | Dur(T)| Dur(A)| Time Active Graph    | Activity Graph ASH          | Top 5 Activity ASH  |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 |     |   7 | SELECT STATEMENT     |        |       |       | 20092 (100)|          |    1  |    1  |       |       |       |       |                      |                             |                     |
|   1 |   0 |   6 |  SORT AGGREGATE      |        |     1 |   223 |            |          |    1  |    1  |       |       |       |       |                      |                             |                     |
|   2 |   1 |   5 |   NESTED LOOPS       |        |     1 |   223 | 20092   (1)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|*  3 |   2 |   3 |    HASH JOIN         |        | 10000 |  1123K|    88   (0)| 00:00:01 |    1  |   10K | 1900K |       |       |       |                      |                             |                     |
|   4 |   3 |   1 |     TABLE ACCESS FULL| T3     | 10000 | 70000 |    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|   5 |   3 |   2 |     TABLE ACCESS FULL| T1     | 10000 |  1054K|    44   (0)| 00:00:01 |    1  |   10K |       |       |       |       |                      |                             |                     |
|*  6 |   2 |   4 |    INDEX RANGE SCAN  | T2_IDX |     1 |   108 |     2   (0)| 00:00:01 |   10K |   10K |       |     2 |    34 |    34 | #################### | @@@@@@@@@@@@@@@@@@@ (100%)  | ON CPU(34)          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T3"."ID"="T1"."ID")
   6 - access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END
              ="T2"."SYS_NC00004$" AND CASE "T1"."ID2" WHEN 2 THEN "T1"."ID2" ELSE 1
              END ="T2"."SYS_NC00005$" AND "T2"."FILLER">="T1"."FILLER" AND
              "T2"."ID"=CASE  WHEN ("T1"."ID2">"T3"."ID2") THEN "T1"."ID2" ELSE
              "T3"."ID2" END  AND "T2"."FILLER" IS NOT NULL)
       filter("T2"."ID"=CASE  WHEN ("T1"."ID2">"T3"."ID2") THEN
              "T1"."ID2" ELSE "T3"."ID2" END )
The former exhibits exactly the same problem as the HASH JOIN example, except that the FILTER is evaluated in the inner row source of a NESTED LOOP join, after the index access operation.

The latter shows, as a variation, the classic partial "index access" caused by a range comparison in between - although the entire expression can be evaluated at index level, the access part matches every index entry, so the range scan actually has to walk the entire index at each loop iteration, and the FILTER is then applied to all the index entries evaluated.

Thursday, January 22, 2015

New Version Of XPLAN_ASH Utility - In-Memory Support

A new version 4.21 of the XPLAN_ASH utility is available for download. I'm publishing this version because it will be used in the video tutorials explaining the Active Session History functionality of the script.

As usual the latest version can be downloaded here.

This is mainly a maintenance release that fixes some incompatibilities of the 4.2 version with older versions (10.2 and 11.2.0.1).

As an extra however, this version now differentiates between general CPU usage and in-memory CPU usage (similar to 12.1.0.2 Real-Time SQL Monitoring). This is not done in all possible sections of the output yet, but the most important ones are already covered.

So if you already use the 12.1.0.2 in-memory option this might be helpful to understand how much of your CPU time is spent on in-memory operations vs. non in-memory. Depending on your query profile you might be surprised by the results.

Here are the notes from the change log:

- Forgot to address a minor issue where the SET_COUNT determined per DFO_TREE (either one or two slave sets) is incorrect in the special case of DFO trees having only S->P distributions (pre-12c style). Previous versions used a SET_COUNT of 2 in such a case, which is incorrect, since there is only one slave set. 12c changes this behaviour with the new PX SELECTOR operator and requires again two sets.

- For RAC Cross Instance Parallel Execution specific output some formatting and readability was improved (more linebreaks etc.)

- Minor SQL issue fixed in "SQL statement execution ASH Summary" that prevented execution in 10.2 (ORA-32035)

- The NO_STATEMENT_QUEUING hint prevented the "OPTIMIZER_FEATURES_ENABLE" hint from being recognized, therefore some queries failed in 11.2.0.1 again with ORA-03113. Fixed

- "ON CPU" now distinguishes between "ON CPU INMEMORY" and "ON CPU" for in-memory scans

Sunday, January 11, 2015

Video Tutorial: XPLAN_ASH Active Session History - Introduction

I finally got around to preparing another part of the XPLAN_ASH video tutorial.

This part is about the main functionality of XPLAN_ASH: SQL statement execution analysis using Active Session History and Real-Time SQL Monitoring.

In this video tutorial I'll explain what the output of XPLAN_ASH is supposed to mean when using the Active Session History functionality of the script. Before diving into the details of the script output using sample reports I provide some overview and introduction in this part that hopefully makes it simpler to understand how the output is organized and what it is supposed to mean.

This is the initial, general introduction part. More parts to follow.

Monday, January 5, 2015

Unnecessary BUFFER SORT Operations - Parallel Concatenation Transformation

When using Parallel Execution, depending on the plan shape and the operations used, Oracle sometimes needs to turn non-blocking operations into blocking operations, which means in this case that the row source no longer passes its output data directly to the parent operation but buffers some data temporarily in PGA memory / TEMP. This is either accomplished via the special HASH JOIN BUFFERED operation, or simply by adding BUFFER SORT operations to the plan.

The reason for such a behaviour in parallel plans is the limitation of Oracle Parallel Execution that allows only a single data redistribution to be active concurrently. You can read more about that here.

However, sometimes the optimizer adds unnecessary BUFFER SORT operations to parallel execution plans, and one of the most obvious examples is when the so called "concatenation" query transformation is applied by the optimizer and Parallel Execution is involved.

UPDATE Please note: As mentioned below by Martin (thanks), what I call "concatenation transformation" here is typically called "OR expansion transformation" in CBO speak, and that term probably describes much better what the transformation is about. So whenever I write "concatenation transformation" here it can be substituted with "OR expansion transformation".

To understand the issue, first of all, what is the concatenation transformation about?

Whenever there are predicates combined with OR there is the possibility to rewrite the different conditions as separate queries unioned together.

In order to ensure that the result of the rewritten query doesn't contain any unwanted duplicates, the different branches of the UNIONed statement need to filter out any data fulfilling the conditions of previous branches - this is probably where the (at first sight) odd (and meanwhile documented) LNNVL function originally came into existence.
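
LNNVL takes a condition and returns TRUE if that condition evaluates to FALSE or UNKNOWN (i.e. involves NULL) - exactly the semantics needed to keep all rows where the previous branch's condition did not provably apply. A minimal illustration:

-- LNNVL returns TRUE if its condition is FALSE or UNKNOWN
select * from dual where lnnvl(1 = 2);               -- FALSE   => row returned
select * from dual where lnnvl(to_number(null) = 1); -- UNKNOWN => row returned
select * from dual where lnnvl(1 = 1);               -- TRUE    => no row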

The predicates can be either single-table filters, where the concatenation might open up different access paths to the same table (like different indexes), or it might be predicates combining multiple tables, like joins or subqueries.

Here is a short example of the latter (the parallel hints are commented out but are used in the further examples to demonstrate the issue with Parallel Execution) - using version 12.1.0.2:

select 
       max(id) 
from 
(
  select /* parallel(t1 8) parallel(t2 8) */
         t2.* 
  from 
         t1
       , t2
  where 
         (t1.id = t2.id or t1.id = t2.id2)
);
In this example the join condition using an OR prevents any efficient join method between T1 and T2 when not re-writing the statement - Oracle can only resort to a NESTED LOOP join with a repeated full table scan of one of the tables, which is reflected in a rather high estimated cost:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    16 |  2177M  (2)| 23:37:34 |
|   1 |  SORT AGGREGATE     |      |     1 |    16 |            |          |
|   2 |   NESTED LOOPS      |      |  3999K|    61M|  2177M  (2)| 23:37:34 |
|   3 |    TABLE ACCESS FULL| T2   |  2000K|    19M|  1087   (1)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T1   |     2 |    12 |  1089   (2)| 00:00:01 |
----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter("T1"."ID"="T2"."ID" OR "T1"."ID"="T2"."ID2")
The same statement could be expressed by the following manual rewrite:

select max(id) from (
  select /* parallel(t1 8) parallel(t2 8) */
         t2.* 
  from 
         t1
       , t2
  where 
         t1.id = t2.id2
  ---------
  union all
  ---------
  select /* parallel(t1 8) parallel(t2 8) */
         t2.* 
  from 
         t1
       , t2
  where 
         t1.id = t2.id
  and    lnnvl(t1.id = t2.id2)
);
Notice the LNNVL function in the second branch of the UNION ALL that filters out any rows fulfilling the condition used in the first branch.

Also note that using UNION instead of UNION ALL plus LNNVL(s) to filter out duplicate rows is potentially incorrect, too, as each query branch might produce duplicate rows that need to be retained because they are part of the original query result.
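
A contrived illustration of why UNION is not a valid substitute - duplicates legitimately produced within a single branch would get collapsed as well:

-- this branch legitimately produces the row "1" twice
select 1 as id from dual connect by level <= 2;

-- a plain UNION silently reduces those duplicates to a single row,
-- returning only two rows (1 and 2) instead of three:
select 1 as id from dual connect by level <= 2
union
select 2 as id from dual;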

At the expense of visiting the tables multiple times we now get at least efficient join methods in each branch (and hence a significantly lower cost estimate):

--------------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    13 |       | 11945   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE       |      |     1 |    13 |       |            |          |
|   2 |   VIEW                |      |  2100K|    26M|       | 11945   (1)| 00:00:01 |
|   3 |    UNION-ALL          |      |       |       |       |            |          |
|*  4 |     HASH JOIN         |      |  2000K|    30M|    34M|  5972   (1)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| T1   |  2000K|    11M|       |  1086   (1)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| T2   |  2000K|    19M|       |  1087   (1)| 00:00:01 |
|*  7 |     HASH JOIN         |      |   100K|  1562K|    34M|  5972   (1)| 00:00:01 |
|   8 |      TABLE ACCESS FULL| T1   |  2000K|    11M|       |  1086   (1)| 00:00:01 |
|   9 |      TABLE ACCESS FULL| T2   |  2000K|    19M|       |  1087   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("T1"."ID"="T2"."ID2")
   7 - access("T1"."ID"="T2"."ID")
       filter(LNNVL("T1"."ID"="T2"."ID2"))
And in fact, when not preventing the concatenation transformation (NO_EXPAND hint), the optimizer comes up with the following execution plan for the original statement:

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    16 |       | 11945   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE      |      |     1 |    16 |       |            |          |
|   2 |   CONCATENATION      |      |       |       |       |            |          |
|*  3 |    HASH JOIN         |      |  2000K|    30M|    34M|  5972   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |  2000K|    11M|       |  1086   (1)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| T2   |  2000K|    19M|       |  1087   (1)| 00:00:01 |
|*  6 |    HASH JOIN         |      |   100K|  1562K|    34M|  5972   (1)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| T1   |  2000K|    11M|       |  1086   (1)| 00:00:01 |
|   8 |     TABLE ACCESS FULL| T2   |  2000K|    19M|       |  1087   (1)| 00:00:01 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("T1"."ID"="T2"."ID2")
   6 - access("T1"."ID"="T2"."ID")
       filter(LNNVL("T1"."ID"="T2"."ID2"))
The only difference between those two plans for the manual and automatic rewrite is the CONCATENATION operator instead of UNION ALL, and that the subquery isn't merged in case of the UNION ALL (additional VIEW operator).

So far everything works as expected and you have seen the effect and rationale of the concatenation transformation.
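
For experimenting with this, the transformation can be controlled explicitly via documented hints - NO_EXPAND prevents it (as used above), USE_CONCAT forces it where possible; a sketch based on the example query:

select max(id) from (
  select /*+ no_expand */ t2.*         -- prevent the concatenation / OR expansion
  from   t1, t2
  where  (t1.id = t2.id or t1.id = t2.id2)
);

select max(id) from (
  select /*+ use_concat */ t2.*        -- force it where possible
  from   t1, t2
  where  (t1.id = t2.id or t1.id = t2.id2)
);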

If we now run the original statement using Parallel Execution (turn the comments into hints), depending on the exact version used the resulting execution plans show various inefficiencies.

For reference, this is the parallel execution plan I get from 12.1.0.2 when using above manual rewrite:

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     1 |    13 |   606   (2)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE                |          |     1 |    13 |            |          |        |      |            |
|   2 |   PX COORDINATOR               |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)         | :TQ10004 |     1 |    13 |            |          |  Q1,04 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE             |          |     1 |    13 |            |          |  Q1,04 | PCWP |            |
|   5 |      VIEW                      |          |  2100K|    26M|   606   (2)| 00:00:01 |  Q1,04 | PCWP |            |
|   6 |       UNION-ALL                |          |       |       |            |          |  Q1,04 | PCWP |            |
|*  7 |        HASH JOIN               |          |  2000K|    30M|   303   (2)| 00:00:01 |  Q1,04 | PCWP |            |
|   8 |         PX RECEIVE             |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,04 | PCWP |            |
|   9 |          PX SEND HYBRID HASH   | :TQ10000 |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | P->P | HYBRID HASH|
|  10 |           STATISTICS COLLECTOR |          |       |       |            |          |  Q1,00 | PCWC |            |
|  11 |            PX BLOCK ITERATOR   |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|  12 |             TABLE ACCESS FULL  | T1       |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|  13 |         PX RECEIVE             |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,04 | PCWP |            |
|  14 |          PX SEND HYBRID HASH   | :TQ10001 |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | P->P | HYBRID HASH|
|  15 |           PX BLOCK ITERATOR    |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | PCWC |            |
|  16 |            TABLE ACCESS FULL   | T2       |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | PCWP |            |
|* 17 |        HASH JOIN               |          |   100K|  1562K|   303   (2)| 00:00:01 |  Q1,04 | PCWP |            |
|  18 |         PX RECEIVE             |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,04 | PCWP |            |
|  19 |          PX SEND HYBRID HASH   | :TQ10002 |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,02 | P->P | HYBRID HASH|
|  20 |           STATISTICS COLLECTOR |          |       |       |            |          |  Q1,02 | PCWC |            |
|  21 |            PX BLOCK ITERATOR   |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,02 | PCWC |            |
|  22 |             TABLE ACCESS FULL  | T1       |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,02 | PCWP |            |
|  23 |         PX RECEIVE             |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,04 | PCWP |            |
|  24 |          PX SEND HYBRID HASH   | :TQ10003 |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,03 | P->P | HYBRID HASH|
|  25 |           PX BLOCK ITERATOR    |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,03 | PCWC |            |
|  26 |            TABLE ACCESS FULL   | T2       |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,03 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   7 - access("T1"."ID"="T2"."ID2")
  17 - access("T1"."ID"="T2"."ID")
       filter(LNNVL("T1"."ID"="T2"."ID2"))
This is a pretty straightforward parallel plan, with the only possibly notable exception of the new 12c "HYBRID HASH" distribution feature being used.

Now let's have a look at the resulting execution plan when the concatenation transformation gets used:

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |     1 |    16 |   606   (2)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE                      |          |     1 |    16 |            |          |        |      |            |
|   2 |   PX COORDINATOR                     |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)               | :TQ20003 |     1 |    16 |            |          |  Q2,03 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE                   |          |     1 |    16 |            |          |  Q2,03 | PCWP |            |
|   5 |      CONCATENATION                   |          |       |       |            |          |  Q2,03 | PCWP |            |
|*  6 |       HASH JOIN                      |          |  2000K|    30M|   303   (2)| 00:00:01 |  Q2,03 | PCWP |            |
|   7 |        PX RECEIVE                    |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q2,03 | PCWP |            |
|   8 |         PX SEND HYBRID HASH          | :TQ20001 |  2000K|    11M|   151   (1)| 00:00:01 |  Q2,01 | P->P | HYBRID HASH|
|   9 |          STATISTICS COLLECTOR        |          |       |       |            |          |  Q2,01 | PCWC |            |
|  10 |           BUFFER SORT                |          |     1 |    16 |            |          |  Q2,01 | PCWP |            |
|  11 |            PX BLOCK ITERATOR         |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q2,01 | PCWC |            |
|  12 |             TABLE ACCESS FULL        | T1       |  2000K|    11M|   151   (1)| 00:00:01 |  Q2,01 | PCWP |            |
|  13 |        PX RECEIVE                    |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q2,03 | PCWP |            |
|  14 |         PX SEND HYBRID HASH          | :TQ20002 |  2000K|    19M|   151   (1)| 00:00:01 |  Q2,02 | P->P | HYBRID HASH|
|  15 |          BUFFER SORT                 |          |     1 |    16 |            |          |  Q2,02 | PCWP |            |
|  16 |           PX BLOCK ITERATOR          |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q2,02 | PCWC |            |
|  17 |            TABLE ACCESS FULL         | T2       |  2000K|    19M|   151   (1)| 00:00:01 |  Q2,02 | PCWP |            |
|  18 |       BUFFER SORT                    |          |       |       |            |          |  Q2,03 | PCWC |            |
|  19 |        PX RECEIVE                    |          |   100K|  1562K|   303   (2)| 00:00:01 |  Q2,03 | PCWP |            |
|  20 |         PX SEND ROUND-ROBIN          | :TQ20000 |   100K|  1562K|   303   (2)| 00:00:01 |        | S->P | RND-ROBIN  |
|  21 |          BUFFER SORT                 |          |     1 |    16 |            |          |        |      |            |
|  22 |           PX COORDINATOR             |          |       |       |            |          |        |      |            |
|  23 |            PX SEND QC (RANDOM)       | :TQ10002 |   100K|  1562K|   303   (2)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |
|  24 |             BUFFER SORT              |          |     1 |    16 |            |          |  Q1,02 | PCWP |            |
|* 25 |              HASH JOIN BUFFERED      |          |   100K|  1562K|   303   (2)| 00:00:01 |  Q1,02 | PCWP |            |
|  26 |               PX RECEIVE             |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,02 | PCWP |            |
|  27 |                PX SEND HYBRID HASH   | :TQ10000 |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | P->P | HYBRID HASH|
|  28 |                 STATISTICS COLLECTOR |          |       |       |            |          |  Q1,00 | PCWC |            |
|  29 |                  BUFFER SORT         |          |     1 |    16 |            |          |  Q1,00 | PCWP |            |
|  30 |                   PX BLOCK ITERATOR  |          |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|  31 |                    TABLE ACCESS FULL | T1       |  2000K|    11M|   151   (1)| 00:00:01 |  Q1,00 | PCWP |            |
|  32 |               PX RECEIVE             |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,02 | PCWP |            |
|  33 |                PX SEND HYBRID HASH   | :TQ10001 |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | P->P | HYBRID HASH|
|  34 |                 BUFFER SORT          |          |     1 |    16 |            |          |  Q1,01 | PCWP |            |
|  35 |                  PX BLOCK ITERATOR   |          |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | PCWC |            |
|  36 |                   TABLE ACCESS FULL  | T2       |  2000K|    19M|   151   (1)| 00:00:01 |  Q1,01 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   6 - access("T1"."ID"="T2"."ID2")
  25 - access("T1"."ID"="T2"."ID")
       filter(LNNVL("T1"."ID"="T2"."ID2"))
This looks a bit weird, and when comparing it to the plan gotten from the manual rewrite, it shows the following unnecessary differences:

- There are various BUFFER SORT operations that don't make a lot of sense, for example each parallel table scan is followed by a BUFFER SORT operation, and even the HASH JOIN BUFFERED in the lower part of the plan is followed by a BUFFER SORT (double buffering?)

- The plan is decomposed into two so called DFO trees, which you can see for example from the two PX COORDINATOR operators (operation id 2 and 22), which adds another unnecessary serial execution part to the plan and can have additional side effects I explain in one of my video tutorials.

This means that such execution plan shapes will possibly have a much higher demand for PGA memory than necessary (the BUFFER SORT operations will attempt to keep the data produced by the child row source in PGA), and might also cause additional I/O to and from TEMP. Since the PGA memory consumed by one session also influences the Auto PGA allocation of other sessions, such executions affect not only the particular SQL execution in question but also any other concurrent executions allocating PGA memory.
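
The workarea memory actually consumed by such BUFFER SORT (and other) operations can be observed while the statement executes - a sketch (&sql_id stands for the SQL_ID of the running statement; the exact column set may differ per version):

select operation_type
     , operation_id
     , actual_mem_used
     , tempseg_size
from   v$sql_workarea_active
where  sql_id = '&sql_id';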

Depending on the amount of data to be buffered BUFFER SORT operations closer to the root of the execution plan are more likely to have significant impact performance-wise, as they might have to buffer large amounts of data.

One very obvious sign of inefficiency is double BUFFERing: a HASH JOIN BUFFERED followed by a BUFFER SORT as parent operation, which you can spot in the sample plan shown above.

Another interesting point is that the parallel plans differ from point release to point release and show different levels of inefficiencies, for example, 10.2.0.5, 11.1.0.7 and 11.2.0.1 produce different plans than 11.2.0.2, which is again different from what 11.2.0.3 & 11.2.0.4 produce - and using OPTIMIZER_FEATURES_ENABLE in newer versions to emulate older versions doesn't always reproduce the exact plans produced by the actual, older versions. So all in all this looks like a pretty messy part of the optimizer.

Furthermore the problem doesn't always show up - it seems to depend largely on the exact version and the plan shape used. For example, replacing the SELECT MAX(ID) FROM () outermost query in above example with a simple SELECT ID FROM () results in a plan where the concatenation transformation doesn't produce all those strange BUFFER SORTS - although it still produces a plan decomposed into two DFO trees in some versions.

It is also interesting to note that depending on version and plan shape sometimes the manual rewrite using UNION ALL is also affected by either unluckily placed or unnecessary BUFFER SORT operations, but not to the same extent as the plans resulting from the CONCATENATION transformation.

In the next post I'll show how this inefficiency can have some interesting side effects when being triggered by / combined with other transformations.

Footnote


Table structures used in the test cases:

create table t1
compress
as
select
        (rownum * 2) + 1 as id
      , mod(rownum, 2000) + 1 as id2
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(100000) */ * from dual
connect by
        level <= 100000) a, (select /*+ cardinality(20) */ * from dual connect by level <= 20) b
;

exec dbms_stats.gather_table_stats(null, 't1')

create table t2
compress
as
select * from t1;

exec dbms_stats.gather_table_stats(null, 't2')