Recently I came across some interesting edge cases regarding the costing of joins. They all have in common that they result in (at first sight) unexpected execution plans, but only some of them are actual threats to performance.
Outer Joins
The first one is about outer joins with an extreme data distribution. Consider the following data setup:
create table t1
as
select
        rownum as id
      , rpad('x', 100) as filler
      , case when rownum > 1e6 then rownum end as null_fk
from
        dual
connect by
        level <= 1e6
;
exec dbms_stats.gather_table_stats(null, 't1')
create table t2
as
select
        rownum as id
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1e6
;
exec dbms_stats.gather_table_stats(null, 't2')
create /*unique*/ index t2_idx on t2 (id);
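Note that because CONNECT BY LEVEL <= 1e6 generates exactly one million rows, the CASE expression (rownum > 1e6) is never true, so NULL_FK ends up NULL in every single row. A quick sanity check (my addition, not part of the original test case) confirms this:

```sql
-- count(null_fk) ignores NULLs, so it should come back as 0
select count(*) as total_rows, count(null_fk) as non_null_fks
from t1;
-- expected: total_rows = 1000000, non_null_fks = 0
```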
The following query is a simple outer join between T1 and T2 and the default, unhinted execution plan that I get from 11.2.0.1 (11.1.0.7 and 10.2.0.4 show similar results):
select
        t1.filler as t1_filler
      , t2.filler as t2_filler
from
        t1
      , t2
where
        t1.null_fk = t2.id (+)
;
---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |  1000K|   202M|  4204   (1)| 00:00:51 |
|   1 |  NESTED LOOPS OUTER          |        |  1000K|   202M|  4204   (1)| 00:00:51 |
|   2 |   TABLE ACCESS FULL          | T1     |  1000K|   101M|  4202   (1)| 00:00:51 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T2     |     1 |   106 |     2   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | T2_IDX |     1 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."NULL_FK"="T2"."ID"(+))
The optimizer preferred a Nested Loop join even though the number of estimated loop iterations is pretty large. Notice in particular the cost column: although the inner row source is estimated to be started 1000K times, the cost of doing so corresponds to just a single execution.
For reference, here is the cost estimate for a similar operation that corresponds to the expected costing model:
select /*+ use_nl(t1 t2) */
        t1.filler as t1_filler
      , t2.filler as t2_filler
from
        t1
      , t2
where
        t1.id = t2.id (+)
;
---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |  1000K|   202M|  3006K  (1)| 10:01:21 |
|   1 |  NESTED LOOPS OUTER          |        |  1000K|   202M|  3006K  (1)| 10:01:21 |
|   2 |   TABLE ACCESS FULL          | T1     |  1000K|   101M|  4200   (1)| 00:00:51 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T2     |     1 |   106 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | T2_IDX |     1 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."ID"="T2"."ID"(+))
This time I had to force the Nested Loop join via a hint, because by default the optimizer preferred other join methods. The cost of a single iteration of the loop has now increased to 3, although the access is exactly the same as before (a random table access to T2 via an index lookup of T2.ID), and the cost of the Nested Loop join corresponds to the known formula: estimated starts times the cost of a single iteration, which is 3000K in this case, plus the 4200 for the Full Table Scan of the outer row source, plus CPU costing overhead.
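The arithmetic behind that formula can be checked with a quick sketch. The input values (4200 for the full scan of T1, 3 per inner iteration, 1000K estimated starts) are taken straight from the plan above; the small gap to the reported 3006K is the CPU costing overhead, which isn't modelled here:

```python
# Classic Nested Loop join cost model:
#   cost(outer row source) + estimated starts * cost(single inner iteration)
# plus a CPU costing overhead that this sketch deliberately leaves out.
outer_cost = 4200        # full table scan of T1 (from the plan)
starts = 1_000_000       # estimated rows from the driving row source
inner_cost = 3           # one index lookup plus table access on T2

nl_cost = outer_cost + starts * inner_cost
print(nl_cost)           # 3004200, close to the 3006K shown in the plan
```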
So what makes the difference between the two? It's the data. The column name chosen for the column in T1 already reveals what's so special: the join column used (NULL_FK) is actually NULL for all rows.
The optimizer takes this into account and assumes that none of those rows from the driving row source will actually match a row of the inner row source. In fact, the lookup to the inner row source could be short-circuited in some way, since a NULL value by definition isn't supposed to find a match for this join. I haven't investigated to what extent the runtime engine actually does this; however, in the Rowsource Statistics the inner row source is started the expected number of times, although no logical I/O is recorded for it, only some CPU time, so at least some work seems to be done there.
Modifying the test case so that more of the FK values are actually non-null shows that the cost calculation is scaled accordingly. In fact, the cost calculation is more or less the same as that of a corresponding inner join that filters out the driving rows with NULL values in the join column.
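Such an inner-join counterpart might be sketched as follows (my rewrite, not taken from the original test case; it is not result-equivalent to the outer join, since it drops the unmatched T1 rows, but it is useful for comparing the cost):

```sql
-- Hypothetical inner-join equivalent for cost comparison only:
-- rows with a NULL join value can never match, so they can be
-- filtered out of the driving row source up front.
select
        t1.filler as t1_filler
      , t2.filler as t2_filler
from
        t1
      , t2
where
        t1.null_fk is not null
and     t1.null_fk = t2.id
;
```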
The overall performance of the execution plan is quite decent, so although it looks unusual, it performs pretty well.
In the second part I'll show another interesting, unexpected join execution plan that however can cause real performance problems.