Catchy title... Let's assume the following data setup:
create table t1
as
select
rownum as id
, 1 as id2
, rpad('x', 100) as filler
from
dual
connect by
level <= 1e4
;
create table t2
as
select
rownum as id
, 1 as id2
, rpad('x', 100) as filler
from
dual
connect by
level <= 1e4
;
create table t3
as
select
rownum as id
, 1 as id2
, rpad('x', 100) as filler
from
dual
connect by
level <= 1e4
;
exec dbms_stats.gather_table_stats(null, 't1')
exec dbms_stats.gather_table_stats(null, 't2')
exec dbms_stats.gather_table_stats(null, 't3')
-- Deliberately wrong order (FBI after gather stats) - the virtual columns created for this FBI don't have statistics, see below
create index t2_idx on t2 (case when id2 = 1 then id2 else 1 end, case when id2 = 2 then id2 else 1 end, filler, id);
create index t3_idx on t3 (id, filler, id2);
And the following execution plan (all results are from 12.1.0.2 but should be applicable to other versions, too):
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 1416K| 132 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 10000 | 1416K| 132 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | T2 | 10000 | 292K| 44 (0)| 00:00:01 |
|* 3 | HASH JOIN | | 10000 | 1123K| 88 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T3 | 10000 | 70000 | 44 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL| T1 | 10000 | 1054K| 44 (0)| 00:00:01 |
----------------------------------------------------------------------------
How long would you expect it to run to return all rows (no tricks like expensive regular expressions or user-defined PL/SQL functions)?
Probably should take just a blink, given the tiny tables with just 10000 rows each.
However, these are the runtime statistics for a corresponding execution:
| | | |
| |DATABASE |CPU |
|DURATION |TIME |TIME |
|------------|------------|------------|
|+0 00:00:23 |+0 00:00:23 |+0 00:00:23 |
| | | |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Execs | A-Rows| PGA | Start | Dur(T)| Dur(A)| Time Active Graph | Activity Graph ASH | Top 5 Activity ASH |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 6 | SELECT STATEMENT | | | | 132 (100)| | 1 | 1 | 0 | | | | | | |
|* 1 | 0 | 5 | HASH JOIN | | 10000 | 1191K| 132 (0)| 00:00:01 | 1 | 1 | 1401K | 2 | 23 | 22 | ##### ############## | @@@@@@@@@@@@@@@@@@@ (100%) | ON CPU(22) |
| 2 | 1 | 1 | TABLE ACCESS FULL | T2 | 10000 | 70000 | 44 (0)| 00:00:01 | 1 | 10K | | | | | | | |
|* 3 | 1 | 4 | HASH JOIN | | 10000 | 1123K| 88 (0)| 00:00:01 | 1 | 10K | 1930K | | | | | | |
| 4 | 3 | 2 | TABLE ACCESS FULL| T3 | 10000 | 70000 | 44 (0)| 00:00:01 | 1 | 10K | | | | | | | |
| 5 | 3 | 3 | TABLE ACCESS FULL| T1 | 10000 | 1054K| 44 (0)| 00:00:01 | 1 | 10K | | | | | | | |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
How is it possible to burn more than
20 seconds of CPU time with that execution plan?
The actual rows produced correspond pretty much to the estimated cardinalities (except for the final hash join), so that doesn't look suspect at first glance.
What becomes obvious from the SQL Monitoring output is that all the time is spent on the
hash join operation ID = 1.
Of course at that point (at the latest) you should tell me off for not having you shown the
predicate section of the plan and the corresponding query in first place.
So here is the predicate section and the corresponding query:
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END =CASE
"T2"."ID2" WHEN 1 THEN "T2"."ID2" ELSE 1 END AND CASE "T1"."ID2" WHEN
2 THEN "T1"."ID2" ELSE 1 END =CASE "T2"."ID2" WHEN 2 THEN "T2"."ID2"
ELSE 1 END )
filter("T3"."ID2"=CASE WHEN ("T1"."ID">"T2"."ID") THEN
"T1"."ID" ELSE "T2"."ID" END )
3 - access("T3"."ID"="T1"."ID")
select /*+
leading(t1 t3 t2)
full(t1)
full(t3)
use_hash(t3)
swap_join_inputs(t3)
full(t2)
use_hash(t2)
swap_join_inputs(t2)
*/
t1.*
--, t3.id2
--, case when t1.id > t2.id then t1.id else t2.id end
from
t1
, t2
, t3
where
1 = 1
--
and case when t1.id2 = 1 then t1.id2 else 1 end = case when t2.id2 = 1 then t2.id2 else 1 end
and case when t1.id2 = 2 then t1.id2 else 1 end = case when t2.id2 = 2 then t2.id2 else 1 end
--
and t3.id = t1.id
and t3.id2 = case when t1.id > t2.id then t1.id else t2.id end
;
There are two important aspects to this query and the plan: First, the join expression (without corresponding expression statistics) between T1 and T2 is sufficiently deceptive to hide from the optimizer that in fact this produces a
cartesian product (mimicking real life multi table join expressions that lead to bad estimates) and second, the table T3 is joined to both T1 and an expression based on T1 and T2, which means that this expression can only be evaluated
after the join to T1 and T2.
With the execution plan shape enforced via my hints (but could be a real life execution plan shape preferred by the optimizer) T3 and T1 are joined first, producing an innocent 10K rows row source, which is then joined to T2. And here the accident happens inside the hash join operation:
If you look closely at the predicate section you'll notice that the hash join operation has both, an ACCESS operation
and a FILTER operation. The ACCESS operation performs based on the join between T1 and T2 a lookup into the hash table, which happens to be a cartesian product, so produces 10K times 10K rows, and only afterwards the FILTER (representing the T3 to T1/T2 join expression) is applied to these
100M rows, but matching only a single row in my example here, which is what the A-Rows shows for this operation.
So the point is that this excessive work and FILTER throwaway isn't very well represented in the row source statistics. Ideally you would need one of the following two modifications to get a better picture of what is going on:
Either the FILTER operator should be a
separate step in the plan, which in theory would then look like this:
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Execs | A-Rows|
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 132 (100)| | 1 | 1 |
|* 1a| FILTER | | 10000 | 1191K| 132 (0)| 00:00:01 | 1 | 1 |
|* 1b| HASH JOIN | | 10000 | 1191K| 132 (0)| 00:00:01 | 1 | 100M |
| 2 | TABLE ACCESS FULL | T2 | 10000 | 70000 | 44 (0)| 00:00:01 | 1 | 10K |
|* 3 | HASH JOIN | | 10000 | 1123K| 88 (0)| 00:00:01 | 1 | 10K |
| 4 | TABLE ACCESS FULL| T3 | 10000 | 70000 | 44 (0)| 00:00:01 | 1 | 10K |
| 5 | TABLE ACCESS FULL| T1 | 10000 | 1054K| 44 (0)| 00:00:01 | 1 | 10K |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1a- filter("T3"."ID2"=CASE WHEN ("T1"."ID">"T2"."ID") THEN
"T1"."ID" ELSE "T2"."ID" END )
1b- access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END =CASE
"T2"."ID2" WHEN 1 THEN "T2"."ID2" ELSE 1 END AND CASE "T1"."ID2" WHEN
2 THEN "T1"."ID2" ELSE 1 END =CASE "T2"."ID2" WHEN 2 THEN "T2"."ID2"
ELSE 1 END )
3 - access("T3"."ID"="T1"."ID")
Which would make the excess rows produced by the ACCESS part of the hash join very obvious, but is probably for performance reasons not a good solution, because then the data would have to flow from one operation to another one rather than being processed within the HASH JOIN operator, which means increased overhead.
Or an additional rowsource statistics should be made available:
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Execs | A-Rows|AE-Rows|
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 132 (100)| | 1 | 1 | 1 |
|* 1 | HASH JOIN | | 10000 | 1191K| 132 (0)| 00:00:01 | 1 | 1 | 100M |
| 2 | TABLE ACCESS FULL | T2 | 10000 | 70000 | 44 (0)| 00:00:01 | 1 | 10K | 10K |
|* 3 | HASH JOIN | | 10000 | 1123K| 88 (0)| 00:00:01 | 1 | 10K | 10K |
| 4 | TABLE ACCESS FULL| T3 | 10000 | 70000 | 44 (0)| 00:00:01 | 1 | 10K | 10K |
| 5 | TABLE ACCESS FULL| T1 | 10000 | 1054K| 44 (0)| 00:00:01 | 1 | 10K | 10K |
----------------------------------------------------------------------------------------------------
Which I called here
"Actually evaluated rows" and in addition to this case here of combined
ACCESS and FILTER operations could also be helpful for other FILTER cases, for example even for simple
full table scan to see how many rows were
evaluated, and not only how many rows matched a possible filter (what A-Rows currently shows).
In a
recent OTN thread this topic came up again, and since I also came across this phenomenon a couple of times recently I thought to put this note together. Note that Martin Preiss has submitted a
corresponding database idea on the OTN forum.
Expanding on this idea a bit further, it could be useful to have an additional
"Estimated evaluated rows (EE-Rows)" calculated by the optimizer and shown in the plan. This could also be used to improve the optimizer's cost model for such cases, because at present it looks like the optimizer doesn't consider additional FILTER predicates on top of ACCESS predicates when calculating the CPU cost of operations like HASH JOINs.
Note that this problem isn't specific to
HASH JOIN operations, you can get similar effects with other join methods, like
NESTED LOOP joins, or even simple
INDEX lookup operations, where again the ACCESS part isn't very selective but only the FILTER applied afterwards filters matching rows.
Here are some examples with the given setup:
select /*+
leading(t1 t3 t2)
full(t1)
full(t3)
use_hash(t3)
swap_join_inputs(t3)
index(t2)
use_nl(t2)
*/
t1.*
--, t3.id2
--, case when t1.id > t2.id then t1.id else t2.id end
from
t1
, t2
, t3
where
1 = 1
--
and case when t1.id2 = 1 then t1.id2 else 1 end = case when t2.id2 = 1 then t2.id2 else 1 end
and case when t1.id2 = 2 then t1.id2 else 1 end = case when t2.id2 = 2 then t2.id2 else 1 end
--
and t3.id = t1.id
and t3.id2 = case when t1.id > t2.id then t1.id else t2.id end
;
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Execs | A-Rows| PGA | Start | Dur(T)| Dur(A)| Time Active Graph | Activity Graph ASH | Top 5 Activity ASH |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 6 | SELECT STATEMENT | | | | 10090 (100)| | 1 | 1 | | | | | | | |
| 1 | 0 | 5 | NESTED LOOPS | | 10000 | 1416K| 10090 (1)| 00:00:01 | 1 | 1 | | | | | | | |
|* 2 | 1 | 3 | HASH JOIN | | 10000 | 1123K| 88 (0)| 00:00:01 | 1 | 10K | 1890K | | | | | | |
| 3 | 2 | 1 | TABLE ACCESS FULL| T3 | 10000 | 70000 | 44 (0)| 00:00:01 | 1 | 10K | | | | | | | |
| 4 | 2 | 2 | TABLE ACCESS FULL| T1 | 10000 | 1054K| 44 (0)| 00:00:01 | 1 | 10K | | | | | | | |
|* 5 | 1 | 4 | INDEX RANGE SCAN | T2_IDX | 1 | 30 | 1 (0)| 00:00:01 | 10K | 1 | | 3 | 33 | 32 | ################### | @@@@@@@@@@@@@@@@@@@ (100%) | ON CPU(32) |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T3"."ID"="T1"."ID")
5 - access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END
="T2"."SYS_NC00004$" AND CASE "T1"."ID2" WHEN 2 THEN "T1"."ID2" ELSE 1
END ="T2"."SYS_NC00005$")
filter("T3"."ID2"=CASE WHEN ("T1"."ID">"T2"."ID") THEN
"T1"."ID" ELSE "T2"."ID" END )
select /*+
leading(t1 t3 t2)
full(t1)
full(t3)
use_hash(t3)
swap_join_inputs(t3)
index(t2)
use_nl(t2)
*/
max(t1.filler)
--, t3.id2
--, case when t1.id > t2.id then t1.id else t2.id end
from
t1
, t2
, t3
where
1 = 1
--
and case when t1.id2 = 1 then t1.id2 else 1 end = case when t2.id2 = 1 then t2.id2 else 1 end
and case when t1.id2 = 2 then t1.id2 else 1 end = case when t2.id2 = 2 then t2.id2 else 1 end
--
and t3.id = t1.id
and t2.filler >= t1.filler
and t2.id = case when t1.id2 > t3.id2 then t1.id2 else t3.id2 end
;
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Execs | A-Rows| PGA | Start | Dur(T)| Dur(A)| Time Active Graph | Activity Graph ASH | Top 5 Activity ASH |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 7 | SELECT STATEMENT | | | | 20092 (100)| | 1 | 1 | | | | | | | |
| 1 | 0 | 6 | SORT AGGREGATE | | 1 | 223 | | | 1 | 1 | | | | | | | |
| 2 | 1 | 5 | NESTED LOOPS | | 1 | 223 | 20092 (1)| 00:00:01 | 1 | 10K | | | | | | | |
|* 3 | 2 | 3 | HASH JOIN | | 10000 | 1123K| 88 (0)| 00:00:01 | 1 | 10K | 1900K | | | | | | |
| 4 | 3 | 1 | TABLE ACCESS FULL| T3 | 10000 | 70000 | 44 (0)| 00:00:01 | 1 | 10K | | | | | | | |
| 5 | 3 | 2 | TABLE ACCESS FULL| T1 | 10000 | 1054K| 44 (0)| 00:00:01 | 1 | 10K | | | | | | | |
|* 6 | 2 | 4 | INDEX RANGE SCAN | T2_IDX | 1 | 108 | 2 (0)| 00:00:01 | 10K | 10K | | 2 | 34 | 34 | #################### | @@@@@@@@@@@@@@@@@@@ (100%) | ON CPU(34) |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T3"."ID"="T1"."ID")
6 - access(CASE "T1"."ID2" WHEN 1 THEN "T1"."ID2" ELSE 1 END
="T2"."SYS_NC00004$" AND CASE "T1"."ID2" WHEN 2 THEN "T1"."ID2" ELSE 1
END ="T2"."SYS_NC00005$" AND "T2"."FILLER">="T1"."FILLER" AND
"T2"."ID"=CASE WHEN ("T1"."ID2">"T3"."ID2") THEN "T1"."ID2" ELSE
"T3"."ID2" END AND "T2"."FILLER" IS NOT NULL)
filter("T2"."ID"=CASE WHEN ("T1"."ID2">"T3"."ID2") THEN
"T1"."ID2" ELSE "T3"."ID2" END )
The former one exhibits exactly the same problem as the HASH JOIN example, only that the FILTER is evaluated in the inner row source of a NESTED LOOP join
after the index access operation.
The latter one shows as variation the classic
partial "index access" due to a range comparison in between - although the entire expression can be evaluated on index level, the access part matches every index entry, so the range scan actually needs to walk the
entire index at each loop iteration and the FILTER is then applied to all the index values evaluated.