Monday, May 21, 2012

Nested Loop Join Costing

The basic formula for calculating the costs of a Nested Loop Join is pretty straightforward and has been described and published several times.

In principle it is the cost of acquiring the driving row source plus the cost of acquiring the inner row source of the Nested Loop as many times as the driving row source dictates via the cardinality of the driving row source.

Cost (outer rowsource) + Cost (inner rowsource) * Card (outer rowsource)

Obviously there are cases where Oracle has introduced refinements to the above formula where this is no longer true. Here is one of these cases that is probably not uncommon.

Let's start with a simple two table join that shows above formula in action. It represents a parent-child relationship where the parent table has 10,000 rows with a unique identifier, and a child table with 100 rows each that map to a single parent row, having 1,000,000 rows in total.

set echo on create table t as select rownum as id , rownum as attr1 , rpad('x', 100) as filler from dual connect by level <= 10000 ; exec dbms_stats.gather_table_stats(null, 't') create table t2 as select rownum as id , mod(rownum, 10000) + 1 as fk , mod(rownum, 20) + 1 as attr1 , rpad('x', 100) as filler from dual connect by level <= 1000000 ; exec dbms_stats.gather_table_stats(null, 't2') create index t2_idx on t2 (fk); explain plan for select /*+ use_nl(t t2) leading(t) index(t2) */ * from t , t2 where t.attr1 <= 500 and t2.fk = t.id; set pagesize 0 linesize 200 tab off select * from table(dbms_xplan.display);

which gives the following execution plan in 11.2:

--------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50005 | 10M| 51087 (1)| 00:10:14 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 50005 | 10M| 51087 (1)| 00:10:14 | |* 3 | TABLE ACCESS FULL | T | 500 | 54500 | 45 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | T2_IDX | 100 | | 2 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| T2 | 100 | 11300 | 102 (0)| 00:00:02 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T"."ATTR1"<=500) 4 - access("T2"."FK"="T"."ID")

There are a couple of noteworthy comments:

1. T2.FK and T.ID have the same number of distinct keys each (10,000), so assuming corresponding primary and foreign key constraints in place this means that there is at least one matching row for every parent row.

2. There is a filter on T that restricts the driving row source to 500 rows. It is interesting to note that this filter results in a "biased join" where Oracle picks the NUM_DISTINCT from the other table for the join selectivity calculation rather than using the greatest NUM_DISTINCT of both, in this case arriving at a correct cardinality estimate of approx. 50,000

3. The T2_IDX index has a worst-case clustering factor due to the scattering of T2.FK via the MOD function, hence the resulting cost of a single iteration of the Nested Loop join is 100 for picking up 100 rows from the table (plus 2 for the index access)

The overall cost for the loop is 45 for acquiring the driving row source plus 500 times 102, the remaining part being CPU costing overhead. This matches above formula.

Now let's re-create table T2 using a different distribution of the T2.FK column:

drop table t2; purge table t2; create table t2 as select rownum as id , mod(rownum, 500) + 1 as fk , mod(rownum, 20) + 1 as attr1 , rpad('x', 100) as filler from dual connect by level <= 1000000 ; exec dbms_stats.gather_table_stats(null, 't2') create index t2_idx on t2 (fk);

The only difference is now that the FK column no longer has 10,000 distinct keys but only 500.

Of course on average now 2,000 rows in T2 match a parent row in T, but obviously no longer all of them have a match in T2.

Let's review the execution plan for the same SQL statement as above:

--------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 998K| 211M| 52609 (1)| 00:10:32 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 998K| 211M| 52609 (1)| 00:10:32 | |* 3 | TABLE ACCESS FULL | T | 500 | 54500 | 45 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | T2_IDX | 2000 | | 5 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| T2 | 1996 | 220K| 2007 (1)| 00:00:25 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T"."ATTR1"<=500) 4 - access("T2"."FK"="T"."ID")

Since now 2,000 rows match a single parent key on average, the cost per iteration has increased accordingly. The index range scan needs to visit some root / branch blocks plus four leaf blocks on average, resulting in a cost of 5. The table access needs to visit 2,000 rows on average, hence with the same worst-case clustering factor the cost per iteration is 2,000 plus CPU overhead.

Given the fact that we need to do this 500 times the final cost of the Nested Loop join ought to be something close to one million, but surprisingly it is only a little bit higher than the cost of the previous scenario.

So clearly the original formula doesn't apply here, although the cost for a single iteration in the execution plan seems to match the expectations.

It looks like Oracle a long way back introduced a refinement to the original formula in the case of less distinct keys of the inner row source join column than the driving row source join column.

The idea behind it seems to be that this is what Oracle calls a "sparse" join. Obviously not every row from the driving row source will find a match in the inner row source, hence some of the loop iterations should end with the index lookup, not finding a match and therefore no need to visit the inner table afterwards. The refinement hence calculates a lower average cost of the table visit per loop iteration.

This is of course true when looking at the join by itself. But if you are unlucky and a corresponding filter on the driving row source turns the "sparse" join into a (rather) "dense" join, then this refinement can lead to a significant cost underestimate of a Nested Loop join, potentially driving the optimizer towards favouring that join method over others.

And it is exactly that scenario what above query simulates: The filter on T results in a full match of all 500 driving rows, and the actual cost of the Nested Loop join ought to be closer to one million than 50,000 in this case.

The refinement to the cost calculation seems to be based on the ratio between the NUM_DISTINCT of two join coin columns: In my example the ratio is 10,000:500, so the overall cost is only approx. 1/20 of the original cost.

There are some further details how the formula deals with a small number of loop iterations. For example, the first iteration will get the full original cost, a number of further iterations (again seems to correspond to the ratio, here for example 20) get only the index cost added, and from then on the costing of the table access levels at the original cost downscaled by the ratio (2000 / 20 which is 100 in this case).

The refinement obviously has been added in release 10.1.0.3 and can be found in the Fix Control list as bug number 3120429. The text for this fix is: "account for join key sparsity in computing NL index access cost" and apparently only applies if the inner row source uses an index access.

This also means that the original costing can be activated by disabling the fix:

explain plan for select /*+ opt_param('_fix_control' '3120429:0') use_nl(t t2) leading(t) index(t2) */ * from t , t2 where t.attr1 <= 500 and t2.fk = t.id; set pagesize 0 linesize 200 tab off select * from table(dbms_xplan.display); --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 998K| 211M| 1003K (1)| 03:20:41 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 998K| 211M| 1003K (1)| 03:20:41 | |* 3 | TABLE ACCESS FULL | T | 500 | 54500 | 45 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | T2_IDX | 2000 | | 5 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| T2 | 1996 | 220K| 2007 (1)| 00:00:25 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T"."ATTR1"<=500) 4 - access("T2"."FK"="T"."ID")
So if you happen to have such cases where the filter on a driving row source leads to an underestimate of the Nested Loop join cost as outlined here, using the fix control allows arriving at more reasonable costing figures.

I recently had such a case at a client where only a small part of a fully populated time dimension was used in the foreign key of a fact table, but the filter on the time dimension lead to exactly the scenario described here - the join wasn't really sparse and the Nested Loop join cost was significantly underestimated.