Friday, December 23, 2011

Extended DISPLAY_CURSOR With Rowsource Statistics

Introduction

So this will be my Oracle-related Christmas present for you: a prototype implementation that extends the DBMS_XPLAN.DISPLAY_CURSOR output, hopefully making it more meaningful and easier to interpret. It is a simple standalone SQL*Plus script with the main functionality performed by a single SQL query. I've also demoed this during my recent "optimizer hacking sessions".

Since Oracle 10g, DBMS_XPLAN.DISPLAY_CURSOR together with the Rowsource Statistics feature (enabled via SQL_TRACE, the GATHER_PLAN_STATISTICS hint, STATISTICS_LEVEL set to ALL, or controlled via the corresponding hidden parameters "_rowsource_execution_statistics" and "_rowsource_statistics_sampfreq") has allowed a sophisticated analysis of the work performed by a single SQL statement.
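As a quick reference, here is a minimal sketch of the common ways to enable the feature (T1 stands for any test table of your choice):

-- enable extended rowsource statistics for the whole session
alter session set statistics_level = all;

-- or enable them per statement via the hint - note that the hint by default
-- uses a lower sampling frequency (see the notes on "A-Time Self" below);
-- the hidden parameters mentioned above offer finer control, use with care
select /*+ gather_plan_statistics */ count(*) from t1;

-- afterwards the statistics of the last execution can be displayed like this
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));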

Of course you'll appreciate that it doesn't go as far as the Real-Time SQL Monitoring feature added in Oracle 11g (but only available with Enterprise Edition plus Diagnostic and Tuning Pack), which is "always on", provides similar (and much more) information while a statement is still executing, and doesn't require reproducing the execution with the corresponding hints / parameters set.

Without the Tuning Pack it's usually necessary to reproduce the execution, because the overhead of the Rowsource Statistics is significant and it therefore doesn't make sense to have them enabled all the time - unfortunately Oracle 11g gathers the same information "always on" anyway, but you're only allowed to access that information if you have the Tuning Pack license.

But for users without the corresponding licenses DBMS_XPLAN.DISPLAY_CURSOR together with Rowsource Statistics is still a very valuable tool.

However, during my seminars and consulting work at client sites I've realized that people quite often struggle to interpret the output provided, for several reasons:

1. They have problems interpreting the execution plan in general - here I refer in particular to the flow of execution and the underlying execution mechanics

2. They have problems identifying the operations that are responsible for the majority of the work, due to the cumulative nature of the work-related figures provided, like Elapsed Time, Logical I/O, Physical Reads etc.

3. They are potentially misled when trying to identify those steps in the execution plan that are subject to cardinality mis-estimates of the optimizer - the single most common reason for inefficient execution plans - due to the way the optimizer shows the number of estimated rows for operations that are executed multiple times (for example the inner row source of a Nested Loop join).

I've tried to address all of the above points (and even more) with this prototype implementation. In fact, point 1 above has already been addressed by Adrian Billington's XPLAN wrapper utility, which adds the Parent ID and Order of Execution to the DBMS_XPLAN output, and I've picked up that idea of injecting additional information into the output for this prototype - so kudos to Adrian for his great idea and implementation.

Apart from any home-grown scripts there have probably been numerous attempts to address points 2 and 3 above, the latest one I know of being Kyle Hailey's DISPLAY_CURSOR post and his "TCF query" provided in the same article. I've included his TCF-GRAPH and LIO-RATIO information, so also thanks to Kyle for posting this.

I plan to eventually turn this into a SQL statement analysis "Swiss-army knife" for non-Tuning Pack users with more sophisticated formatting options (for example specifying which columns to show and in which order) and the ability to combine the information with the ASH info available from the Diagnostic Pack license (similar to the output provided by the Real-Time SQL Monitoring text mode).

However I believe that this prototype is already quite helpful and therefore decided to publish it as it is.

Let's have a look at what the extended output has to offer by creating a couple of sample Rowsource Profiles.

Examples

The first example is deliberately kept as simple as possible to explain the basic functionality by performing a full table scan.


SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select count(*) from t1;

COUNT(*)
----------
1000000

Elapsed: 00:00:03.01
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID 5bc0v4my7dvr5, child number 0
-------------------------------------
select count(*) from t1

Plan hash value: 3724264953

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 3 | SELECT STATEMENT | | 1 | | 1 |00:00:02.98 | 15390 | 15386 | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 1 | 0 | 2 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:02.98 | 15390 | 15386 | 00:00:01.39 | 0 | 0 | @@@@@@ | | | 0 | | 1 |
| 2 | 1 | 1 | TABLE ACCESS FULL| T1 | 1 | 1000K| 1000K|00:00:01.58 | 15390 | 15386 | 00:00:01.58 | 15390 | 15386 | @@@@@@ | @@@@@@@@@@@@| @@@@@@@@@@@@| 0 | | 1000K|
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


14 rows selected.


The first thing that becomes obvious is the fact that you need a veeery wide display setting to see all the columns provided :-)

As you can see, if you call the script without any parameters it will try to pick up the last statement executed by the current session and call DBMS_XPLAN.DISPLAY_CURSOR with the ALLSTATS LAST formatting option. Further options can be found in the documentation provided with the script.
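For reference, the script also accepts the SQL_ID, child number and format option as positional parameters (see the usage notes in the script header below), so a call for a specific cursor - here using the SQL_ID from the example above - could look like this:

SQL> @xplan_extended_display_cursor 5bc0v4my7dvr5 0 "ALLSTATS LAST"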

To the left you can see the "Pid" and "Ord" columns that Adrian added in his original XPLAN wrapper script - these define the Parent Id as well as the Order of Execution. Note that this Order of Execution is only correct for the common cases - it doesn't cater for the various exceptions to the general rules and can therefore be misleading. You'll find an example below that demonstrates this.

Furthermore, you see the following columns in addition to what is provided out of the box by DBMS_XPLAN.DISPLAY_CURSOR with the ALLSTATS LAST option:

A-Time Self: This is the time spent on the operation itself. For leaf operations this corresponds to the A-Time, but for all non-leaf operations it is obtained by subtracting the time spent on all direct descendant operations from the cumulative time shown for the operation. Please note that if you use a lower rowsource sample frequency (for example as set by the GATHER_PLAN_STATISTICS hint) the A-Time information will be pretty wrong and misleading. You need to set the sample frequency to 1 to get stable time information reported - of course this means that the overhead of the rowsource sampling gets maximized.

Bufs Self/Reads Self/Write Self: This is the corresponding self-operation statistic, obtained in the same way as just described.

Graphs: The self-operation work shown relative to the total work performed. Note that the "total" is defined by querying the MAX value found in the statistics rather than picking the top-most cumulative value. This is because for queries that are cancelled or performed using Parallel Execution the top-most value may either not be populated at all or may differ from the values accumulated by the Parallel Slaves. So there are cases where the Graphs may be wrong and misleading - treat them carefully.

LIO Ratio: This is simply the number of logical I/Os recorded for the particular operation divided by the number of rows it generated - in other words, the LIOs required on average per row. As usual care should be taken when interpreting a ratio, but in general a high value here might indicate that there are more efficient ways to generate the data, like a more selective access path. This can be very misleading for aggregation operations for example - a COUNT(*) will potentially show a huge LIO ratio but doesn't indicate a problem by itself.

TCF Graph: "Tuning by Cardinality Feedback" - this is a graph in a different style - it shows either plus or minus signs and each plus / minus corresponds to one order of magnitude difference between the estimated and the actual rows. Plus stands for underestimates, minus for overestimates. So two plus signs indicate that the the actual number of rows was 100 times greater than the estimated number, and similarly two minus signs would indicate an overestimate of factor 100. Note that this information will be partially misleading with Parallel Execution, because an operation that is only started once with serial execution might be started several times to obtain the complete result set when executed in parallel. Also cancelling queries might show misleading information here, see the "E-Rows*Sta" column description for an explanation why.

E-Rows*Sta: This is the estimated number of rows times the Starts column. This tries to address point 3 above, where the simple comparison of E-Rows and A-Rows can be very misleading: a large difference doesn't indicate a problem at all if the operation has been started a corresponding number of times. If a query gets cancelled then this might still differ from A-Rows simply because the operation wasn't run to completion. Also for Parallel Execution this information needs to be treated carefully, because an operation executed in parallel will be started many times, whereas it gets started only once with serial execution.
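To make the "Self" figures more tangible, here is a simplified sketch of the underlying calculation, restricted to the elapsed time of the last execution (with &si / &cn standing for the SQL_ID and child number in question): the sum of the direct descendants' cumulative figures gets subtracted from each operation's cumulative figure, guarded against negative results - the actual script below applies the same logic to buffer gets, reads and writes:

select
       p.id
     , p.operation
       -- cumulative time minus the direct descendants' cumulative time
     , greatest(p.last_elapsed_time
       - nvl((select sum(c.last_elapsed_time)
              from   v$sql_plan_statistics_all c
              where  c.sql_id = p.sql_id
              and    c.child_number = p.child_number
              and    c.parent_id = p.id), 0), 0) as elapsed_self
from
       v$sql_plan_statistics_all p
where
       p.sql_id = '&si'
and    p.child_number = &cn
order by
       p.id
;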

Looking at the above example it becomes obvious that, of course, all of the logical and physical I/O has been caused by the full table scan, but with the increased STATISTICS_LEVEL setting you can see that the SORT AGGREGATE operation also required some time - presumably CPU time due to the instrumentation overhead - whereas the top-most operation didn't account for any work at all. The cardinality estimate is also spot on.

The next example shows a different shape:


SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );
2.7338E+11

Elapsed: 00:00:41.83
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )

Plan hash value: 1399240396

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp| A-Time Self |Bufs Self |Reads Self|Write Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|Write S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 6 | SELECT STATEMENT | | 1 | | 1 |00:00:41.84 | 1469K| 49356 | 10578 | | | | | 00:00:00.00 | 0 | 0 | 0 | | | | | 0 | | |
| 1 | 0 | 5 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:41.84 | 1469K| 49356 | 10578 | | | | | 00:00:02.19 | 0 | 0 | 0 | @ | | | | 0 | | 1 |
| 2 | 1 | 4 | VIEW | | 1 | 16 | 1466K|00:00:39.64 | 1469K| 49356 | 10578 | | | | | 00:00:04.22 | 0 | 0 | 0 | @ | | | | 0 | ++++ | 16 |
| 3 | 2 | 3 | WINDOW SORT | | 1 | 16 | 1466K|00:00:35.42 | 1469K| 49356 | 10578 | 93M| 3312K| 55M (1)| 84992 | 00:00:11.85 | 6 | 24410 | 10578 | @@@ | | @@@@@@ | @@@@@@@@@@@@| 0 | ++++ | 16 |
| 4 | 3 | 2 | TABLE ACCESS BY INDEX ROWID| T | 1 | 16 | 1466K|00:00:23.58 | 1469K| 24946 | 0 | | | | | 00:00:19.73 | 1466K| 21707 | 0 | @@@@@@ | @@@@@@@@@@@@| @@@@@ | | 1 | ++++ | 16 |
|* 5 | 4 | 1 | INDEX RANGE SCAN | I | 1 | 16 | 1466K|00:00:03.85 | 3240 | 3239 | 0 | | | | | 00:00:03.85 | 3240 | 3239 | 0 | @ | | @ | | 0 | ++++ | 16 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("OBJECT_ID">:X)


25 rows selected.


Have I already mentioned that you need a veery wide display setting :-) ??

Anyway here we can see a couple of interesting points:

- An example of a parent operation requiring a significant amount of time - in this case a WINDOW SORT operation that spills to disk (see the Used-Tmp and Writes columns)

- A problem with the cardinality estimates as indicated by the TCF Graph. In this case it is the reason for an inefficient index-based access path. Note that the LIO Ratio doesn't indicate this problem very clearly here

- The majority of the logical I/O (and time and work) is caused by the random access to the table, again a result of the optimizer's bad choice due to the wrong cardinality estimates

Here is another example of a more complex execution plan:


SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set star_transformation_enabled = temp_disable;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select * from (
2 select t1.id as t1_id, t1.filler, s.id as s_id from t1, (
3 select
4 f.id
5 from
6 t f
7 , (select * from d where is_flag_d1 = 'Y') d1
8 , (select * from d where is_flag_d2 = 'Y') d2
9 , (select * from d where is_flag_d3 = 'Y') d3
10 where
11 f.fk1 = d1.id
12 and f.fk2 = d2.id
13 and f.fk3 = d3.id
14 ) s
15 where t1.id = s.id
16 )
17 where rownum > 1
18 ;

no rows selected

Elapsed: 00:00:21.26
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID 5u3x96k4s5zt6, child number 0
-------------------------------------
select * from ( select t1.id as t1_id, t1.filler, s.id as s_id from t1,
( select f.id from t f , (select * from d where
is_flag_d1 = 'Y') d1 , (select * from d where is_flag_d2 = 'Y') d2
, (select * from d where is_flag_d3 = 'Y') d3 where f.fk1 =
d1.id and f.fk2 = d2.id and f.fk3 = d3.id ) s where t1.id = s.id
) where rownum > 1

Plan hash value: 42027304

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 30 | SELECT STATEMENT | | 1 | | 0 |00:00:21.23 | 2161K| 43798 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
| 1 | 0 | 29 | COUNT | | 1 | | 0 |00:00:21.23 | 2161K| 43798 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
|* 2 | 1 | 28 | FILTER | | 1 | | 0 |00:00:21.23 | 2161K| 43798 | | | | 00:00:00.44 | 0 | 0 | | | | | | |
| 3 | 2 | 27 | NESTED LOOPS | | 1 | | 1000K|00:00:20.79 | 2161K| 43798 | | | | 00:00:00.44 | 0 | 0 | | | | 0 | | |
| 4 | 3 | 25 | NESTED LOOPS | | 1 | 9 | 1000K|00:00:11.10 | 2131K| 21150 | | | | 00:00:00.41 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 5 | 4 | 23 | HASH JOIN | | 1 | 9 | 1000K|00:00:06.12 | 19549 | 17970 | 33M| 6589K| 65M (0)| 00:00:00.59 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 6 | 5 | 21 | HASH JOIN | | 1 | 9 | 1000K|00:00:05.53 | 19385 | 17970 | 37M| 6044K| 69M (0)| 00:00:00.57 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 7 | 6 | 19 | HASH JOIN | | 1 | 10 | 1000K|00:00:04.95 | 19221 | 17970 | 1452K| 1452K| 1002K (0)| 00:00:00.53 | 0 | 0 | | | | 0 | +++++ | 10 |
|* 8 | 7 | 1 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
| 9 | 7 | 18 | VIEW | VW_ST_84A34AF1 | 1 | 10 | 1000K|00:00:04.42 | 19057 | 17970 | | | | 00:00:00.18 | 0 | 0 | | | | 0 | +++++ | 10 |
| 10 | 9 | 17 | NESTED LOOPS | | 1 | 10 | 1000K|00:00:04.24 | 19057 | 17970 | | | | 00:00:00.37 | 0 | 0 | | | | 0 | +++++ | 10 |
| 11 | 10 | 15 | BITMAP CONVERSION TO ROWIDS| | 1 | 10 | 1000K|00:00:00.41 | 2107 | 1020 | | | | 00:00:00.11 | 0 | 0 | | | | 0 | +++++ | 10 |
| 12 | 11 | 14 | BITMAP AND | | 1 | | 11 |00:00:00.30 | 2107 | 1020 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 13 | 12 | 5 | BITMAP MERGE | | 1 | | 11 |00:00:00.12 | 863 | 400 | 1024K| 512K| 2804K (0)| 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 14 | 13 | 4 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.10 | 863 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 15 | 14 | 2 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 16 | 14 | 3 | BITMAP INDEX RANGE SCAN| T_FK1 | 100 | | 800 |00:00:00.10 | 699 | 400 | | | | 00:00:00.10 | 699 | 400 | | | | 0 | | |
| 17 | 12 | 9 | BITMAP MERGE | | 1 | | 11 |00:00:00.12 | 847 | 400 | 2802K| 512K| 2804K (0)| 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 18 | 17 | 8 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.10 | 847 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 19 | 18 | 6 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 20 | 18 | 7 | BITMAP INDEX RANGE SCAN| T_FK3 | 100 | | 800 |00:00:00.10 | 683 | 400 | | | | 00:00:00.10 | 683 | 400 | | | | 0 | | |
| 21 | 12 | 13 | BITMAP MERGE | | 1 | | 11 |00:00:00.06 | 397 | 220 | 1024K| 512K| 1581K (0)| 00:00:00.01 | 0 | 0 | | | | 0 | | |
| 22 | 21 | 12 | BITMAP KEY ITERATION | | 1 | | 440 |00:00:00.05 | 397 | 220 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 23 | 22 | 10 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
|* 24 | 22 | 11 | BITMAP INDEX RANGE SCAN| T_FK2 | 10 | | 440 |00:00:00.05 | 233 | 220 | | | | 00:00:00.05 | 233 | 220 | | | | 0 | | |
| 25 | 10 | 16 | TABLE ACCESS BY USER ROWID | T | 1000K| 1 | 1000K|00:00:03.46 | 16950 | 16950 | | | | 00:00:03.46 | 16950 | 16950 | @@ | | @@@@@ | 0 | | 1000K|
|* 26 | 6 | 20 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 27 | 5 | 22 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 28 | 4 | 24 | INDEX RANGE SCAN | T1_IDX | 1000K| 1 | 1000K|00:00:04.57 | 2111K| 3180 | | | | 00:00:04.57 | 2112K| 3180 | @@@ | @@@@@@@@@@@@| @ | 2 | | 1000K|
| 29 | 3 | 26 | TABLE ACCESS BY INDEX ROWID | T1 | 1000K| 1 | 1000K|00:00:09.25 | 29628 | 22648 | | | | 00:00:09.25 | 29628 | 22648 | @@@@@ | | @@@@@@ | 0 | | 1000K|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM>1)
5 - access("ITEM_1"="D"."ID")
6 - access("ITEM_3"="D"."ID")
7 - access("ITEM_2"="D"."ID")
8 - filter("IS_FLAG_D2"='Y')
15 - filter("IS_FLAG_D1"='Y')
16 - access("F"."FK1"="D"."ID")
19 - filter("IS_FLAG_D3"='Y')
20 - access("F"."FK3"="D"."ID")
23 - filter("IS_FLAG_D2"='Y')
24 - access("F"."FK2"="D"."ID")
26 - filter("IS_FLAG_D1"='Y')
27 - filter("IS_FLAG_D3"='Y')
28 - access("T1"."ID"="ITEM_4")

Note
-----
- star transformation used for this statement


68 rows selected.


This is another case where a more efficient execution plan could be found if the cardinality estimate was in the right ballpark - you can see this pretty clearly in the "TCF Graph" column. Due to the severe underestimation several bad choices have been made: reading all rows from T by ROWID rather than simply performing a full table scan, and again an index-driven random access to T1, which drives up the logical I/O unnecessarily. This is a crafted example that minimizes the logical and physical I/O due to the good clustering of T1 in relation to the data returned by the driving row source - a more real-life bad clustering together with larger table sizes would have turned this into a more or less infinitely running query.

It is also an example of how simply looking at E-Rows and A-Rows can be misleading: check operations 28 and 29: A-Rows is 1000K but E-Rows is 1 - should this be worrying? Not at all if you look at the "E-Rows*Sta" column, because the operation has been started 1000K times, hence the estimate is spot on.

The "LIO Ratio" for operation 23 is 16 - this means it took 16 LIOs on average to generate a single row and might indicate that there are more efficient ways to generate those rows than a full table scan.

By the way, the 11g buffer pinning optimization also helped to minimize the logical I/O on the T1 table.

Here is the same query, but this time with a bad clustering of T1 - I've cancelled it after 40 seconds to give you an example showing that you can use DBMS_XPLAN.DISPLAY_CURSOR without the need to run a statement to completion.


SQL> select * from (
2 select t1.id as t1_id, t1.filler, s.id as s_id from t1, (
3 select
4 f.id
5 from
6 t f
7 , (select * from d where is_flag_d1 = 'Y') d1
8 , (select * from d where is_flag_d2 = 'Y') d2
9 , (select * from d where is_flag_d3 = 'Y') d3
10 where
11 f.fk1 = d1.id
12 and f.fk2 = d2.id
13 and f.fk3 = d3.id
14 ) s
15 where t1.id = s.id
16 )
17 where rownum > 1
18 ;
select t1.id as t1_id, t1.filler, s.id as s_id from t1, (
*
ERROR at line 2:
ORA-01013: user requested cancel of current operation


Elapsed: 00:00:40.71
SQL> @xplan_extended_display_cursor
SQL_ID 5u3x96k4s5zt6, child number 0
-------------------------------------
select * from ( select t1.id as t1_id, t1.filler, s.id as s_id from t1,
( select f.id from t f , (select * from d where
is_flag_d1 = 'Y') d1 , (select * from d where is_flag_d2 = 'Y') d2
, (select * from d where is_flag_d3 = 'Y') d3 where f.fk1 =
d1.id and f.fk2 = d2.id and f.fk3 = d3.id ) s where t1.id = s.id
) where rownum > 1

Plan hash value: 42027304

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 30 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 0 | 0 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
| 1 | 0 | 29 | COUNT | | 1 | | 0 |00:00:00.01 | 0 | 0 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
|* 2 | 1 | 28 | FILTER | | 1 | | 0 |00:00:00.01 | 0 | 0 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
| 3 | 2 | 27 | NESTED LOOPS | | 1 | | 102K|00:00:40.42 | 337K| 143K| | | | 00:00:00.16 | 0 | 0 | | | | 0 | | |
| 4 | 3 | 25 | NESTED LOOPS | | 1 | 9 | 102K|00:00:13.08 | 235K| 40794 | | | | 00:00:00.13 | 0 | 0 | | | | 0 | ++++ | 9 |
|* 5 | 4 | 23 | HASH JOIN | | 1 | 9 | 102K|00:00:05.70 | 19420 | 17970 | 33M| 6589K| 65M (0)| 00:00:00.40 | 0 | 0 | | | | 0 | ++++ | 9 |
|* 6 | 5 | 21 | HASH JOIN | | 1 | 9 | 1000K|00:00:05.30 | 19385 | 17970 | 37M| 6044K| 69M (0)| 00:00:00.57 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 7 | 6 | 19 | HASH JOIN | | 1 | 10 | 1000K|00:00:04.73 | 19221 | 17970 | 1452K| 1452K| 1010K (0)| 00:00:00.52 | 0 | 0 | | | | 0 | +++++ | 10 |
|* 8 | 7 | 1 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
| 9 | 7 | 18 | VIEW | VW_ST_84A34AF1 | 1 | 10 | 1000K|00:00:04.21 | 19057 | 17970 | | | | 00:00:00.18 | 0 | 0 | | | | 0 | +++++ | 10 |
| 10 | 9 | 17 | NESTED LOOPS | | 1 | 10 | 1000K|00:00:04.03 | 19057 | 17970 | | | | 00:00:00.37 | 0 | 0 | | | | 0 | +++++ | 10 |
| 11 | 10 | 15 | BITMAP CONVERSION TO ROWIDS| | 1 | 10 | 1000K|00:00:00.42 | 2107 | 1020 | | | | 00:00:00.10 | 0 | 0 | | | | 0 | +++++ | 10 |
| 12 | 11 | 14 | BITMAP AND | | 1 | | 11 |00:00:00.32 | 2107 | 1020 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 13 | 12 | 5 | BITMAP MERGE | | 1 | | 11 |00:00:00.13 | 863 | 400 | 1024K| 512K| | 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 14 | 13 | 4 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.11 | 863 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 15 | 14 | 2 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 16 | 14 | 3 | BITMAP INDEX RANGE SCAN| T_FK1 | 100 | | 800 |00:00:00.11 | 699 | 400 | | | | 00:00:00.11 | 699 | 400 | | | | 0 | | |
| 17 | 12 | 9 | BITMAP MERGE | | 1 | | 11 |00:00:00.13 | 847 | 400 | 2802K| 512K| 2804K (0)| 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 18 | 17 | 8 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.11 | 847 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 19 | 18 | 6 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 20 | 18 | 7 | BITMAP INDEX RANGE SCAN| T_FK3 | 100 | | 800 |00:00:00.11 | 683 | 400 | | | | 00:00:00.11 | 683 | 400 | | | | 0 | | |
| 21 | 12 | 13 | BITMAP MERGE | | 1 | | 11 |00:00:00.06 | 397 | 220 | 1024K| 512K| | 00:00:00.01 | 0 | 0 | | | | 0 | | |
| 22 | 21 | 12 | BITMAP KEY ITERATION | | 1 | | 440 |00:00:00.05 | 397 | 220 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 23 | 22 | 10 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
|* 24 | 22 | 11 | BITMAP INDEX RANGE SCAN| T_FK2 | 10 | | 440 |00:00:00.05 | 233 | 220 | | | | 00:00:00.05 | 233 | 220 | | | | 0 | | |
| 25 | 10 | 16 | TABLE ACCESS BY USER ROWID | T | 1000K| 1 | 1000K|00:00:03.24 | 16950 | 16950 | | | | 00:00:03.24 | 16950 | 16950 | @ | @ | @ | 0 | | 1000K|
|* 26 | 6 | 20 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 27 | 5 | 22 | TABLE ACCESS FULL | D | 1 | 100 | 11 |00:00:00.01 | 35 | 0 | | | | 00:00:00.00 | 35 | 0 | | | | 3 | | 100 |
|* 28 | 4 | 24 | INDEX RANGE SCAN | T1_IDX | 102K| 1 | 102K|00:00:07.26 | 216K| 22824 | | | | 00:00:07.26 | 216K| 22824 | @@ | @@@@@@@@ | @@ | 2 | | 102K|
| 29 | 3 | 26 | TABLE ACCESS BY INDEX ROWID | T1 | 102K| 1 | 102K|00:00:27.19 | 102K| 102K| | | | 00:00:27.19 | 102K| 102K| @@@@@@@@ | @@@@ | @@@@@@@@@ | 1 | | 102K|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM>1)
5 - access("ITEM_1"="D"."ID")
6 - access("ITEM_3"="D"."ID")
7 - access("ITEM_2"="D"."ID")
8 - filter("IS_FLAG_D2"='Y')
15 - filter("IS_FLAG_D1"='Y')
16 - access("F"."FK1"="D"."ID")
19 - filter("IS_FLAG_D3"='Y')
20 - access("F"."FK3"="D"."ID")
23 - filter("IS_FLAG_D2"='Y')
24 - access("F"."FK2"="D"."ID")
26 - filter("IS_FLAG_D1"='Y')
27 - filter("IS_FLAG_D3"='Y')
28 - access("T1"."ID"="ITEM_4")

Note
-----
- star transformation used for this statement


68 rows selected.


The result is similar to the previous one, but you can see that the increased number of physical reads on the T1 table segment slowed down the execution significantly.

Here is an example of Parallel Execution. Note that I strongly recommend the Real-Time SQL Monitoring feature if you have to deal with Parallel Execution a lot, because it offers much more insight and information than DBMS_XPLAN.DISPLAY_CURSOR.


SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> select /*+ parallel(t1, 4) */ count(*) from t1;

COUNT(*)
----------
1000000

Elapsed: 00:00:01.59
SQL>
SQL> @xplan_extended_display_cursor "" "" ALLSTATS
SQL> set echo off verify off termout off
SQL_ID 92661sht5tyw1, child number 0
-------------------------------------
select /*+ parallel(t1, 4) */ count(*) from t1

Plan hash value: 3110199320

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 7 | SELECT STATEMENT | | 1 | | 1 |00:00:01.53 | 7 | 2 | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 1 | 0 | 6 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.53 | 7 | 2 | 00:00:00.00 | 0 | 0 | | | | 0 | | 1 |
| 2 | 1 | 5 | PX COORDINATOR | | 1 | | 4 |00:00:01.53 | 7 | 2 | 00:00:01.53 | 7 | 2 | @@@ | | | 1 | | |
| 3 | 2 | 4 | PX SEND QC (RANDOM) | :TQ10000 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | 00:00:00.00 | 0 | 0 | | | | | | 0 |
| 4 | 3 | 3 | SORT AGGREGATE | | 4 | 1 | 4 |00:00:05.84 | 15541 | 15385 | 00:00:00.14 | 0 | 3841 | | | @@@ | 0 | | 4 |
| 5 | 4 | 2 | PX BLOCK ITERATOR | | 4 | 1000K| 1000K|00:00:05.71 | 15541 | 11544 | 00:00:00.27 | 0 | 0 | @ | | | 0 | | 4000K|
|* 6 | 5 | 1 | TABLE ACCESS FULL| T1 | 52 | 1000K| 1000K|00:00:05.44 | 15541 | 15385 | 00:00:05.44 | 15541 | 15385 | @@@@@@@@@@@ | @@@@@@@@@@@@| @@@@@@@@@@@@| 0 | - | 52M|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access(:Z>=:Z AND :Z<=:Z)


23 rows selected.


The important points to consider when dealing with Parallel Execution are:

- You need to use ALLSTATS instead of ALLSTATS LAST in order to get a meaningful output. ALLSTATS LAST would only show the activity of the Query Coordinator. However, ALLSTATS shows the statistics accumulated over all executions so far, so if the query has been executed multiple times you will see more than just the statistics of the last execution. If you want to ensure that you see only the statistics of the last execution, you need to create a new cursor, for example by adding a simple comment that makes the SQL statement text unique (see the snippet after this list)

- You can see in the output that the "TCF Graph" and "E-Rows*Sta" columns can be misleading for Parallel Execution - the full table scan has been divided into 52 chunks executed by four parallel slaves, hence the Starts column shows 52, but the cardinality estimate of 1000K rows was spot on rather than wrong by a factor of 52

- The elapsed time information for the parts executed in parallel is not the wall clock time but the accumulated time spent by all parallel slaves, hence the Graphs will be partially misleading due to the MAX value found

- In this case the "Reads Self" column also seems to indicate reads by the SORT AGGREGATE operation - this looks questionable, too
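As mentioned in the first point above, the simplest way to force a fresh cursor (and hence fresh ALLSTATS figures) is to make the statement text unique, for example like this:

SQL> select /*+ parallel(t1, 4) */ /* run 2 */ count(*) from t1;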

The last example shows that it takes just a simple scalar subquery to make the output misleading again - so be aware that there are some exceptions (like scalar / early filter subqueries, certain Parallel Execution plans etc.) to the rules of how to interpret execution plans, and any automated interpretation of such plans is therefore usually misled:


SQL> select count(id) from (select (select id from t1 t1_i where t1_i.id = t1.id) as id from t1);

COUNT(ID)
----------
1000000

Elapsed: 00:00:17.50
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID af2gry2z9g7vt, child number 0
-------------------------------------
select count(id) from (select (select id from t1 t1_i where t1_i.id =
t1.id) as id from t1)

Plan hash value: 1144741071

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 4 | SELECT STATEMENT | | 1 | | 1 |00:00:17.51 | 1551K| 17617 | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 1 | 0 | 1 | INDEX RANGE SCAN | T1_IDX | 1000K| 1 | 1000K|00:00:09.36 | 1536K| 2231 | 00:00:09.36 | 1536K| 2231 | @@@@@@ | @@@@@@@@@@@@| @@ | 1 | | 1000K|
| 2 | 0 | 3 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:17.51 | 1551K| 17617 | 00:00:15.76 | 1536K| 2231 | @@@@@@@@@@@ | @@@@@@@@@@@@| @@ | 1536485 | | 1 |
| 3 | 2 | 2 | TABLE ACCESS FULL| T1 | 1 | 1000K| 1000K|00:00:01.74 | 15390 | 15386 | 00:00:01.74 | 15390 | 15386 | @ | | @@@@@@@@@@ | 0 | | 1000K|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T1_I"."ID"=:B1)


The scalar subquery is shown as a child operation of the root node (or, in 10g, as an independent operation with no parent at all, due to the missing ID = 0 operation in V$SQL_PLAN_STATISTICS(_ALL)) and according to the usual rules would therefore be executed first (see the "Ord" column), but this is not true - the execution starts with the first leaf of the main branch of the plan (the full table scan of T1).

Note that not only is the "Ord" column wrong - the fact that the SORT AGGREGATE operation includes the work performed by the scalar subquery is also not handled correctly by the remaining logic calculating the operation self statistics.

The Script

Below you can find the current version of the script. If you're too lazy to copy&paste (and because I don't have a fancy "copy to clipboard" button) you can also download the script from here.

Of course I'm interested in feedback. This prototype is not tested much yet, so expect glitches and problems. If you get back to me with reproducible cases I'll try to address them and publish updated versions of the script.

A final note: This tool comes for free but with no warranties at all. Use at your own risk.

Happy rowsource profiling (and holiday season)!


set echo off verify off termout off
set doc off
doc
-- ----------------------------------------------------------------------------------------------
--
-- Script: xplan_extended_display_cursor.sql
--
-- Version: 0.9
-- December 2011
--
-- Author: Randolf Geist
-- oracle-randolf.blogspot.com
--
-- Description: A free-standing SQL wrapper over DBMS_XPLAN. Provides access to the
-- DBMS_XPLAN.DISPLAY_CURSOR pipelined function for a given SQL_ID and CHILD_NUMBER
--
-- This is a prototype for an extended analysis of the data provided by the
-- Runtime Profile (aka. Rowsource Statistics enabled via
-- SQL_TRACE = TRUE, STATISTICS_LEVEL = ALL or GATHER_PLAN_STATISTICS hint)
-- and reported via the ALLSTATS/MEMSTATS/IOSTATS formatting option of
-- DBMS_XPLAN.DISPLAY_CURSOR
--
-- Versions: This utility will work for all versions of 10g and upwards.
--
-- Required: The same access as DBMS_XPLAN.DISPLAY_CURSOR requires. See the documentation
-- of DISPLAY_CURSOR for your Oracle version for more information
--
-- The script directly queries
-- 1) V$SESSION
-- 2) V$SQL_PLAN_STATISTICS_ALL
--
-- Credits: Based on the original XPLAN implementation by Adrian Billington (http://www.oracle-developer.net/utilities.php
-- resp. http://www.oracle-developer.net/content/utilities/xplan.zip)
-- and inspired by Kyle Hailey's TCF query (http://dboptimizer.com/2011/09/20/display_cursor/)
--
-- Features: In addition to the PID (The PARENT_ID) and ORD (The order of execution, note that this doesn't account for the special cases so it might be wrong)
-- columns added by Adrian's wrapper the following additional columns over ALLSTATS are provided:
--
-- A_TIME_SELF : The time taken by the operation itself - this is the operation's cumulative time minus the direct descendants operation's cumulative time
-- LIO_SELF : The LIOs done by the operation itself - this is the operation's cumulative LIOs minus the direct descendants operation's cumulative LIOs
-- READS_SELF : The reads performed by the operation itself - this is the operation's cumulative reads minus the direct descendants operation's cumulative reads
-- WRITES_SELF : The writes performed by the operation itself - this is the operation's cumulative writes minus the direct descendants operation's cumulative writes
-- A_TIME_SELF_GRAPH : A graphical representation of A_TIME_SELF relative to the total A_TIME
-- LIO_SELF_GRAPH : A graphical representation of LIO_SELF relative to the total LIO
-- READS_SELF_GRAPH : A graphical representation of READS_SELF relative to the total READS
-- WRITES_SELF_GRAPH : A graphical representation of WRITES_SELF relative to the total WRITES
-- LIO_RATIO : Ratio of LIOs per row generated by the row source - the higher this ratio the more likely there could be a more efficient way to generate those rows (be aware of aggregation steps though)
-- TCF_GRAPH : Each "+"/"-" sign represents one order of magnitude based on ratio between E_ROWS_TIMES_START and A-ROWS. Note that this will be misleading with Parallel Execution (see E_ROWS_TIMES_START)
-- E_ROWS_TIMES_START : The E_ROWS multiplied by STARTS - this is useful for understanding the actual cardinality estimate for child operations that get executed multiple times. Note that this will be misleading with Parallel Execution
--
-- More information including demos can be found online at http://oracle-randolf.blogspot.com/2011/12/extended-displaycursor-with-rowsource.html
--
-- Usage: @xplan_extended_display_cursor.sql [sql_id] [cursor_child_number] [format_option]
--
-- If both the SQL_ID and CHILD_NUMBER are omitted the previously executed SQL_ID and CHILD_NUMBER of the session will be used
-- If the SQL_ID is specified but the CHILD_NUMBER is omitted then CHILD_NUMBER 0 is assumed
--
-- This prototype does not support processing multiple child cursors like DISPLAY_CURSOR is capable of
-- when passing NULL as CHILD_NUMBER to DISPLAY_CURSOR. Hence a CHILD_NUMBER is mandatory, either
-- implicitly generated (see above) or explicitly passed
--
-- The default formatting option for the call to DBMS_XPLAN.DISPLAY_CURSOR is ALLSTATS LAST - extending this output is the primary purpose of this script
--
-- Note: You need a veeery wide terminal setting for this prototype, something like linesize 400 should suffice
--
-- This tool is free but comes with no warranty at all - use at your own risk
--
#

col plan_table_output format a400
set linesize 400 pagesize 0 tab off

/* ALLSTATS LAST is assumed as the default formatting option for DBMS_XPLAN.DISPLAY_CURSOR */
define default_fo = "ALLSTATS LAST"

column prev_sql_id new_value prev_sql_id
column prev_child_number new_value prev_cn

/* Get the previous command as default
if no SQL_ID / CHILD_NUMBER is passed */
select
prev_sql_id
, prev_child_number
from
v$session
where
sid = userenv('sid')
;

-- The following is a hack to use default
-- values for defines
column 1 new_value 1
column 2 new_value 2
column 3 new_value 3

select
'' as "1"
, '' as "2"
, '' as "3"
from
dual
where
rownum = 0;

column si new_value si
column cn new_value cn
column fo new_value fo

/* Use passed parameters else refer to previous SQL_ID / CHILD_NUMBER
ALLSTATS LAST is default formatting option */
select
nvl('&1', '&prev_sql_id') as si
, coalesce('&2', '&prev_cn', '0') as cn
, nvl('&3', '&default_fo') as fo
from
dual
;

column last new_value last

/* Last or all execution */
select
case
when instr('&fo', 'LAST') > 0
then 'last_'
end as last
from
dual
;

set termout on

with
-- The next three queries are based on the original XPLAN wrapper by Adrian Billington
-- to determine the PID and ORD information, only slightly modified to deal with
-- the 10g special case that V$SQL_PLAN_STATISTICS_ALL doesn't include the ID = 0 operation
-- and starts with 1 instead for Rowsource Statistics
sql_plan_data as
(
select
id
, parent_id
from
v$sql_plan_statistics_all
where
sql_id = '&si'
and child_number = &cn
),
hierarchy_data as
(
select
id
, parent_id
from
sql_plan_data
start with
id in
(
select
id
from
sql_plan_data p1
where
not exists
(
select
null
from
sql_plan_data p2
where
p2.id = p1.parent_id
)
)
connect by
prior id = parent_id
order siblings by
id desc
),
ordered_hierarchy_data as
(
select
id
, parent_id as pid
, row_number() over (order by rownum desc) as oid
, max(id) over () as maxid
, min(id) over () as minid
from
hierarchy_data
),
-- The following query uses the MAX values
-- rather than taking the values of PLAN OPERATION_ID = 0 (or 1 for 10g V$SQL_PLAN_STATISTICS_ALL)
-- for determining the grand totals
--
-- This is because queries that get cancelled do not
-- necessarily have yet sensible values in the root plan operation
--
-- Furthermore with Parallel Execution the elapsed time accumulated
-- with the ALLSTATS option for operations performed in parallel
-- will be greater than the wallclock elapsed time shown for the Query Coordinator
--
-- Note that if you use GATHER_PLAN_STATISTICS with the default
-- row sampling frequency the (LAST_)ELAPSED_TIME will be very likely
-- wrong and hence the time-based graphs and self-statistics will be misleading
--
-- Similar things might happen when cancelling queries
--
-- For queries running with STATISTICS_LEVEL = ALL (or sample frequency set to 1)
-- the A-TIME is pretty reliable
totals as
(
select
max(&last.cu_buffer_gets + &last.cr_buffer_gets) as total_lio
, max(&last.elapsed_time) as total_elapsed
, max(&last.disk_reads) as total_reads
, max(&last.disk_writes) as total_writes
from
v$sql_plan_statistics_all
where
sql_id = '&si'
and child_number = &cn
),
-- The totals for the direct descendants of an operation
-- These are required for calculating the work performed
-- by a (parent) operation itself
-- Basically this is the SUM grouped by PARENT_ID
direct_desc_totals as
(
select
sum(&last.cu_buffer_gets + &last.cr_buffer_gets) as lio
, sum(&last.elapsed_time) as elapsed
, sum(&last.disk_reads) as reads
, sum(&last.disk_writes) as writes
, parent_id
from
v$sql_plan_statistics_all
where
sql_id = '&si'
and child_number = &cn
group by
parent_id
),
-- Putting the three together
-- The statistics, direct descendant totals plus totals
extended_stats as
(
select
stats.id
, stats.parent_id
, stats.&last.elapsed_time as elapsed
, (stats.&last.cu_buffer_gets + stats.&last.cr_buffer_gets) as lio
, stats.&last.starts as starts
, stats.&last.output_rows as a_rows
, stats.cardinality as e_rows
, stats.&last.disk_reads as reads
, stats.&last.disk_writes as writes
, ddt.elapsed as ddt_elapsed
, ddt.lio as ddt_lio
, ddt.reads as ddt_reads
, ddt.writes as ddt_writes
, t.total_elapsed
, t.total_lio
, t.total_reads
, t.total_writes
from
v$sql_plan_statistics_all stats
, direct_desc_totals ddt
, totals t
where
stats.sql_id='&si'
and stats.child_number = &cn
and ddt.parent_id (+) = stats.id
),
-- Further information derived from above
derived_stats as
(
select
id
, greatest(elapsed - nvl(ddt_elapsed , 0), 0) as elapsed_self
, greatest(lio - nvl(ddt_lio, 0), 0) as lio_self
, trunc((greatest(lio - nvl(ddt_lio, 0), 0)) / nullif(a_rows, 0)) as lio_ratio
, greatest(reads - nvl(ddt_reads, 0), 0) as reads_self
, greatest(writes - nvl(ddt_writes,0) ,0) as writes_self
, total_elapsed
, total_lio
, total_reads
, total_writes
, trunc(log(10, nullif(starts * e_rows / nullif(a_rows, 0), 0))) as tcf_ratio
, starts * e_rows as e_rows_times_start
from
extended_stats
),
/* Format the data as required */
formatted_data1 as
(
select
id
, lio_ratio
, total_elapsed
, total_lio
, total_reads
, total_writes
, to_char(numtodsinterval(round(elapsed_self / 10000) * 10000 / 1000000, 'SECOND')) as e_time_interval
/* Imitate the DBMS_XPLAN number formatting */
, case
when lio_self >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when lio_self >= 10000000000000000000 then to_char(lio_self/1000000000000000000, 'FM99999') || 'E'
when lio_self >= 10000000000000000 then to_char(lio_self/1000000000000000, 'FM99999') || 'P'
when lio_self >= 10000000000000 then to_char(lio_self/1000000000000, 'FM99999') || 'T'
when lio_self >= 10000000000 then to_char(lio_self/1000000000, 'FM99999') || 'G'
when lio_self >= 10000000 then to_char(lio_self/1000000, 'FM99999') || 'M'
when lio_self >= 100000 then to_char(lio_self/1000, 'FM99999') || 'K'
else to_char(lio_self, 'FM99999') || ' '
end as lio_self_format
, case
when reads_self >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when reads_self >= 10000000000000000000 then to_char(reads_self/1000000000000000000, 'FM99999') || 'E'
when reads_self >= 10000000000000000 then to_char(reads_self/1000000000000000, 'FM99999') || 'P'
when reads_self >= 10000000000000 then to_char(reads_self/1000000000000, 'FM99999') || 'T'
when reads_self >= 10000000000 then to_char(reads_self/1000000000, 'FM99999') || 'G'
when reads_self >= 10000000 then to_char(reads_self/1000000, 'FM99999') || 'M'
when reads_self >= 100000 then to_char(reads_self/1000, 'FM99999') || 'K'
else to_char(reads_self, 'FM99999') || ' '
end as reads_self_format
, case
when writes_self >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when writes_self >= 10000000000000000000 then to_char(writes_self/1000000000000000000, 'FM99999') || 'E'
when writes_self >= 10000000000000000 then to_char(writes_self/1000000000000000, 'FM99999') || 'P'
when writes_self >= 10000000000000 then to_char(writes_self/1000000000000, 'FM99999') || 'T'
when writes_self >= 10000000000 then to_char(writes_self/1000000000, 'FM99999') || 'G'
when writes_self >= 10000000 then to_char(writes_self/1000000, 'FM99999') || 'M'
when writes_self >= 100000 then to_char(writes_self/1000, 'FM99999') || 'K'
else to_char(writes_self, 'FM99999') || ' '
end as writes_self_format
, case
when e_rows_times_start >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when e_rows_times_start >= 10000000000000000000 then to_char(e_rows_times_start/1000000000000000000, 'FM99999') || 'E'
when e_rows_times_start >= 10000000000000000 then to_char(e_rows_times_start/1000000000000000, 'FM99999') || 'P'
when e_rows_times_start >= 10000000000000 then to_char(e_rows_times_start/1000000000000, 'FM99999') || 'T'
when e_rows_times_start >= 10000000000 then to_char(e_rows_times_start/1000000000, 'FM99999') || 'G'
when e_rows_times_start >= 10000000 then to_char(e_rows_times_start/1000000, 'FM99999') || 'M'
when e_rows_times_start >= 100000 then to_char(e_rows_times_start/1000, 'FM99999') || 'K'
else to_char(e_rows_times_start, 'FM99999') || ' '
end as e_rows_times_start_format
, rpad(' ', nvl(round(elapsed_self / nullif(total_elapsed, 0) * 12), 0) + 1, '@') as elapsed_self_graph
, rpad(' ', nvl(round(lio_self / nullif(total_lio, 0) * 12), 0) + 1, '@') as lio_self_graph
, rpad(' ', nvl(round(reads_self / nullif(total_reads, 0) * 12), 0) + 1, '@') as reads_self_graph
, rpad(' ', nvl(round(writes_self / nullif(total_writes, 0) * 12), 0) + 1, '@') as writes_self_graph
, ' ' ||
case
when tcf_ratio > 0
then rpad('-', tcf_ratio, '-')
else rpad('+', tcf_ratio * -1, '+')
end as tcf_graph
from
derived_stats
),
/* The final formatted data */
formatted_data as
(
select
/* Convert the INTERVAL representation to the A-TIME representation used by DBMS_XPLAN
by turning the days into hours */
to_char(to_number(substr(e_time_interval, 2, 9)) * 24 + to_number(substr(e_time_interval, 12, 2)), 'FM900') ||
substr(e_time_interval, 14, 9)
as a_time_self
, a.*
from
formatted_data1 a
),
/* Combine the information with the original DBMS_XPLAN output */
xplan_data as (
select
x.plan_table_output
, o.id
, o.pid
, o.oid
, o.maxid
, o.minid
, a.a_time_self
, a.lio_self_format
, a.reads_self_format
, a.writes_self_format
, a.elapsed_self_graph
, a.lio_self_graph
, a.reads_self_graph
, a.writes_self_graph
, a.lio_ratio
, a.tcf_graph
, a.total_elapsed
, a.total_lio
, a.total_reads
, a.total_writes
, a.e_rows_times_start_format
, x.rn
from
(
select /* Take advantage of 11g table function dynamic sampling */
/*+ dynamic_sampling(dc, 2) */
/* This ROWNUM determines the order of the output/processing */
rownum as rn
, plan_table_output
from
table(dbms_xplan.display_cursor('&si',&cn, '&fo')) dc
) x
, ordered_hierarchy_data o
, formatted_data a
where
o.id (+) = case
when regexp_like(x.plan_table_output, '^\|[\* 0-9]+\|')
then to_number(regexp_substr(x.plan_table_output, '[0-9]+'))
end
and a.id (+) = case
when regexp_like(x.plan_table_output, '^\|[\* 0-9]+\|')
then to_number(regexp_substr(x.plan_table_output, '[0-9]+'))
end
)
/* Inject the additional data into the original DBMS_XPLAN output
by using the MODEL clause */
select
plan_table_output
from
xplan_data
model
dimension by (rn as r)
measures
(
cast(plan_table_output as varchar2(4000)) as plan_table_output
, id
, maxid
, minid
, pid
, oid
, a_time_self
, lio_self_format
, reads_self_format
, writes_self_format
, e_rows_times_start_format
, elapsed_self_graph
, lio_self_graph
, reads_self_graph
, writes_self_graph
, lio_ratio
, tcf_graph
, total_elapsed
, total_lio
, total_reads
, total_writes
, greatest(max(length(maxid)) over () + 3, 6) as csize
, cast(null as varchar2(128)) as inject
, cast(null as varchar2(4000)) as inject2
)
rules sequential order
(
/* Prepare the injection of the OID / PID info */
inject[r] = case
/* MINID/MAXID are the same for all rows
so it doesn't really matter
which offset we refer to */
when id[cv(r)+1] = minid[cv(r)+1]
or id[cv(r)+3] = minid[cv(r)+3]
or id[cv(r)-1] = maxid[cv(r)-1]
then rpad('-', csize[cv()]*2, '-')
when id[cv(r)+2] = minid[cv(r)+2]
then '|' || lpad('Pid |', csize[cv()]) || lpad('Ord |', csize[cv()])
when id[cv()] is not null
then '|' || lpad(pid[cv()] || ' |', csize[cv()]) || lpad(oid[cv()] || ' |', csize[cv()])
end
/* Prepare the injection of the remaining info */
, inject2[r] = case
when id[cv(r)+1] = minid[cv(r)+1]
or id[cv(r)+3] = minid[cv(r)+3]
or id[cv(r)-1] = maxid[cv(r)-1]
then rpad('-',
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
14 else 0 end /* A_TIME_SELF */ +
case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) > 0 then
11 else 0 end /* LIO_SELF */ +
case when coalesce(total_reads[cv(r)+1], total_reads[cv(r)+3], total_reads[cv(r)-1]) > 0 then
11 else 0 end /* READS_SELF */ +
case when coalesce(total_writes[cv(r)+1], total_writes[cv(r)+3], total_writes[cv(r)-1]) > 0 then
11 else 0 end /* WRITES_SELF */ +
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
14 else 0 end /* A_TIME_SELF_GRAPH */ +
case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) > 0 then
14 else 0 end /* LIO_SELF_GRAPH */ +
case when coalesce(total_reads[cv(r)+1], total_reads[cv(r)+3], total_reads[cv(r)-1]) > 0 then
14 else 0 end /* READS_SELF_GRAPH */ +
case when coalesce(total_writes[cv(r)+1], total_writes[cv(r)+3], total_writes[cv(r)-1]) > 0 then
14 else 0 end /* WRITES_SELF_GRAPH */ +
case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) > 0 then
11 else 0 end /* LIO_RATIO */ +
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
11 else 0 end /* TCF_GRAPH */ +
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
11 else 0 end /* E_ROWS_TIMES_START */
, '-')
when id[cv(r)+2] = minid[cv(r)+2]
then case when total_elapsed[cv(r)+2] > 0 then
lpad('A-Time Self |' , 14) end ||
case when total_lio[cv(r)+2] > 0 then
lpad('Bufs Self |' , 11) end ||
case when total_reads[cv(r)+2] > 0 then
lpad('Reads Self|' , 11) end ||
case when total_writes[cv(r)+2] > 0 then
lpad('Write Self|' , 11) end ||
case when total_elapsed[cv(r)+2] > 0 then
lpad('A-Ti S-Graph |', 14) end ||
case when total_lio[cv(r)+2] > 0 then
lpad('Bufs S-Graph |', 14) end ||
case when total_reads[cv(r)+2] > 0 then
lpad('Reads S-Graph|', 14) end ||
case when total_writes[cv(r)+2] > 0 then
lpad('Write S-Graph|', 14) end ||
case when total_lio[cv(r)+2] > 0 then
lpad('LIO Ratio |' , 11) end ||
case when total_elapsed[cv(r)+2] > 0 then
lpad('TCF Graph |' , 11) end ||
case when total_elapsed[cv(r)+2] > 0 then
lpad('E-Rows*Sta|' , 11) end
when id[cv()] is not null
then case when total_elapsed[cv()] > 0 then
lpad(a_time_self[cv()] || ' |', 14) end ||
case when total_lio[cv()] > 0 then
lpad(lio_self_format[cv()] || '|', 11) end ||
case when total_reads[cv()] > 0 then
lpad(reads_self_format[cv()] || '|', 11) end ||
case when total_writes[cv()] > 0 then
lpad(writes_self_format[cv()] || '|', 11) end ||
case when total_elapsed[cv()] > 0 then
rpad(elapsed_self_graph[cv()], 13) || '|' end ||
case when total_lio[cv()] > 0 then
rpad(lio_self_graph[cv()], 13) || '|' end ||
case when total_reads[cv()] > 0 then
rpad(reads_self_graph[cv()], 13) || '|' end ||
case when total_writes[cv()] > 0 then
rpad(writes_self_graph[cv()], 13) || '|' end ||
case when total_lio[cv()] > 0 then
lpad(lio_ratio[cv()] || ' |', 11) end ||
case when total_elapsed[cv()] > 0 then
rpad(tcf_graph[cv()], 9) || ' |' end ||
case when total_elapsed[cv()] > 0 then
lpad(e_rows_times_start_format[cv()] || '|', 11) end
end
/* Putting it all together */
, plan_table_output[r] = case
when inject[cv()] like '---%'
then inject[cv()] || plan_table_output[cv()] || inject2[cv()]
when inject[cv()] is present
then regexp_replace(plan_table_output[cv()], '\|', inject[cv()], 1, 2) || inject2[cv()]
else plan_table_output[cv()]
end
)
order by
r
;

undefine default_fo
undefine prev_sql_id
undefine prev_cn
undefine last
undefine si
undefine cn
undefine fo
undefine 1
undefine 2
undefine 3

col plan_table_output clear
col prev_sql_id clear
col prev_child_number clear
col si clear
col cn clear
col fo clear
col last clear

Wednesday, December 7, 2011

Table Functions And Join Cardinality Estimates

If you consider using Table Functions then you should be aware of some limitations of the optimizer calculations, in particular when a join between a Table Function and other row sources is involved.

As outlined in one of my previous posts, you can and should help the optimizer to arrive at a reasonable cardinality estimate when dealing with table functions; however, doing so doesn't provide all the inputs to the join cardinality calculation that are useful and available from the statistics when dealing with regular tables.

Therefore, even when following the recommended practice regarding the cardinality estimates, it is possible to end up with significant inaccuracies. This post explains why.

Join Cardinality Basics

In order to appreciate the problem that can be encountered, let's have a quick walkthrough of the basic information the optimizer uses to calculate a join cardinality. Here is a very simplified version of the join selectivity formula:

Join Selectivity = 1 / greater(num_distinct(t1.c1), num_distinct(t2.c2))

I've omitted the NULL (and histogram) handling and hence simplified the formula further. Furthermore I'll restrict the demonstration here to a single join column.

There is another piece of information that is evaluated but not obvious from the above formula: the low and high values of the join columns. If the join column value ranges do not overlap at all, the join cardinality will be calculated as 1.

Finally this join selectivity will be multiplied by the (filtered) cardinality of the two row sources to arrive at the join cardinality:

Join Cardinality = Join Selectivity * cardinality t1 * cardinality t2

So for this simplified basic join cardinality formula the following information is required from the statistics (if available):

- (filtered) num_rows of the row sources
- num_distinct of the join columns
- low/high values of the join columns
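
For regular tables all of these inputs can be read from the dictionary once statistics have been gathered - a small sketch, using the demo tables T1 and T2 created below:


select
table_name
, num_rows
from
user_tab_statistics
where
table_name in ('T1', 'T2')
;

select
table_name
, column_name
, num_distinct
, num_nulls
, low_value
, high_value
from
user_tab_col_statistics
where
table_name in ('T1', 'T2')
and column_name in ('FK', 'ID')
;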

Here is an example of this calculation in action, using real tables with table and basic column statistics gathered, hence with all of the just mentioned information available:


drop table t1;

purge table t1;

drop table t2;

purge table t2;

create table t1
as
select
rownum as id
-- 10 distinct values 1..10
, mod(rownum, 10) + 1 as fk
, rpad('x', 100) as filler
from
dual
connect by
level <= 10000
;

exec dbms_stats.gather_table_stats(null, 't1')

create table t2
as
select
-- 20 distinct values 1..20
rownum as id
, rpad('x', 100) as filler
from
dual
connect by
level <= 20
;

exec dbms_stats.gather_table_stats(null, 't2')

explain plan for
select
count(*)
from
t1
, t2
where
t1.fk = t2.id
;

set linesize 200 pagesize 0 tab off

select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 10000 |
| 3 | TABLE ACCESS FULL| T2 | 20 |
| 4 | TABLE ACCESS FULL| T1 | 10000 |
--------------------------------------------


So we have:

Join Selectivity = 1 / greater(10, 20) = 1 / 20

Join Cardinality = 1 / 20 * 20 * 10000 = 10000

The join column values do overlap and there is no filter on the two row sources, so the result is as expected.

Now if the simple test case gets modified slightly, for example like this:


drop table t2;

purge table t2;

create table t2
as
select
-- 1 distinct value
1 as id
, rpad('x', 100) as filler
from
dual
connect by
level <= 20
;

exec dbms_stats.gather_table_stats(null, 't2')

explain plan for
select
count(*)
from
t1
, t2
where
t1.fk = t2.id
;

set linesize 200 pagesize 0 tab off

select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 20000 |
| 3 | TABLE ACCESS FULL| T2 | 20 |
| 4 | TABLE ACCESS FULL| T1 | 10000 |
--------------------------------------------


Join Selectivity = 1 / greater(10, 1) = 1 / 10

Join Cardinality = 1 / 10 * 20 * 10000 = 20000

So we can see that the number of distinct values is one of the crucial inputs to the join cardinality calculation (again: I deliberately keep things simple here and for example omit nulls). Other inputs are the min and max values of the join columns - this can be seen by again slightly modifying the example:


drop table t2;

purge table t2;

create table t2
as
select
-- 20 distinct values 21..40
rownum + 20 as id
, rpad('x', 100) as filler
from
dual
connect by
level <= 20
;

exec dbms_stats.gather_table_stats(null, 't2')

explain plan for
select
count(*)
from
t1
, t2
where
t1.fk = t2.id
;

set linesize 200 pagesize 0 tab off

select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 1 |
| 3 | TABLE ACCESS FULL| T2 | 20 |
| 4 | TABLE ACCESS FULL| T1 | 10000 |
--------------------------------------------


We can immediately see that the optimizer detected that the join column value ranges do not overlap and hence set the join cardinality to 1 - without that sanity check the basic formula would still have yielded 1 / 20 * 20 * 10000 = 10000.
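
If you want to verify what the optimizer sees, the raw LOW_VALUE / HIGH_VALUE columns can be decoded with DBMS_STATS.CONVERT_RAW_VALUE - a quick sketch for the NUMBER join columns, not part of the original test case:


set serveroutput on

declare
n_val number;
begin
for rec in (
select table_name, column_name, low_value, high_value
from user_tab_col_statistics
where (table_name = 'T1' and column_name = 'FK')
or (table_name = 'T2' and column_name = 'ID')
) loop
-- convert the internal RAW representation into a NUMBER
dbms_stats.convert_raw_value(rec.low_value, n_val);
dbms_output.put(rec.table_name || '.' || rec.column_name || ' low: ' || n_val);
dbms_stats.convert_raw_value(rec.high_value, n_val);
dbms_output.put_line(' high: ' || n_val);
end loop;
end;
/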

Now let's move on towards our Table Function case and see what happens if the information is missing from the table statistics and gets supplied by dynamic sampling instead.

First let's start over again with the initial example data set of T2:


drop table t2;

purge table t2;

create table t2
as
select
-- 20 distinct values 1..20
rownum as id
, rpad('x', 100) as filler
from
dual
connect by
level <= 20
;

-- Do not gather statistics but use dynamic sampling instead
alter session set optimizer_dynamic_sampling = 2;

alter session set tracefile_identifier = 'dyn_sample_1';

alter session set events '10053 trace name context forever, level 1';

explain plan for
select
count(*)
from
t1
, t2
where
t1.fk = t2.id
;

set linesize 200 pagesize 0 tab off

select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 10000 |
| 3 | TABLE ACCESS FULL| T2 | 20 |
| 4 | TABLE ACCESS FULL| T1 | 10000 |
--------------------------------------------

Note
-----
- dynamic sampling used for this statement (level=2)


So we can see that dynamic sampling got used and the join cardinality estimate is correct. Let's check in the CBO trace file what was executed as the dynamic sampling query:
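
(As a side note, not part of the original test case: on 11g the trace file location can be obtained from the session like this:)


select value from v$diag_info where name = 'Default Trace File';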


SELECT /* OPT_DYN_SAMP */
/*+
ALL_ROWS
IGNORE_WHERE_CLAUSE
NO_PARALLEL(SAMPLESUB)
opt_param('parallel_execution_enabled', 'false')
NO_PARALLEL_INDEX(SAMPLESUB)
NO_SQL_TUNE
*/
NVL(SUM(C1),0)
, NVL(SUM(C2),0)
, COUNT(DISTINCT C3)
, NVL(SUM(CASE WHEN C3 IS NULL THEN 1 ELSE 0 END),0)
FROM
(
SELECT /*+
NO_PARALLEL("T2")
FULL("T2")
NO_PARALLEL_INDEX("T2")
*/
1 AS C1
, 1 AS C2
, "T2"."ID" AS C3
FROM
"T2" "T2"
) SAMPLESUB
;


So it's interesting to see that the dynamic sampling code detected that the column T2.ID is used as part of a join and therefore not only determined the cardinality of the table but also the num_distinct and num_nulls information of the join column.

Consequently we find the following information in the trace file:


.
.
.
ndv C3 : 20
scaled : 20.00
nulls C4 : 0
scaled : 0.00
min. sel. est. : -1.00000000
** Dynamic sampling col. stats.:
Column (#1): ID( Part#: 0
AvgLen: 22 NDV: 20 Nulls: 0 Density: 0.050000
** Using dynamic sampling NULLs estimates.
** Using dynamic sampling NDV estimates.
Scaled NDVs using cardinality = 20.
** Using dynamic sampling card. : 20
** Dynamic sampling updated table card.
.
.
.


Now, if you've followed the description so far carefully, you'll notice that something is missing from the dynamic sampling information that is usually available from the dictionary statistics.

Let's repeat the exercise and use the example data set of T2 where the join columns do not overlap:


drop table t2;

purge table t2;

create table t2
as
select
-- 20 distinct values 21..40
rownum + 20 as id
, rpad('x', 100) as filler
from
dual
connect by
level <= 20
;

-- Do not gather statistics but use dynamic sampling instead
alter session set optimizer_dynamic_sampling = 2;

alter session set tracefile_identifier = 'dyn_sample_2';

alter session set events '10053 trace name context forever, level 1';

explain plan for
select
count(*)
from
t1
, t2
where
t1.fk = t2.id
;

set linesize 200 pagesize 0 tab off

select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));

--------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 10000 |
| 3 | TABLE ACCESS FULL| T2 | 20 |
| 4 | TABLE ACCESS FULL| T1 | 10000 |
--------------------------------------------

Note
-----
- dynamic sampling used for this statement (level=2)


So this is a case where even with dynamic sampling the join cardinality estimate is incorrect and differs from what you get with actually gathered statistics: the dynamic sampling query doesn't capture the low and high values of the join column, so the optimizer cannot detect that the value ranges don't overlap. Whether this is a deliberate design decision to keep the footprint of the dynamic sampling query as low as possible or an omission that could be addressed via a bug/enhancement request I don't know - but if you were asking yourself whether dynamic sampling is a reasonable replacement for actual statistics, here is a case where it doesn't produce the same result as basic table and column statistics.
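
For reference, the 10,000 rows correspond to the basic formula applied without the overlap check:

Join Selectivity = 1 / greater(10, 20) = 1 / 20

Join Cardinality = 1 / 20 * 20 * 10000 = 10000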

Joins With Table Functions

Now that the basics have been clarified, let's move on to table functions. As outlined in the past and in one of my previous posts you shouldn't use table functions without helping the optimizer to come up with a reasonable cardinality estimate.
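
One commonly used (though undocumented) way of providing such an estimate is the CARDINALITY hint - a minimal sketch, using the pipelined function that will be created below:


select /*+ cardinality(t2, 20) */
count(*)
from
table(f_tab_pipelined(1, 20, null)) t2
;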

But as you have just seen, a proper join cardinality estimation requires more than a reasonable cardinality estimate of a row source. Let's see what this means when attempting to join table functions with other row sources.

For that purpose I create the following simple table function that generates a simple set of data controlled by the input parameters:


-- the function depends on the type, so drop the function first
drop function f_tab_pipelined;

drop type t_num_list;

create or replace type t_num_list as table of number;

create or replace function
f_tab_pipelined(
in_start in number default 1
, in_end in number default 10
, in_val in number default null
) return t_num_list pipelined
is
begin
for i in in_start..in_end loop
pipe row(coalesce(in_val, i));
end loop;
return;
end f_tab_pipelined;
/


So I can define the start and end values, which will be returned with a step size of one. If the third parameter is provided, it will be returned instead of the current loop value, resulting in a single distinct value repeated end - start + 1 times; otherwise the number of distinct values will be equal to the number of rows generated.
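
To double-check the two modes, here is a quick sanity query (expected results, derived from the function logic, shown as comments):


-- expect CNT = 20, NDV = 20
select
count(*) as cnt
, count(distinct column_value) as ndv
from
table(f_tab_pipelined(1, 20, null))
;

-- expect CNT = 20, NDV = 1
select
count(*) as cnt
, count(distinct column_value) as ndv
from
table(f_tab_pipelined(1, 20, 1))
;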

From 11.1 onwards dynamic sampling of table functions is supported, so let's simulate with the table function the cases I've just demonstrated with real tables.

First the case with 20 distinct values of T2 and overlapping join column values:


alter session set tracefile_identifier = 'dyn_sample_table_func';

alter session set events '10053 trace name context lifetime 1, level 1';

explain plan for
select /*+ dynamic_sampling(t2, 2) */
count(*)
from
t1
, (
select
column_value as id
from
table(f_tab_pipelined(1, 20, null))
) t2
where
t1.fk = t2.id
;

set linesize 200 pagesize 0 tab off

select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));

-----------------------------------------------------------------------
| Id | Operation | Name | Rows |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 20000 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| F_TAB_PIPELINED | 20 |
| 4 | TABLE ACCESS FULL | T1 | 10000 |
-----------------------------------------------------------------------

Note
-----
- dynamic sampling used for this statement (level=2)


So something is already going wrong here - with the real table and dynamic sampling we ended up with the correct join cardinality estimate of 10,000 rows, whereas here the estimate is 20,000.

Let's have a look at the dynamic sampling query generated:


SELECT
/* OPT_DYN_SAMP */
/*+
ALL_ROWS
IGNORE_WHERE_CLAUSE
NO_PARALLEL(SAMPLESUB)
opt_param('parallel_execution_enabled', 'false')
NO_PARALLEL_INDEX(SAMPLESUB)
NO_SQL_TUNE
*/
NVL(SUM(C1),0)
, NVL(SUM(C2),0)
FROM
(
SELECT
/*+
NO_PARALLEL("KOKBF$")
FULL("KOKBF$")
NO_PARALLEL_INDEX("KOKBF$")
*/
1 AS C1
, 1 AS C2
FROM
TABLE("CBO_TEST"."F_TAB_PIPELINED"(1,20,NULL)) "KOKBF$"
) SAMPLESUB
;


So that's a pity: The dynamic sampling code for table functions at present doesn't recognize the need to gather join column statistics, so the optimizer simply doesn't know about the num_distinct / num_nulls figures of the join column generated by the table function.

It certainly would be nice if Oracle enhanced the dynamic sampling code so that we have at least the same level of information available as with regular tables.

So what does that mean for the Join Cardinality estimate of the optimizer? It looks like the Join Selectivity formula, when dealing with Table Functions, is effectively extended like this:

Join Selectivity = 1 / coalesce(greater(num_distinct(t1.c1), num_distinct(t2.c2)), num_distinct(t1.c1), num_distinct(t2.c2), 100)

So the "greater" function will return NULL if any of its operands is NULL. In that case the optimizer seems to use a non-null num_distinct if one is available, and if neither is defined it resorts to a hard-coded default of 100, resulting in a default selectivity of 1/100.
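
This NULL behaviour can be illustrated with Oracle's GREATEST function in plain SQL (just a sketch of the assumed logic, not the optimizer's actual code):


-- greatest(10, null) evaluates to NULL, so coalesce falls back to the
-- non-null num_distinct of 10 and the query returns 0.1
select
1 / coalesce(greatest(10, null), 10, null, 100) as join_sel
from
dual
;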

In our case:

Join Selectivity = 1 / coalesce(greater(10, null), 10, null, 100) = 1 / 10

Join Cardinality = 1 / 10 * 20 * 10000 = 20000

Of course you'll appreciate that the same applies to Table Functions with regard to non-overlapping join columns - the dynamic sampling performed doesn't provide any low and high values, hence the optimizer cannot detect such a case.

If you would like to see the default case described in the above formula in action, a join of two table functions is all it takes:


explain plan for
select /*+ dynamic_sampling(t1, 2) dynamic_sampling(t2, 2) */
count(*)
from
(
select
column_value as fk
from
table(f_tab_pipelined(1, 10000, null))
) t1
, (
select
column_value as id
from
table(f_tab_pipelined(1, 20, null))
) t2
where
t1.fk = t2.id
;

set linesize 200 pagesize 0 tab off

select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));

-----------------------------------------------------------------------
| Id | Operation | Name | Rows |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | HASH JOIN | | 2000 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| F_TAB_PIPELINED | 20 |
| 4 | COLLECTION ITERATOR PICKLER FETCH| F_TAB_PIPELINED | 10000 |
-----------------------------------------------------------------------

Note
-----
- dynamic sampling used for this statement (level=2)
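
So we have, with the extended formula and no num_distinct information available on either side:

Join Selectivity = 1 / coalesce(greater(null, null), null, null, 100) = 1 / 100

Join Cardinality = 1 / 100 * 20 * 10000 = 2000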


So dynamic sampling gets the row source cardinality right, but the join cardinality is way off. If you repeat the same exercise with regular tables and dynamic sampling, the correct join cardinality of 20 will be estimated, because Oracle then detects the 10,000 distinct values of T1.FK.

If you really have the need to perform corrective actions, with regular tables you can resort to the special hints that Oracle uses in SQL Profiles. These are of course undocumented and therefore have to be used at your own risk. Christian Antognini published a very nice paper about SQL Profiles some time ago that also covers the internal details, including the hints introduced for that purpose.

In our case COLUMN_STATS is the particular hint that would allow providing the missing information.

So some variation of the following would be helpful if it worked:


select /*+ dynamic_sampling(t2, 2) column_stats(t2, id, scale, min=1 max=20 nulls=0 distinct=20) */
count(*)
from
t1
, (
select
column_value as id
from
table(f_tab_pipelined(1, 20, null))
) t2
where
t1.fk = t2.id
;


Unfortunately the optimizer obviously refuses to apply those hints to table functions - they only work with regular tables, so this doesn't help either.

This might also explain why the dynamic sampling code doesn't bother to query that kind of information for Table Functions - maybe the optimizer at present cannot process such information, and therefore it would be useless to gather it anyway.

Summary

If you would like to use Table Functions and also join them to other row sources, you'll have to carefully check the join cardinality estimates of the optimizer, because some crucial information required for a proper join cardinality calculation is not available when dealing with Table Functions. This is also true when using 11g features like Table Function dynamic sampling.
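
If accurate join estimates matter more to you than avoiding materialization, one pragmatic workaround - not covered by the test cases above, so treat it as a sketch - is to materialize the table function result into a real table and gather statistics on it, so that num_distinct and low/high values become available to the join cardinality calculation:


-- hypothetical example: the table T2_MAT is introduced here for illustration only
create table t2_mat
as
select
column_value as id
from
table(f_tab_pipelined(1, 20, null))
;

exec dbms_stats.gather_table_stats(null, 't2_mat')

select
count(*)
from
t1
, t2_mat t2
where
t1.fk = t2.id
;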