<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5124641802818980374</id><updated>2012-01-27T00:05:24.717+01:00</updated><category term='Remote Procedure Call'/><category term='Slides'/><category term='temporary tablespaces'/><category term='Performance'/><category term='bug'/><category term='NOSORT'/><category term='11.2.0.2'/><category term='offline'/><category term='Exadata'/><category term='events'/><category term='upgrade'/><category term='hard parse'/><category term='SCN'/><category term='extended statistics'/><category term='Cardinality Feedback'/><category term='RLS'/><category term='locked statistics'/><category term='truncate'/><category term='subquery factoring'/><category term='CBQT'/><category term='scheduler'/><category term='Nested Loop Join Batching'/><category term='Scalar Subqueries'/><category term='Tom Kyte'/><category term='CPU Costing'/><category term='11g'/><category term='Expressions'/><category term='Freaky stuff'/><category term='dynamic sampling'/><category term='OTN forum'/><category term='bind variables'/><category term='Insert'/><category term='11.2.0.1'/><category term='AWR'/><category term='Extensible Optimizer'/><category term='pruning'/><category term='FIRST_ROWS_N'/><category term='exchange'/><category term='10.2.0.4'/><category term='Delayed Block Cleanout'/><category term='Scary stuff'/><category term='Selection'/><category term='Optimizer'/><category term='Stored Outlines'/><category term='PERFSTAT'/><category term='memory'/><category term='concurrency'/><category term='ASSM'/><category term='SCN_ASCENDING'/><category term='Compression'/><category term='Execution Plan'/><category 
term='MOTS'/><category term='Projection'/><category term='Top N Queries'/><category term='Plan stability'/><category term='ACS'/><category term='Parallel DML'/><category term='statistics'/><category term='string literal'/><category term='MODEL'/><category term='star transformation'/><category term='VPD'/><category term='EXPLAIN PLAN'/><category term='HOW TO'/><category term='Hash Aggregation'/><category term='NLS issues'/><category term='Nested Loop Join'/><category term='virtual columns'/><category term='DBMS_XPLAN'/><category term='ASH'/><category term='Hash functions'/><category term='index size estimate'/><category term='blocking'/><category term='Context'/><category term='CTAS'/><category term='Cursor Sharing'/><category term='Data Loading'/><category term='National character set'/><category term='Index'/><category term='ADDM'/><category term='create index'/><category term='scripts'/><category term='10gR2'/><category term='function-based index'/><category term='direct path'/><category term='unique'/><category term='Advert'/><category term='transitive closure'/><category term='Multi-Column Join'/><category term='deferrable'/><category term='Partitioning'/><category term='Unique indexes'/><category term='quiz'/><category term='SYS schema'/><category term='SQL*Plus'/><category term='10.2.0.5'/><category term='HCC'/><category term='Adaptive Cursor Sharing'/><category term='Patch Set'/><category term='Bitmap Index'/><category term='TEMP TABLE transformation'/><category term='Public Appearance'/><category term='Rowsource Profiling'/><category term='I/O Resource Calibration'/><category term='SQL statement analysis'/><category term='Auto-DOP'/><category term='Author'/><category term='Join'/><category term='PL/SQL'/><category term='Constraints'/><category term='Index usage'/><category term='Clustering Factor'/><category term='Logical I/O'/><category term='troubleshooting'/><category term='ORA-14404'/><category term='9.2.0.8'/><category term='ORA-01555'/><category 
term='Statspack'/><category term='Parallel Execution'/><category term='current_schema'/><category term='Fundamentals'/><category term='performance tuning'/><category term='AUTOTRACE'/><category term='LOB'/><category term='Table Functions'/><category term='undo'/><category term='histograms'/><category term='UGA'/><category term='scalability'/><category term='System Statistics'/><category term='DBMS_JOB'/><category term='unusable indexes'/><category term='Shared Pool'/><category term='XML'/><category term='Time Model'/><category term='undocumented'/><category term='Load As Select'/><category term='limitations'/><category term='rollback'/><category term='User-Defined Functions'/><category term='PGA_AGGREGATE_TARGET'/><category term='External Tables'/><category term='cardinality'/><category term='FGAC'/><category term='LIKE'/><category term='Restrictions'/><category term='OOW'/><category term='New Features'/><category term='read consistency'/><category term='Flashback Query'/><category term='Globalization'/><category term='cleanup'/><category term='Prefetching'/><category term='selectivity'/><category term='list partitioning'/><category term='PGA'/><category term='String aggregation'/><category term='missing statistics'/><category term='AS OF'/><category term='archive'/><category term='download'/><category term='Parallel'/><category term='Conference'/><category term='Cool Stuff'/><category term='Presentation'/><category term='DBMS_STATS'/><category term='Book'/><category term='frequency histograms'/><category term='Bitmap Join Index'/><category term='Unit Testing'/><category term='Subpartition'/><category term='Oracle ACE'/><category term='11.1.0.7'/><category term='PX COORDINATOR FORCED SERIAL'/><category term='primary key'/><category term='ALTER SESSION'/><category term='CBO'/><category term='PLAN_HASH_VALUE'/><category term='ORA-14405'/><category term='write consistency'/><category term='wrong estimates'/><category term='Batched I/O'/><category 
term='incremental'/><category term='hints'/><category term='WORKAREA_SIZE_POLICY'/><category term='tablespaces groups'/><category term='DDL'/><category term='DB Time'/><category term='block sizes'/><category term='jobs'/><category term='Global Temporary Tables'/><category term='DBMS_SCHEDULER'/><category term='DB CPU'/><category term='collections'/><category term='Jonathan Lewis'/><category term='series'/><category term='SQL profiles'/><category term='diagnosis'/><category term='Deferred Segment Creation'/><title type='text'>Oracle related stuff</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default?start-index=101&amp;max-results=100'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>101</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-3328209172102920389</id><published>2012-01-26T23:04:00.005+01:00</published><updated>2012-01-27T00:05:24.727+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cursor Sharing'/><category scheme='http://www.blogger.com/atom/ns#' term='AUTOTRACE'/><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category 
scheme='http://www.blogger.com/atom/ns#' term='DBMS_XPLAN'/><category scheme='http://www.blogger.com/atom/ns#' term='CBO'/><category scheme='http://www.blogger.com/atom/ns#' term='Plan stability'/><category scheme='http://www.blogger.com/atom/ns#' term='EXPLAIN PLAN'/><category scheme='http://www.blogger.com/atom/ns#' term='Execution Plan'/><category scheme='http://www.blogger.com/atom/ns#' term='10gR2'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL*Plus'/><title type='text'>Autotrace Polluting The Shared Pool?</title><content type='html'>&lt;span style="font-weight:bold;"&gt;Introduction&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Another random note I made during the sessions I attended at OOW concerned the SQL*Plus AUTOTRACE feature. As you're hopefully already aware, this feature has some significant shortcomings. The most obvious one is that it doesn't pull the actual execution plan from the Shared Pool after executing the statement. Instead, it simply runs an EXPLAIN PLAN on the SQL text, which might produce an execution plan that differs from the actual one for various reasons.&lt;br /&gt;&lt;br /&gt;Now the claim was made that, in addition to these shortcomings, the plan generated by the AUTOTRACE feature stays in the Shared Pool and is eligible for sharing. This would mean that other executions of the same statement could be affected by a potentially bad execution plan generated via AUTOTRACE, rather than being optimized on their own.&lt;br /&gt;&lt;br /&gt;That claim initially struck me as odd. I had been under the impression that the shortcoming of AUTOTRACE was simply that it uses the EXPLAIN PLAN facility to get the execution plan details - and I don't think that any plan generated by EXPLAIN PLAN is eligible for sharing with an actual statement execution. 
However, after thinking about it for a while I realized that some interesting side effects are possible, depending on how you actually use AUTOTRACE.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Using Default AUTOTRACE&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So in order to see what AUTOTRACE does behind the scenes, I decided to trace AUTOTRACE itself. Here is what I tried:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;set echo on timing on&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'autotrace';&lt;br /&gt;&lt;br /&gt;alter session set sql_trace = true;&lt;br /&gt;&lt;br /&gt;set autotrace on&lt;br /&gt;&lt;br /&gt;var n number&lt;br /&gt;&lt;br /&gt;exec :n := 1&lt;br /&gt;&lt;br /&gt;select * from dual where 1 = :n;&lt;br /&gt;&lt;br /&gt;select * from dual where dummy = 'X';&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;And here is a snippet from the corresponding SQL trace file:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;=====================&lt;br /&gt;PARSING IN CURSOR #7 len=31 dep=0 uid=91 oct=3 lid=91 tim=651497870527 hv=868568466 ad='7ff0ce23638' sqlid='b9j0230twamck'&lt;br /&gt;select * from dual where 1 = :n&lt;br /&gt;END OF STMT&lt;br /&gt;PARSE #7:c=0,e=460,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=651497870525&lt;br /&gt;=====================&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;EXEC #7:c=0,e=1306,p=1,cr=3,cu=0,mis=0,r=0,dep=0,og=1,plh=3752461848,tim=651497871918&lt;br /&gt;FETCH #7:c=0,e=654,p=2,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=3752461848,tim=651497872660&lt;br /&gt;STAT #7 id=1 cnt=1 pid=0 pos=1 obj=0 op='FILTER  (cr=3 pr=2 pw=0 time=0 us)'&lt;br /&gt;STAT #7 id=2 cnt=1 pid=1 pos=1 obj=116 op='TABLE ACCESS FULL DUAL (cr=3 pr=2 pw=0 time=0 us cost=2 size=2 card=1)'&lt;br /&gt;FETCH #7:c=0,e=3,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=3752461848,tim=651497873015&lt;br /&gt;CLOSE 
#7:c=0,e=16,dep=0,type=0,tim=651497876511&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;=====================&lt;br /&gt;PARSING IN CURSOR #9 len=79 dep=0 uid=91 oct=3 lid=91 tim=651497880846 hv=3377064296 ad='7ff0ce196a8' sqlid='1tfgxbv4nmub8'&lt;br /&gt;EXPLAIN PLAN SET STATEMENT_ID='PLUS6499083' FOR select * from dual where 1 = :n&lt;br /&gt;END OF STMT&lt;br /&gt;PARSE #9:c=0,e=583,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=651497880843&lt;br /&gt;=====================&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;=====================&lt;br /&gt;PARSING IN CURSOR #2 len=74 dep=0 uid=91 oct=3 lid=91 tim=651497888595 hv=920998108 ad='7ff0cdd8b00' sqlid='3s1hh8cvfan6w'&lt;br /&gt;SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', :1))&lt;br /&gt;END OF STMT&lt;br /&gt;PARSE #2:c=0,e=264,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=651497888593&lt;br /&gt;=====================&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;PARSING IN CURSOR #7 len=36 dep=0 uid=91 oct=3 lid=91 tim=651498044006 hv=3267611628 ad='7ff0cdbd0f8' sqlid='4k6g7vr1c7kzc'&lt;br /&gt;select * from dual where dummy = 'X'&lt;br /&gt;END OF STMT&lt;br /&gt;PARSE #7:c=0,e=1071,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=272002086,tim=651498044003&lt;br /&gt;EXEC #7:c=0,e=34,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=272002086,tim=651498044138&lt;br /&gt;FETCH #7:c=0,e=60,p=0,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=272002086,tim=651498044289&lt;br /&gt;STAT #7 id=1 cnt=1 pid=0 pos=1 obj=116 op='TABLE ACCESS FULL DUAL (cr=3 pr=0 pw=0 time=0 us cost=2 size=2 card=1)'&lt;br /&gt;FETCH #7:c=0,e=2,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=272002086,tim=651498044616&lt;br /&gt;CLOSE #7:c=0,e=28,dep=0,type=0,tim=651498062083&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;=====================&lt;br /&gt;PARSING IN CURSOR #2 len=84 dep=0 uid=91 oct=50 lid=91 tim=651498073656 hv=290419607 ad='7ff0cdb8a28' sqlid='5jx46tw8nywwr'&lt;br /&gt;EXPLAIN PLAN SET STATEMENT_ID='PLUS6499083' FOR 
select * from dual where dummy = 'X'&lt;br /&gt;END OF STMT&lt;br /&gt;PARSE #2:c=0,e=1295,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=272002086,tim=651498073653&lt;br /&gt;=====================&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;=====================&lt;br /&gt;PARSING IN CURSOR #9 len=74 dep=0 uid=91 oct=3 lid=91 tim=651498076015 hv=920998108 ad='7ff0cdd8b00' sqlid='3s1hh8cvfan6w'&lt;br /&gt;SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', :1))&lt;br /&gt;END OF STMT&lt;br /&gt;PARSE #9:c=0,e=254,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=651498076013&lt;br /&gt;=====================&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So that looks pretty much like the expected behaviour I mentioned above - AUTOTRACE executes the statements and afterwards runs an EXPLAIN PLAN to show the execution plan.&lt;br /&gt;&lt;br /&gt;As a side note, it's interesting that the SQL trace doesn't contain the queries used to gather the session statistics delta. The reason is simple: they are not issued by this session. SQL*Plus temporarily establishes a second session for that purpose, using one of the OCI modes that allow a second session to be created on the same connection / process. You can tell this by looking at the corresponding V$SESSION.PADDR or the entry in V$PROCESS: both sessions use the same process entry (dedicated server model). By the way, I've adopted the same approach in SQLTools++, the GUI that I maintain, for all activities that could potentially interfere with the main session, like collecting the session statistics delta or calling DBMS_XPLAN.DISPLAY_CURSOR.&lt;br /&gt;&lt;br /&gt;So when using AUTOTRACE in this way, the only potential threat comes from the actual execution of the statement - but this is no different from executing a statement in any other way. 
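&lt;br /&gt;&lt;br /&gt;To check the second-session behaviour described in the side note above for yourself, a query along the following lines could be used. It is only a sketch and was not part of the original tests. It assumes SELECT access on V$SESSION and V$PROCESS, the PLUSTRACE role (needed for the AUTOTRACE statistics), and a dedicated server connection. If these assumptions hold, it should return two sessions sharing the current session's process entry: the main session and the one SQL*Plus uses for the AUTOTRACE statistics.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Sketch (assumption, not from the original tests):&lt;br /&gt;-- list all sessions that share the process entry of the current session&lt;br /&gt;-- Assumes SELECT access on V$SESSION / V$PROCESS, the PLUSTRACE role&lt;br /&gt;-- and a dedicated server connection&lt;br /&gt;set autotrace on statistics&lt;br /&gt;&lt;br /&gt;select&lt;br /&gt;        s.sid&lt;br /&gt;      , s.serial#&lt;br /&gt;      , s.paddr&lt;br /&gt;      , p.spid&lt;br /&gt;from&lt;br /&gt;        v$session s&lt;br /&gt;      , v$process p&lt;br /&gt;where&lt;br /&gt;        p.addr = s.paddr&lt;br /&gt;and     s.paddr = (&lt;br /&gt;          select paddr from v$session&lt;br /&gt;          where sid = to_number(sys_context('userenv', 'sid'))&lt;br /&gt;        )&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;set autotrace off&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;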
Of course, using an odd bind value in the execution performed as part of the AUTOTRACE activity could theoretically lead to issues when that cursor is shared afterwards - but again, this is nothing special to AUTOTRACE.&lt;br /&gt;&lt;br /&gt;The potentially "wrong" execution plan reported via EXPLAIN PLAN cannot cause problems for other cursors, simply because it is generated via EXPLAIN PLAN. To make this point clear, here is another script that demonstrates:&lt;br /&gt;&lt;br /&gt;- How AUTOTRACE can lie&lt;br /&gt;- How EXPLAIN PLAN cursors are unshared by default&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Demonstrate that AUTOTRACE can lie&lt;br /&gt;set echo on linesize 200 pagesize 0 trimspool on tab off&lt;br /&gt;&lt;br /&gt;drop table t;&lt;br /&gt;&lt;br /&gt;purge table t;&lt;br /&gt;&lt;br /&gt;create table t&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        rownum as id&lt;br /&gt;      , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 1000000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't')&lt;br /&gt;&lt;br /&gt;create index t_idx on t (id);&lt;br /&gt;&lt;br /&gt;-- Compare the execution plan&lt;br /&gt;-- reported by AUTOTRACE&lt;br /&gt;-- to the one reported by DBMS_XPLAN.DISPLAY_CURSOR&lt;br /&gt;set autotrace on&lt;br /&gt;&lt;br /&gt;var n number&lt;br /&gt;&lt;br /&gt;exec :n := 500000&lt;br /&gt;&lt;br /&gt;select /* FIND_ME */ * from (&lt;br /&gt;select * from t where id &amp;gt; :n&lt;br /&gt;)&lt;br /&gt;where rownum &amp;gt; 1;&lt;br /&gt;&lt;br /&gt;set autotrace off&lt;br /&gt;&lt;br /&gt;select /* FIND_ME */ * from (&lt;br /&gt;select * from t where id &amp;gt; :n&lt;br /&gt;)&lt;br /&gt;where rownum &amp;gt; 1;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null));&lt;br /&gt;&lt;br /&gt;-- Demonstrate that EXPLAIN PLAN cursors get 
special treatment&lt;br /&gt;-- They are unshared by default&lt;br /&gt;set echo off timing off feedback off long 1000000 longchunksize 1000000&lt;br /&gt;&lt;br /&gt;spool %TEMP%\explain_plan_example.sql&lt;br /&gt;&lt;br /&gt;select * from (&lt;br /&gt;  select&lt;br /&gt;          sql_fulltext&lt;br /&gt;  from&lt;br /&gt;          v$sqlstats&lt;br /&gt;  where&lt;br /&gt;          sql_text like 'EXPLAIN PLAN%/* FIND_ME */%rownum &amp;gt; 1%'&lt;br /&gt;  and     sql_text not like '%v$sql%'&lt;br /&gt;  order by&lt;br /&gt;          last_active_time desc&lt;br /&gt;)&lt;br /&gt;where&lt;br /&gt;        rownum &amp;lt;= 1&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;spool off&lt;br /&gt;&lt;br /&gt;-- Each execution of the same parent EXPLAIN PLAN cursor&lt;br /&gt;-- leads to a new child cursor&lt;br /&gt;set echo on feedback on timing on pagesize 999&lt;br /&gt;&lt;br /&gt;@%TEMP%\explain_plan_example&lt;br /&gt;/&lt;br /&gt;&lt;br /&gt;/&lt;br /&gt;&lt;br /&gt;column sql_id new_value sql_id&lt;br /&gt;&lt;br /&gt;select * from (&lt;br /&gt;  select&lt;br /&gt;          sql_id&lt;br /&gt;  from&lt;br /&gt;          v$sqlstats&lt;br /&gt;  where&lt;br /&gt;          sql_text like 'EXPLAIN PLAN%/* FIND_ME */%rownum &amp;gt; 1%'&lt;br /&gt;  and     sql_text not like '%v$sql%'&lt;br /&gt;  order by&lt;br /&gt;          last_active_time desc&lt;br /&gt;)&lt;br /&gt;where&lt;br /&gt;        rownum &amp;lt;= 1&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select&lt;br /&gt;        sql_id&lt;br /&gt;      , child_number&lt;br /&gt;      , explain_plan_cursor&lt;br /&gt;from&lt;br /&gt;        v$sql_shared_cursor&lt;br /&gt;where&lt;br /&gt;        sql_id = '&amp;sql_id';&lt;br /&gt;&lt;br /&gt;set serveroutput on&lt;br /&gt;&lt;br /&gt;@sql_shared_cursor &amp;sql_id&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So if you run this script you'll see an example where AUTOTRACE gets it wrong because the plan generated via EXPLAIN PLAN is different from the actual plan used. 
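&lt;br /&gt;&lt;br /&gt;As a quick cross-check, you can also read the plan hash value of the cursor that was actually executed from V$SQL and compare it to the "Plan hash value" printed by AUTOTRACE. In the sample run below, AUTOTRACE reports 2383791439, while DBMS_XPLAN.DISPLAY_CURSOR shows 4220795399. The following is a minimal sketch, not part of the original script. It assumes SELECT access on V$SQL and relies on the /* FIND_ME */ marker used in the script. The EXPLAIN PLAN cursors do not match the LIKE pattern, because their text starts with EXPLAIN PLAN instead:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Sketch (assumption, not from the original tests):&lt;br /&gt;-- plan hash value of the executed FIND_ME cursor(s)&lt;br /&gt;-- Assumes SELECT access on V$SQL&lt;br /&gt;select&lt;br /&gt;        sql_id&lt;br /&gt;      , child_number&lt;br /&gt;      , plan_hash_value&lt;br /&gt;      , executions&lt;br /&gt;from&lt;br /&gt;        v$sql&lt;br /&gt;where&lt;br /&gt;        sql_text like 'select /* FIND_ME */%'&lt;br /&gt;;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;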
Furthermore, the plan generated via EXPLAIN PLAN can only match other EXPLAIN PLAN cursors, since the EXPLAIN PLAN SET STATEMENT_ID prefix makes the SQL text (and hence the SQL_ID) different from that of the statement itself. On top of that, these EXPLAIN PLAN cursors are unshared by default - so they pose no threat to any other SQL issued.&lt;br /&gt;&lt;br /&gt;Here's sample output I got from 11.2.0.1:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&amp;gt; -- Demonstrate that AUTOTRACE can lie&lt;br /&gt;SQL&amp;gt; set echo on linesize 200 pagesize 0 trimspool on tab off&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; drop table t;&lt;br /&gt;&lt;br /&gt;Table dropped.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.03&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; purge table t;&lt;br /&gt;&lt;br /&gt;Table purged.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.04&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; create table t&lt;br /&gt;  2  as&lt;br /&gt;  3  select&lt;br /&gt;  4          rownum as id&lt;br /&gt;  5        , rpad('x', 100) as filler&lt;br /&gt;  6  from&lt;br /&gt;  7          dual&lt;br /&gt;  8  connect by&lt;br /&gt;  9          level &amp;lt;= 1000000&lt;br /&gt; 10  ;&lt;br /&gt;&lt;br /&gt;Table created.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:02.38&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; exec dbms_stats.gather_table_stats(null, 't')&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:02.40&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; create index t_idx on t (id);&lt;br /&gt;&lt;br /&gt;Index created.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:01.63&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- Compare the execution plan&lt;br /&gt;SQL&amp;gt; -- reported by AUTOTRACE&lt;br /&gt;SQL&amp;gt; -- to the one reported by DBMS_XPLAN.DISPLAY_CURSOR&lt;br /&gt;SQL&amp;gt; set autotrace on&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; var n number&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; exec :n := 500000&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.01&lt;br 
/&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select /* FIND_ME */ * from (&lt;br /&gt;  2  select * from t where id &amp;gt; :n&lt;br /&gt;  3  )&lt;br /&gt;  4  where rownum &amp;gt; 1;&lt;br /&gt;&lt;br /&gt;no rows selected&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:01.51&lt;br /&gt;&lt;br /&gt;Execution Plan&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;Plan hash value: 2383791439&lt;br /&gt;&lt;br /&gt;---------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |&lt;br /&gt;---------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |       | 50000 |  5175K|   162   (0)| 00:00:02 |&lt;br /&gt;|   1 |  COUNT                        |       |       |       |            |          |&lt;br /&gt;|*  2 |   FILTER                      |       |       |       |            |          |&lt;br /&gt;|   3 |    TABLE ACCESS BY INDEX ROWID| T     | 50000 |  5175K|   162   (0)| 00:00:02 |&lt;br /&gt;|*  4 |     INDEX RANGE SCAN          | T_IDX |  9000 |       |    23   (0)| 00:00:01 |&lt;br /&gt;---------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM&amp;gt;1)&lt;br /&gt;   4 - access("ID"&amp;gt;TO_NUMBER(:N))&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Statistics&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;          1  recursive calls&lt;br /&gt;          0  db block gets&lt;br /&gt;      15390  consistent gets&lt;br /&gt;      15385  physical reads&lt;br /&gt;          0  redo size&lt;br /&gt;        304  bytes sent via SQL*Net to client&lt;br /&gt;        349  bytes received via SQL*Net from client&lt;br /&gt;  
        1  SQL*Net roundtrips to/from client&lt;br /&gt;          0  sorts (memory)&lt;br /&gt;          0  sorts (disk)&lt;br /&gt;          0  rows processed&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; set autotrace off&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select /* FIND_ME */ * from (&lt;br /&gt;  2  select * from t where id &amp;gt; :n&lt;br /&gt;  3  )&lt;br /&gt;  4  where rownum &amp;gt; 1;&lt;br /&gt;&lt;br /&gt;no rows selected&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.98&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select * from table(dbms_xplan.display_cursor(null, null));&lt;br /&gt;SQL_ID  8q13ghbwgsmkv, child number 0&lt;br /&gt;-------------------------------------&lt;br /&gt;select /* FIND_ME */ * from ( select * from t where id &amp;gt; :n ) where&lt;br /&gt;rownum &amp;gt; 1&lt;br /&gt;&lt;br /&gt;Plan hash value: 4220795399&lt;br /&gt;&lt;br /&gt;----------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |&lt;br /&gt;----------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |       |       |  4204 (100)|          |&lt;br /&gt;|   1 |  COUNT              |      |       |       |            |          |&lt;br /&gt;|*  2 |   FILTER            |      |       |       |            |          |&lt;br /&gt;|*  3 |    TABLE ACCESS FULL| T    |   500K|    50M|  4204   (1)| 00:00:51 |&lt;br /&gt;----------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM&amp;gt;1)&lt;br /&gt;   3 - filter("ID"&amp;gt;:N)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;22 rows selected.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.12&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- Demonstrate that EXPLAIN 
PLAN cursors get special treatment&lt;br /&gt;SQL&amp;gt; -- They are unshared by default&lt;br /&gt;SQL&amp;gt; set echo off timing off feedback off long 1000000 longchunksize 1000000&lt;br /&gt;EXPLAIN PLAN SET STATEMENT_ID='PLUS6552708' FOR select /* FIND_ME */ * from (&lt;br /&gt;select * from t where id &amp;gt; :n&lt;br /&gt;)&lt;br /&gt;where rownum &amp;gt; 1&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; @%TEMP%\explain_plan_example&lt;br /&gt;SQL&amp;gt; EXPLAIN PLAN SET STATEMENT_ID='PLUS6552708' FOR select /* FIND_ME */ * from (&lt;br /&gt;  2  select * from t where id &amp;gt; :n&lt;br /&gt;  3  )&lt;br /&gt;  4  where rownum &amp;gt; 1&lt;br /&gt;  5&lt;br /&gt;SQL&amp;gt; /&lt;br /&gt;&lt;br /&gt;Explained.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.00&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; /&lt;br /&gt;&lt;br /&gt;Explained.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.00&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; column sql_id new_value sql_id&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select * from (&lt;br /&gt;  2    select&lt;br /&gt;  3            sql_id&lt;br /&gt;  4    from&lt;br /&gt;  5            v$sqlstats&lt;br /&gt;  6    where&lt;br /&gt;  7            sql_text like 'EXPLAIN PLAN%/* FIND_ME */%rownum &amp;gt; 1%'&lt;br /&gt;  8    and     sql_text not like '%v$sql%'&lt;br /&gt;  9    order by&lt;br /&gt; 10            last_active_time desc&lt;br /&gt; 11  )&lt;br /&gt; 12  where&lt;br /&gt; 13          rownum &amp;lt;= 1&lt;br /&gt; 14  ;&lt;br /&gt;&lt;br /&gt;SQL_ID&lt;br /&gt;-------------&lt;br /&gt;ctms62wkwp7nz&lt;br /&gt;&lt;br /&gt;1 row selected.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.03&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select&lt;br /&gt;  2          sql_id&lt;br /&gt;  3        , child_number&lt;br /&gt;  4        , explain_plan_cursor&lt;br /&gt;  5  from&lt;br /&gt;  6          v$sql_shared_cursor&lt;br /&gt;  7  where&lt;br /&gt;  8          sql_id = '&amp;sql_id';&lt;br /&gt;&lt;br 
/&gt;SQL_ID        CHILD_NUMBER E&lt;br /&gt;------------- ------------ -&lt;br /&gt;ctms62wkwp7nz            0 N&lt;br /&gt;ctms62wkwp7nz            1 Y&lt;br /&gt;ctms62wkwp7nz            2 Y&lt;br /&gt;&lt;br /&gt;3 rows selected.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.03&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; set serveroutput on&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; @sql_shared_cursor &amp;sql_id&lt;br /&gt;SQL&amp;gt; declare&lt;br /&gt;  2    c         number;&lt;br /&gt;  3    col_cnt   number;&lt;br /&gt;  4    col_rec   dbms_sql.desc_tab;&lt;br /&gt;  5    col_value varchar2(4000);&lt;br /&gt;  6    ret_val    number;&lt;br /&gt;  7  begin&lt;br /&gt;  8    c := dbms_sql.open_cursor;&lt;br /&gt;  9    dbms_sql.parse(c,&lt;br /&gt; 10        'select q.sql_text, s.*&lt;br /&gt; 11        from v$sql_shared_cursor s, v$sql q&lt;br /&gt; 12        where s.sql_id = q.sql_id&lt;br /&gt; 13            and s.child_number = q.child_number&lt;br /&gt; 14            and q.sql_id = ''&amp;1''',&lt;br /&gt; 15        dbms_sql.native);&lt;br /&gt; 16    dbms_sql.describe_columns(c, col_cnt, col_rec);&lt;br /&gt; 17&lt;br /&gt; 18    for idx in 1 .. col_cnt loop&lt;br /&gt; 19      dbms_sql.define_column(c, idx, col_value, 4000);&lt;br /&gt; 20    end loop;&lt;br /&gt; 21&lt;br /&gt; 22    ret_val := dbms_sql.execute(c);&lt;br /&gt; 23&lt;br /&gt; 24    while(dbms_sql.fetch_rows(c) &amp;gt; 0) loop&lt;br /&gt; 25      for idx in 1 .. 
col_cnt loop&lt;br /&gt; 26        dbms_sql.column_value(c, idx, col_value);&lt;br /&gt; 27        if col_rec(idx).col_name in ('SQL_ID', 'ADDRESS', 'CHILD_ADDRESS',&lt;br /&gt; 28                      'CHILD_NUMBER', 'SQL_TEXT') then&lt;br /&gt; 29          dbms_output.put_line(rpad(col_rec(idx).col_name, 30) ||&lt;br /&gt; 30                  ' = ' || col_value);&lt;br /&gt; 31        elsif col_value = 'Y' then&lt;br /&gt; 32          dbms_output.put_line(rpad(col_rec(idx).col_name, 30) ||&lt;br /&gt; 33                  ' = ' || col_value);&lt;br /&gt; 34        end if;&lt;br /&gt; 35      end loop;&lt;br /&gt; 36      dbms_output.put_line('--------------------------------------------------');&lt;br /&gt; 37     end loop;&lt;br /&gt; 38&lt;br /&gt; 39    dbms_sql.close_cursor(c);&lt;br /&gt; 40  end;&lt;br /&gt; 41  /&lt;br /&gt;SQL_TEXT                       = EXPLAIN PLAN SET STATEMENT_ID='PLUS6552708' FOR select /* FIND_ME */ * from ( select * from t where id &amp;gt; :n ) where rownum &amp;gt; 1&lt;br /&gt;SQL_ID                         = ctms62wkwp7nz&lt;br /&gt;ADDRESS                        = 000007FF0DD90180&lt;br /&gt;CHILD_ADDRESS                  = 000007FF0DD87E70&lt;br /&gt;CHILD_NUMBER                   = 0&lt;br /&gt;--------------------------------------------------&lt;br /&gt;SQL_TEXT                       = EXPLAIN PLAN SET STATEMENT_ID='PLUS6552708' FOR select /* FIND_ME */ * from ( select * from t where id &amp;gt; :n ) where rownum &amp;gt; 1&lt;br /&gt;SQL_ID                         = ctms62wkwp7nz&lt;br /&gt;ADDRESS                        = 000007FF0DD90180&lt;br /&gt;CHILD_ADDRESS                  = 000007FF0DCD0D10&lt;br /&gt;CHILD_NUMBER                   = 1&lt;br /&gt;EXPLAIN_PLAN_CURSOR            = Y&lt;br /&gt;--------------------------------------------------&lt;br /&gt;SQL_TEXT                       = EXPLAIN PLAN SET STATEMENT_ID='PLUS6552708' FOR select /* FIND_ME */ * from ( select * from t where id &amp;gt; :n ) where rownum 
&amp;gt; 1&lt;br /&gt;SQL_ID                         = ctms62wkwp7nz&lt;br /&gt;ADDRESS                        = 000007FF0DD90180&lt;br /&gt;CHILD_ADDRESS                  = 000007FF0DCAAA20&lt;br /&gt;CHILD_NUMBER                   = 2&lt;br /&gt;EXPLAIN_PLAN_CURSOR            = Y&lt;br /&gt;--------------------------------------------------&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.10&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Other Autotrace Options&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The perhaps less expected aspect comes into play if you use AUTOTRACE differently. There are various options, and with one particular combination AUTOTRACE doesn't actually execute the statement but only reports the execution plan. So if you change the first example above from:&lt;br /&gt;&lt;br /&gt;SET AUTOTRACE ON&lt;br /&gt;&lt;br /&gt;to&lt;br /&gt;&lt;br /&gt;SET AUTOTRACE TRACEONLY EXPLAIN&lt;br /&gt;&lt;br /&gt;then take a close look at the generated SQL trace (the statements here additionally carry a /* FIND_ME */ comment):&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;=====================&lt;br /&gt;PARSING IN CURSOR #2 len=45 dep=0 uid=91 oct=3 lid=91 tim=416642144779 hv=3626603586 ad='7ff13a1c8b0' sqlid='9pj321gc2m522'&lt;br /&gt;select /* FIND_ME */ * from dual where 1 = :n&lt;br /&gt;END OF STMT&lt;br /&gt;PARSE #2:c=0,e=64,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3752461848,tim=416642144777&lt;br /&gt;CLOSE #2:c=0,e=14,dep=0,type=0,tim=416642145372&lt;br /&gt;=====================&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;=====================&lt;br /&gt;PARSING IN CURSOR #3 len=93 dep=0 uid=91 oct=3 lid=91 tim=416642148753 hv=2987003528 ad='7ff13cd8ea0' sqlid='fu0myxft0n3n8'&lt;br /&gt;EXPLAIN PLAN SET STATEMENT_ID='PLUS6510526' FOR select /* FIND_ME */ * from dual where 1 = :n&lt;br /&gt;END OF STMT&lt;br /&gt;PARSE 
#3:c=0,e=689,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=416642148749&lt;br /&gt;=====================&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;=====================&lt;br /&gt;PARSING IN CURSOR #6 len=50 dep=0 uid=91 oct=3 lid=91 tim=416642233676 hv=37196885 ad='7ff138c8570' sqlid='f8cyn9w13g52p'&lt;br /&gt;select /* FIND_ME */ * from dual where dummy = 'X'&lt;br /&gt;END OF STMT&lt;br /&gt;PARSE #6:c=0,e=116,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=272002086,tim=416642233673&lt;br /&gt;CLOSE #6:c=0,e=32,dep=0,type=0,tim=416642237105&lt;br /&gt;=====================&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;=====================&lt;br /&gt;PARSING IN CURSOR #3 len=98 dep=0 uid=91 oct=50 lid=91 tim=416642243694 hv=390050481 ad='7ff1374bcf8' sqlid='8vvq0ncbmzcpj'&lt;br /&gt;EXPLAIN PLAN SET STATEMENT_ID='PLUS6510526' FOR select /* FIND_ME */ * from dual where dummy = 'X'&lt;br /&gt;END OF STMT&lt;br /&gt;PARSE #3:c=0,e=1261,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=272002086,tim=416642243691&lt;br /&gt;=====================&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Can you spot the difference? 
SQL*Plus now only parses the SQL before actually running the EXPLAIN PLAN command.&lt;br /&gt;&lt;br /&gt;Let's see what happens if the second example from above gets executed with the AUTOTRACE TRACEONLY EXPLAIN option:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Demonstrate that AUTOTRACE TRACEONLY EXPLAIN &lt;br /&gt;-- can cause problems for other SQL executions&lt;br /&gt;set echo on linesize 200 pagesize 0 trimspool on tab off&lt;br /&gt;&lt;br /&gt;drop table t;&lt;br /&gt;&lt;br /&gt;purge table t;&lt;br /&gt;&lt;br /&gt;create table t&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        rownum as id&lt;br /&gt;      , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 1000000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't')&lt;br /&gt;&lt;br /&gt;create index t_idx on t (id);&lt;br /&gt;&lt;br /&gt;set autotrace traceonly explain&lt;br /&gt;&lt;br /&gt;var n number&lt;br /&gt;&lt;br /&gt;exec :n := 500000&lt;br /&gt;&lt;br /&gt;select /* FIND_ME */ * from (&lt;br /&gt;select * from t where id &amp;gt; :n&lt;br /&gt;)&lt;br /&gt;where rownum &amp;gt; 1;&lt;br /&gt;&lt;br /&gt;set autotrace off&lt;br /&gt;&lt;br /&gt;select /* FIND_ME */ * from (&lt;br /&gt;select * from t where id &amp;gt; :n&lt;br /&gt;)&lt;br /&gt;where rownum &amp;gt; 1;&lt;br /&gt;&lt;br /&gt;-- Now the execution plan generated by the PARSE call issued by SQL*Plus&lt;br /&gt;-- will be re-used by the subsequent executions&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null));&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Here's again a sample output from 11.2.0.1:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&amp;gt; drop table t;&lt;br /&gt;&lt;br /&gt;Table dropped.&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; purge table t;&lt;br /&gt;&lt;br /&gt;Table purged.&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br 
/&gt;SQL&amp;gt; create table t&lt;br /&gt;  2  as&lt;br /&gt;  3  select&lt;br /&gt;  4          rownum as id&lt;br /&gt;  5        , rpad('x', 100) as filler&lt;br /&gt;  6  from&lt;br /&gt;  7          dual&lt;br /&gt;  8  connect by&lt;br /&gt;  9          level &amp;lt;= 1000000&lt;br /&gt; 10  ;&lt;br /&gt;&lt;br /&gt;Table created.&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; exec dbms_stats.gather_table_stats(null, 't')&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed.&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; create index t_idx on t (id);&lt;br /&gt;&lt;br /&gt;Index created.&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; set autotrace traceonly explain&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; var n number&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; exec :n := 500000&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed.&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select /* FIND_ME */ * from (&lt;br /&gt;  2  select * from t where id &amp;gt; :n&lt;br /&gt;  3  )&lt;br /&gt;  4  where rownum &amp;gt; 1;&lt;br /&gt;&lt;br /&gt;Execution Plan&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;Plan hash value: 2383791439&lt;br /&gt;&lt;br /&gt;---------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |&lt;br /&gt;---------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |       | 50000 |  5175K|   162   (0)| 00:00:02 |&lt;br /&gt;|   1 |  COUNT                        |       |       |       |            |          |&lt;br /&gt;|*  2 |   FILTER                      |       |       |       |            |          |&lt;br /&gt;|   3 |    TABLE ACCESS BY INDEX ROWID| T     | 50000 |  5175K|   162   (0)| 00:00:02 |&lt;br /&gt;|*  4 |     INDEX RANGE 
SCAN          | T_IDX |  9000 |       |    23   (0)| 00:00:01 |&lt;br /&gt;---------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM&amp;gt;1)&lt;br /&gt;   4 - access("ID"&amp;gt;TO_NUMBER(:N))&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; set autotrace off&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select /* FIND_ME */ * from (&lt;br /&gt;  2  select * from t where id &amp;gt; :n&lt;br /&gt;  3  )&lt;br /&gt;  4  where rownum &amp;gt; 1;&lt;br /&gt;&lt;br /&gt;no rows selected&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- Now the execution plan generated by the PARSE call issued by SQL*Plus&lt;br /&gt;SQL&amp;gt; -- will be re-used by the subsequent executions&lt;br /&gt;SQL&amp;gt; select * from table(dbms_xplan.display_cursor(null, null));&lt;br /&gt;SQL_ID  8q13ghbwgsmkv, child number 0&lt;br /&gt;-------------------------------------&lt;br /&gt;select /* FIND_ME */ * from ( select * from t where id &amp;gt; :n ) where&lt;br /&gt;rownum &amp;gt; 1&lt;br /&gt;&lt;br /&gt;Plan hash value: 2383791439&lt;br /&gt;&lt;br /&gt;---------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |&lt;br /&gt;---------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |       |       |       |   162 (100)|          |&lt;br /&gt;|   1 |  COUNT                        |       |       |       |            |          |&lt;br /&gt;|*  2 |   FILTER                      |       |       |       |            |          |&lt;br /&gt;|   3 |    TABLE ACCESS BY INDEX ROWID| T     | 50000 |  5175K|   162   (0)| 00:00:02 |&lt;br /&gt;|*  4 |     INDEX RANGE 
SCAN          | T_IDX |  9000 |       |    23   (0)| 00:00:01 |&lt;br /&gt;---------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM&amp;gt;1)&lt;br /&gt;   4 - access("ID"&amp;gt;:N)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;23 rows selected.&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So that's interesting: By using the TRACEONLY EXPLAIN option I now ended up with a potentially "wrong" execution plan that is actually eligible for sharing with other executions.&lt;br /&gt;&lt;br /&gt;What surprised me most was the fact that I expected a bind variable type mismatch (CHAR vs. NUMBER, check the "Predicate Information" section) between the parse and the execution and therefore a re-optimization that actually peeked at the bind variables rather than re-using and sharing the existing cursor, but obviously the cursor was eligible for sharing. 
Very likely this is because the parse call didn't actually bind any variables, hence the mismatch wasn't possible.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In summary, I think the following can be said:&lt;br /&gt;&lt;br /&gt;- Don't use AUTOTRACE if you want to get the actual execution plan&lt;br /&gt;&lt;br /&gt;- The potentially "wrong" execution plans reported by AUTOTRACE usually do not represent a threat because these are EXPLAIN PLAN cursors&lt;br /&gt;&lt;br /&gt;- The potential threat of AUTOTRACE variants that actually execute the statement is that the plan used by this execution is definitely eligible for sharing with other executions, but this is no different from any other execution, so there is nothing special about AUTOTRACE here either&lt;br /&gt;&lt;br /&gt;- There is a potential threat when using the AUTOTRACE TRACEONLY EXPLAIN option: the parse-only, no-execute behaviour can leave undesirable cursors behind that are eligible for sharing. 
This applies in particular to SQL statements using bind variables&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-3328209172102920389?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/3328209172102920389/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=3328209172102920389' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/3328209172102920389'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/3328209172102920389'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2012/01/autotrace-polluting-shared-pool.html' title='Autotrace Polluting The Shared Pool?'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-7839445375785759727</id><published>2012-01-15T23:07:00.004+01:00</published><updated>2012-01-16T09:46:55.609+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='DBMS_STATS'/><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='Partitioning'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='CBO'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.1'/><category scheme='http://www.blogger.com/atom/ns#' 
term='incremental'/><title type='text'>Incremental Partition Statistics Review</title><content type='html'>&lt;span style="font-weight:bold;"&gt;Introduction&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here is a summary of the findings from evaluating Incremental Partition Statistics, which were introduced in Oracle 11g.&lt;br /&gt;&lt;br /&gt;The most important point to understand is that Incremental Partition Statistics are not "cost-free". Anyone who tells you that you can gather statistics on the lowest level (partition, or sub-partition in case of composite partitioning) without any noticeable overhead compared to non-incremental statistics (on the lowest level) is not telling you the truth.&lt;br /&gt;&lt;br /&gt;Although this might be obvious, I have personally heard someone make such claims, so it's probably worth mentioning.&lt;br /&gt;&lt;br /&gt;In principle you need to test on your individual system whether the overhead added to each statistics update on the lowest level outweighs the overhead of actually gathering statistics on higher levels, in particular on the global level.&lt;br /&gt;&lt;br /&gt;This might also depend on how, and how often, you have gathered statistics so far.&lt;br /&gt;&lt;br /&gt;The overhead introduced by Incremental Partition Statistics can be significant, in terms of both runtime and data volume. 
You can expect the SYSAUX tablespace to grow by several GBs (for larger databases in the TB range, easily by tens of GBs), depending on the number of partitions, the number of columns and the number of distinct values per column.&lt;br /&gt;&lt;br /&gt;To give you an idea, here are some example figures from the evaluation:&lt;br /&gt;&lt;br /&gt;Table 1: 4 million total rows, 1 GB total size, 6 range partitions, 155 columns&lt;br /&gt;Table 2: 200 million total rows, 53 GB total size, 629 range-list subpartitions, 104 columns&lt;br /&gt;&lt;br /&gt;For Table 1, incremental stats maintained 700,000 rows in SYS.WRI$_OPTSTAT_SYNOPSIS$. For Table 2 it was 3,400,000 rows. In total, approx. 4.1 million rows and 170 MB had to be maintained in the SYS.WRI$_OPTSTAT_SYNOPSIS$ tables for these two tables.&lt;br /&gt;&lt;br /&gt;When I first saw the significant data volume generated for the synopsis meta data, I was pretty sure that processing that amount of data would cause some significant overhead, too.&lt;br /&gt;&lt;br /&gt;And that is exactly what happened: for example, a recursive DELETE statement on the SYS.WRI$_OPTSTAT_SYNOPSIS$ table took about 10 secs out of the 16 secs total runtime of statistics gathering for a rather small partition of the above partitioned table. Here are some more figures from the test runs:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Timing comparison on an Exadata X2-8&lt;/span&gt; &lt;br /&gt;(tests were performed as the only user on the system)&lt;br /&gt;&lt;br /&gt;The Exadata X2-8 ran 11.2.0.2 BP6; for comparison purposes a full rack V2 running 11.2.0.1.2 BP6(?) 
was used&lt;br /&gt;&lt;br /&gt;The following relevant parameters were used in the call to DBMS_STATS:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;dbms_stats.gather_table_stats(  &lt;br /&gt;  ownname =&amp;gt; ...&lt;br /&gt;, tabname =&amp;gt; ...&lt;br /&gt;, partname=&amp;gt;'&amp;lt;PARTNAME&amp;gt;'&lt;br /&gt;, granularity=&amp;gt;'AUTO'&lt;br /&gt;, estimate_percent =&amp;gt; DBMS_STATS.AUTO_SAMPLE_SIZE&lt;br /&gt;, method_opt =&amp;gt; 'FOR ALL COLUMNS SIZE 1'&lt;br /&gt;, cascade =&amp;gt; true&lt;br /&gt;);&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;where &amp;lt;PARTNAME&amp;gt; is the name of the partition that was modified. Basically it was a simulation of a per-partition data load, where the data is loaded into a separate segment and afterwards an exchange (sub)partition is performed with the main table.&lt;br /&gt;&lt;br /&gt;After the exchange partition, the statistics were refreshed on the main table using the above call.&lt;br /&gt;&lt;br /&gt;Modification of a single partition of the above Table 1: approx. 500,000 rows (110 MB of data) in this single partition.&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;INCREMENTAL =&amp;gt; FALSE: 7-13 seconds&lt;br /&gt;INCREMENTAL =&amp;gt; TRUE : 16 seconds (the majority of time is spent on a DELETE from &lt;br /&gt;SYS.WRI$_OPTSTAT_SYNOPSIS$)&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Modification of a single subpartition of the above Table 2: approx. 300,000 rows (75 MB of data) in this single subpartition.&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;INCREMENTAL =&amp;gt; FALSE: 67 seconds&lt;br /&gt;INCREMENTAL =&amp;gt; TRUE : 11.2.0.2 390 (!) 
seconds&lt;br /&gt;                      11.2.0.1 30 seconds&lt;br /&gt;                      11.2.0.2 with fix_control=8917507:OFF: 70 seconds&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So the overhead ratio depends largely on the time it actually takes to gather the statistics - for rather small partitions the meta data maintenance overhead will be enormous.&lt;br /&gt;&lt;br /&gt;On an Exadata X2-8 11.2.0.2 using the non-incremental approach of gathering lowest level partition statistics plus partition plus global statistics for the 53GB table (629 range-list subpartitions, 104 columns), took almost the same time as it took the incremental statistics to gather statistics only on lowest level plus the meta data maintenance / aggregation overhead.&lt;br /&gt;&lt;br /&gt;Of course you'll appreciate that the activity performed for those two operations is vastly different - the conventional statistics approach needs to throw all processing power of the X2-8 at this problem and any concurrent activity will have to share the CPU and I/O demand of that operation, while the mostly meta data based incremental statistics only allocate a single CPU and some I/O during the processing, leaving most of the I/O and CPU resources available for other concurrent tasks.&lt;br /&gt;&lt;br /&gt;On a larger data volume and/or slower systems the Incremental Partition Statistics will probably easily outperform the non-incremental approach.&lt;br /&gt;&lt;br /&gt;Furthermore it should be mentioned that the tests used the "FOR ALL COLUMNS SIZE 1" METHOD_OPT option that doesn't generate any histograms. The INCREMENTAL partition statistics feature is however capable of deriving upper level histograms from lower levels of statistics with histograms in place. This can mean a significant saving in processing time if histograms need to be maintained on upper levels since each histogram adds another pass to the DBMS_STATS processing. 
In fact, the histograms generated by INCREMENTAL partition statistics might even be of better quality than those generated via explicit gathering, because by default a rather low sample size is used for histogram generation in order to keep the overhead as small as possible.&lt;br /&gt;&lt;br /&gt;Note that according to the description the APPROX_GLOBAL AND PARTITION granularity also supports aggregation of histograms, but I haven't looked into this option in detail yet.&lt;br /&gt;&lt;br /&gt;As usual, you'll have to test it yourself on your system and hardware, but the main point is that it doesn't come for free: it requires both significant space and runtime.&lt;br /&gt;&lt;br /&gt;One idea that might make sense is limiting the column statistics to those columns that you are sure will be used in predicates / GROUP BYs / ORDER BYs. Any columns that are only used for display purposes could be left without any column statistics. Depending on your data model this might save some volume and processing time, but it needs to be maintained on a per-table basis rather than as a one-size-fits-all approach.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Further Findings&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here are some further findings that I consider relevant:&lt;br /&gt;&lt;br /&gt;- INCREMENTAL =&amp;gt; TRUE means that ESTIMATE_PERCENT will be ignored: the new approximate NDV algorithm, which reads all data but avoids the grouping overhead of a conventional aggregation method, is mandatory for the new feature. This means that for very large data sets, former approaches using very low sample sizes will now take significantly longer (approx. the time it takes to sample 10% of the data with the former approach), however with the benefit of producing almost 100% accurate statistics. There is currently no way around this: if you want to use INCREMENTAL you have to process 100% of the data using the new NDV algorithm. 
Note that this applies to 11.2; I haven't tested this on 11.1.&lt;br /&gt;&lt;br /&gt;- INCREMENTAL doesn't maintain any synopses for indexes, so in order to obtain higher level index statistics for partitioned indexes it always includes gathering global index statistics. However, it uses a sample and doesn't analyze the whole index. For very large indexes and/or a very large number of indexes the overhead can still be significant, so keep this in mind: even with incremental partition statistics there is a component that depends on the global volume, in this case the index volume.&lt;br /&gt;&lt;br /&gt;- In order to use INCREMENTAL effectively, the meta data for the synopses needs to be created initially for all partitions, even for those whose data no longer changes. So for very large (historic) data volumes this initial synopsis generation can represent a challenge, and how to approach it needs to be planned carefully. You need to be careful how INCREMENTAL is enabled: if you simply switch it on and use GRANULARITY=&amp;gt;AUTO as outlined in the manuals, the next gather statistics call on the table will gather the meta data for all (sub-)partitions of the table, which might take very, very long. It might be more sensible to gather statistics with a different GRANULARITY. This still adds the meta data maintenance overhead, but you are in control of which partitions are going to be analyzed, allowing for a step-wise approach.&lt;br /&gt;&lt;br /&gt;- In 11.2.0.2 the underlying internal table structure has been changed significantly. In particular, the table SYS.WRI$_OPTSTAT_SYNOPSIS$ has been changed from unpartitioned to composite partitioned. Interestingly, it doesn't have a single index in 11.2.0.2; apparently the developers considered composite partitioning sufficient. 
The change was very likely introduced due to bug 9038395, which addresses the problem that deleting the statistics for a single table used to depend on the total number of tables using incremental statistics. So that problem should be solved now, but this still doesn't mean that the meta data maintenance overhead is negligible.&lt;br /&gt;&lt;br /&gt;- There is a bug in 11.2.0.2 that basically rendered incremental partition statistics unusable with the composite partitioned tables used at that client. A particular recursive SQL statement got executed several thousand times, so the meta data operation took up to several minutes to complete (see the timings above). This is tracked as bug 12833442. The behaviour can be changed via fix control 8917507, which in this case helped to arrive at reasonable runtimes, although 11.2.0.1 was still twice as fast.&lt;br /&gt;&lt;br /&gt;- INCREMENTAL =&amp;gt; TRUE doesn't work with locked statistics: you'll always end up with ORA-20005 (object statistics are locked), even when specifying the FORCE =&amp;gt; TRUE option. This is tracked as bug 12369250 (according to My Oracle Support fixed in the 11.2.0.3 patch set).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Footnote&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;All of the above applies to 11.2.0.2 and 11.2.0.1. 
I haven't had the chance yet to repeat those tests on 11.2.0.3.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-7839445375785759727?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/7839445375785759727/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=7839445375785759727' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/7839445375785759727'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/7839445375785759727'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2012/01/incremental-partition-statistics-review.html' title='Incremental Partition Statistics Review'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-7067380006645715334</id><published>2012-01-09T23:43:00.004+01:00</published><updated>2012-01-10T14:07:07.914+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='10.2.0.5'/><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='Partitioning'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='10.2.0.4'/><category scheme='http://www.blogger.com/atom/ns#' term='CBO'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.1'/><category scheme='http://www.blogger.com/atom/ns#' 
term='bug'/><category scheme='http://www.blogger.com/atom/ns#' term='10gR2'/><category scheme='http://www.blogger.com/atom/ns#' term='dynamic sampling'/><category scheme='http://www.blogger.com/atom/ns#' term='cardinality'/><title type='text'>Dynamic Sampling On Multiple Partitions - Bugs</title><content type='html'>In a &lt;a href="https://forums.oracle.com/forums/message.jspa?threadID=2328080"&gt;recent OTN thread&lt;/a&gt; I've been reminded of two facts about Dynamic Sampling that I already knew but had forgotten in the meantime:&lt;br /&gt;&lt;br /&gt;1. The table level dynamic sampling hint uses a different number of blocks for sampling than session / cursor level dynamic sampling. So even if both use, for example, level 5, the number of sampled blocks will be different for most of the 10 levels available (obviously levels 0 and 10 are exceptions).&lt;br /&gt;&lt;br /&gt;2. The Dynamic Sampling code uses a different approach for partitioned objects if it faces the situation that there are more partitions than blocks to sample according to the level (and type: table/cursor/session) of Dynamic Sampling.&lt;br /&gt;&lt;br /&gt;Note that everything here applies to the case where no statistics have been gathered for the table; I don't cover the case where Dynamic Sampling is used on top of existing statistics.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Dynamic Sampling Number Of Sample Blocks&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Jonathan Lewis &lt;a href="http://jonathanlewis.wordpress.com/2010/02/23/dynamic-sampling/"&gt;has a short post&lt;/a&gt; describing 1. above, although I believe that his post has a minor inaccuracy: the number of blocks sampled for table level dynamic sampling is 32 * 2^(level - 1), not 32 * 2^level.&lt;br /&gt;&lt;br /&gt;Note that the constant 32 is defined by the internal parameter "_optimizer_dyn_smp_blks" and is independent of the block size. 
So this is one of the cases where a larger block size potentially gives better results because more data might be sampled; of course it also means more work for the sampling.&lt;br /&gt;&lt;br /&gt;Here are two excerpts from optimizer trace files that show both the difference between the table and cursor/session level sample sizes and the 2^(level - 1) formula for the table level:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Table level 5:&lt;/span&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;** Executed dynamic sampling query:&lt;br /&gt;    level : 5&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;    max. sample block cnt. : 512&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Cursor/session level 5:&lt;/span&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;** Executed dynamic sampling query:&lt;br /&gt;    level : 5&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;    max. sample block cnt. : 64&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So both cases use level 5, but the number of sample blocks differs: for table level 5 it is 32 * 2^4 = 32 * 16 = 512 blocks.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Dynamic Sampling On Multiple Partitions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Point 2. above is also described &lt;a href="http://jonathanlewis.wordpress.com/2010/02/23/dynamic-sampling/#comment-40244"&gt;in one of the comments&lt;/a&gt; on the post mentioned. 
In principle the Dynamic Sampling code seems to assume an overhead of one sample block per (sub)segment, so the effective number of blocks to sample will fall short by the number of (sub)segments to sample.&lt;br /&gt;&lt;br /&gt;Probably this is based on the assumption that the segment header block needs to be accessed anyway when reading a segment.&lt;br /&gt;&lt;br /&gt;If the code didn't account for this, it could potentially end up with an effective number of sampled blocks far greater than defined by the sample size when dealing with partitioned objects.&lt;br /&gt;&lt;br /&gt;For non-partitioned objects this is not a big deal because it means exactly one block less than defined by the sample size.&lt;br /&gt;&lt;br /&gt;But if Dynamic Sampling needs to sample multiple partitions, this has several consequences:&lt;br /&gt;&lt;br /&gt;a. The number of blocks effectively sampled for data can be far smaller than the number of blocks to be sampled suggests, because the code reduces the number of blocks by the number of partitions to sample.&lt;br /&gt;&lt;br /&gt;b. This poses a special challenge if there are actually more partitions to sample than blocks.&lt;br /&gt;&lt;br /&gt;Note that Dynamic Sampling uses static / compile time partition pruning information to determine the number of partitions that need to be sampled. 
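&lt;br /&gt;&lt;br /&gt;The block budget arithmetic described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Oracle's actual code. The constant 32 stands in for "_optimizer_dyn_smp_blks", and the function names are hypothetical. The one-block-per-(sub)segment deduction is modeled exactly as described in the text. As a sanity check, assume that the table level 5 budget of 512 blocks applies to the first trace excerpt further down (384 partitions, 17993 blocks). Then 512 - 384 = 128 data blocks remain, and 128 / 17993 is about 0.711388 %, which matches the sample pct. reported there.

```python
# Minimal sketch of the Dynamic Sampling block budget arithmetic described above.
# Assumptions (not Oracle code): DYN_SMP_BLKS stands in for the internal
# parameter "_optimizer_dyn_smp_blks" (32); all function names are hypothetical.
DYN_SMP_BLKS = 32


def table_level_sample_blocks(level: int) -> int:
    """Max. sample blocks for the table level hint: 32 * 2^(level - 1)."""
    return DYN_SMP_BLKS * 2 ** (level - 1)


def data_blocks(sample_blocks: int, partitions: int) -> int:
    """Blocks left for data after reserving one block per (sub)segment.

    Clamped at zero for illustration only; when the partitions outnumber
    (or equal) the sample blocks, Oracle switches to a different approach.
    """
    return max(0, sample_blocks - partitions)


def sample_pct(blocks_to_read: int, table_blocks: int) -> float:
    """SAMPLE BLOCK percentage needed to read blocks_to_read of table_blocks."""
    return blocks_to_read / table_blocks * 100


if __name__ == "__main__":
    budget = table_level_sample_blocks(5)   # 32 * 2^4 = 512
    left = data_blocks(budget, 384)          # 512 - 384 = 128
    print(budget, left, round(sample_pct(left, 17993), 6))
    print(data_blocks(64, 384))              # cursor/session level 5: nothing left
```

With the cursor/session level 5 budget of 64 blocks and the same 384 partitions, no blocks would be left for data. That is exactly the situation in which the alternative approach described next comes into play.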
&lt;br /&gt;&lt;br /&gt;The upshot is that when sampling multiple partitions, the sample sizes of the lower cursor/session Dynamic Sampling levels can be far too small for reasonable sample results.&lt;br /&gt;&lt;br /&gt;If the Dynamic Sampling code faces the situation where more partitions need to be sampled than blocks, it uses a different approach.&lt;br /&gt;&lt;br /&gt;Rather than sampling the whole table, and therefore potentially accessing more partitions than there are blocks defined by the sample size, it will randomly select (sample blocks / 2) subsegments.&lt;br /&gt;&lt;br /&gt;Based on the number of blocks determined per subsegment, it will then use a sample size such that in total (sample blocks / 2) blocks will be sampled for data.&lt;br /&gt;&lt;br /&gt;Of course this means that on average one data block per subsegment will be sampled.&lt;br /&gt;&lt;br /&gt;The sample query looks different in such a case because the sampled subsegments are explicitly listed and combined via UNION ALL, resulting in quite a lengthy statement: even with a small sample size of 32 blocks, 16 queries on subsegments will be combined via UNION ALL.&lt;br /&gt;&lt;br /&gt;Here again are two excerpts from optimizer trace files that show the two approaches in action:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;More sample blocks than partitions:&lt;/span&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;** Performing dynamic sampling initial checks. 
**&lt;br /&gt;** Dynamic sampling initial checks returning TRUE (level = 5).&lt;br /&gt;** Dynamic sampling updated table stats.: blocks=17993&lt;br /&gt;** Generated dynamic sampling query:&lt;br /&gt;    query text : &lt;br /&gt;SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ 1 AS C1, 1 AS C2 FROM "T" SAMPLE BLOCK (0.711388 , 1) SEED (1) "T") SAMPLESUB&lt;br /&gt;&lt;br /&gt;*** 2012-01-03 09:45:22.695&lt;br /&gt;** Executed dynamic sampling query:&lt;br /&gt;    level : 5&lt;br /&gt;    sample pct. : 0.711388&lt;br /&gt;    total partitions : 384&lt;br /&gt;      partitions for sampling : 384&lt;br /&gt;    actual sample size : 7452&lt;br /&gt;    filtered sample card. : 7452&lt;br /&gt;    orig. card. : 98028&lt;br /&gt;    block cnt. table stat. : 17993&lt;br /&gt;    block cnt. for sampling: 17993&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Potentially all partitions get sampled, and the query used is similar to the one used for non-partitioned objects.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Fewer or as many sample blocks as partitions:&lt;/span&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;** Performing dynamic sampling initial checks. 
**&lt;br /&gt;** Dynamic sampling initial checks returning TRUE (level = 5).&lt;br /&gt;** Dynamic sampling updated table stats.: blocks=1496&lt;br /&gt;&lt;br /&gt;*** 2012-01-03 09:44:04.492&lt;br /&gt;** Generated dynamic sampling query:&lt;br /&gt;    query text : &lt;br /&gt;SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT 1 AS C1, 1 AS C2 FROM ((SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(6) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(21) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(28) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(30) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(68) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(80) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(83) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(98) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(102) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(109) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ 
* FROM "T" SUBPARTITION(134) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(141) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(153) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(158) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(176) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(177) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(179) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(205) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(206) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(249) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(257) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(260) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(263) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(265) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" 
SUBPARTITION(273) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(277) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(309) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(339) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(341) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(342) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(359) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(368) SAMPLE BLOCK (2.139037 , 1) SEED (1))) "T") SAMPLESUB&lt;br /&gt;&lt;br /&gt;** Executed dynamic sampling query:&lt;br /&gt;    level : 5&lt;br /&gt;    sample pct. : 2.139037&lt;br /&gt;    total partitions : 384&lt;br /&gt;      partitions for sampling : 384&lt;br /&gt;      partitions actually sampled from : 32&lt;br /&gt;    actual sample size : 2583&lt;br /&gt;    filtered sample card. : 2583&lt;br /&gt;    orig. card. : 98028&lt;br /&gt;    block cnt. table stat. : 1496&lt;br /&gt;    block cnt. for sampling: 17952&lt;br /&gt;    partition subset block cnt. : 1496&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;You can clearly see that the query looks quite different: it lists a number of subpartitions explicitly. 
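&lt;br /&gt;&lt;br /&gt;To make the shape of such a statement easier to see, here is a minimal Python sketch that assembles a query of the same structure. It is a hypothetical illustration, not Oracle's implementation: the function name subsegment_sample_query, the 1-based subpartition numbering and the use of Python's random module for the selection are assumptions. It picks (sample blocks / 2) distinct subpartitions at random and combines one SAMPLE BLOCK branch per subpartition via UNION ALL; the optimizer hints of the outer query are left out.&lt;br /&gt;&lt;div class="codesnippet"&gt;
```python
import random

# One UNION ALL branch per chosen subpartition, in the shape seen in the trace.
BRANCH = ('(SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * '
          'FROM "T" SUBPARTITION({n}) SAMPLE BLOCK ({pct:f} , 1) SEED (1))')


def subsegment_sample_query(total_subsegments, sample_blocks, sample_pct,
                            seed=None):
    """Hypothetical sketch: build a subsegment sampling query.

    Picks (sample_blocks / 2) distinct subpartitions at random (numbered
    1..total_subsegments, an assumption) and samples each of them with the
    same block sample percentage.
    """
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(1, total_subsegments + 1),
                               sample_blocks // 2))
    branches = " UNION ALL ".join(BRANCH.format(n=n, pct=sample_pct)
                                  for n in chosen)
    # Outer aggregation as in the trace, with its optimizer hints omitted.
    query = ("SELECT NVL(SUM(C1),0), NVL(SUM(C2),0) FROM "
             "(SELECT 1 AS C1, 1 AS C2 FROM (" + branches + ') "T") SAMPLESUB')
    return chosen, query


chosen, query = subsegment_sample_query(384, 64, 2.139037, seed=1)
print(len(chosen), "subpartitions:", chosen)
print(query[:160] + " ...")
```
&lt;/div&gt;&lt;br /&gt;Called with 384 subpartitions, 64 sample blocks and the 2.139037 percent from the trace, it returns 32 distinct subpartition numbers and a statement with 31 UNION ALL operators, which matches the 32 subpartitions listed in the excerpt. The subpartitions actually chosen will of course differ from those in the trace.&lt;br /&gt;&lt;br /&gt;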
Also, the text dumped to the trace file is different: it states that the sampling is restricted to 32 partitions ("partitions actually sampled from : 32").&lt;br /&gt;&lt;br /&gt;And it is this special case where, in versions below 11.2.0.3, a silly bug in the code leads to incorrect cost estimates: when putting together the number of blocks that should be used for sampling and the number of blocks extrapolated for the whole table, the code copies the wrong number into the table stats. It uses the block count of the sampled subset of subsegments instead of the extrapolated table size. This can lead to a dramatic cost underestimate for a corresponding full table scan operation.&lt;br /&gt;&lt;br /&gt;The issue seems to be fixed in 11.2.0.3, but in the 11.2.0.1 excerpt above you can spot the problem by carefully checking these lines:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;...&lt;br /&gt;** Dynamic sampling updated table stats.: blocks=1496 &amp;lt;=== wrong number copied from below&lt;br /&gt;...&lt;br /&gt;    block cnt. table stat. : 1496 &amp;lt;=== this should be on the next line&lt;br /&gt;    block cnt. for sampling: 17952 &amp;lt;=== this should be on the previous line&lt;br /&gt;    partition subset block cnt. : 1496&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;The two figures "block cnt. for sampling" and "block cnt. table stat." are swapped, and the wrong number is copied to the table stats line.&lt;br /&gt;&lt;br /&gt;This can result in a significant underestimate of the number of table blocks. 
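&lt;br /&gt;&lt;br /&gt;The figures in the trace excerpt above fit together with some simple arithmetic. The following Python sketch is a reconstruction from those numbers, not Oracle's code; in particular the proportional extrapolation (subset blocks times total subsegments divided by sampled subsegments) is an assumption that happens to reproduce the trace values, and the function name subsegment_sampling is hypothetical.&lt;br /&gt;&lt;div class="codesnippet"&gt;
```python
# Reconstruction from the 10053 trace figures, not Oracle's actual code.
def subsegment_sampling(max_sample_blocks, total_subsegments, subset_blocks):
    """Model the figures reported when sampling a subset of subsegments.

    max_sample_blocks: blocks allowed by the sampling level (64 in the traces)
    total_subsegments: number of subpartitions to sample from (384 here)
    subset_blocks:     blocks found in the randomly chosen subsegments
    """
    # (sample blocks / 2) subsegments are picked at random ...
    sampled_subsegments = max_sample_blocks // 2
    # ... and the percentage is chosen so that the same number of blocks is
    # sampled in total, i.e. one block per subsegment on average.
    sample_pct = sampled_subsegments / subset_blocks * 100
    # Assumed proportional extrapolation from the subset to the whole table.
    table_blocks = subset_blocks * total_subsegments / sampled_subsegments
    return {
        "sampled_subsegments": sampled_subsegments,
        "sample_pct": round(sample_pct, 6),      # trace: "sample pct."
        "correct_table_blocks": table_blocks,    # should go into the table stats
        # Pre-11.2.0.3 bug: the subset block count is copied instead.
        "buggy_table_blocks": subset_blocks,
    }


result = subsegment_sampling(64, 384, 1496)
print(result)
print("underestimate factor:",
      result["correct_table_blocks"] / result["buggy_table_blocks"])
```
&lt;/div&gt;&lt;br /&gt;For the excerpt above it gives 32 subsegments, a sample percentage of 2.139037 and an extrapolated size of 17952 blocks, whereas the buggy code stores 1496 blocks, a factor of 12 (384 / 32) too low. Feeding in the 1585 block subset from the later excerpts gives 2.018927 and 19020 in the same way.&lt;br /&gt;&lt;br /&gt;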
The first plan is generated with the session-level 5 sample size, where the bug copies the wrong number of blocks:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;---------------------------------------+-----------------------------------+---------------+&lt;br /&gt;| Id  | Operation            | Name    | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |&lt;br /&gt;---------------------------------------+-----------------------------------+---------------+&lt;br /&gt;| 0   | SELECT STATEMENT     |         |       |       |   &lt;span style="font-weight:bold;"&gt;246&lt;/span&gt; |           |       |       |&lt;br /&gt;| 1   |  PARTITION RANGE ALL |         |  996K |   89M |   246 |  00:00:03 | 1     | 12    |&lt;br /&gt;| 2   |   PARTITION HASH ALL |         |  996K |   89M |   246 |  00:00:03 | 1     | 32    |&lt;br /&gt;| 3   |    TABLE ACCESS FULL | T       |  996K |   89M |   246 |  00:00:03 | 1     | 384   |&lt;br /&gt;---------------------------------------+-----------------------------------+---------------+&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;The second plan is generated for the same data set, but using the table-level 5 sample size, which takes the other code path that is not affected by the bug:&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;---------------------------------------+-----------------------------------+---------------+&lt;br /&gt;| Id  | Operation            | Name    | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |&lt;br /&gt;---------------------------------------+-----------------------------------+---------------+&lt;br /&gt;| 0   | SELECT STATEMENT     |         |       |       |  &lt;span style="font-weight:bold;"&gt;3637&lt;/span&gt; |           |       |       |&lt;br /&gt;| 1   |  PARTITION RANGE ALL |         |  970K |   86M |  3637 |  00:00:44 | 1     | 12    |&lt;br /&gt;| 2   |   PARTITION HASH ALL |         |  970K |   86M |  3637 |  00:00:44 | 1     | 32    |&lt;br /&gt;| 3   |    
TABLE ACCESS FULL | T       |  970K |   86M |  3637 |  00:00:44 | 1     | 384   |&lt;br /&gt;---------------------------------------+-----------------------------------+---------------+&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Note that although a minor discrepancy might be explained by the different sample sizes, a cost estimate difference of an order of magnitude is clearly questionable.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Nasty Bug When Using Indexes&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Finally, there is another nasty bug waiting for you in the case of partitioned objects, and this time it doesn't matter whether the number of partitions is greater or less than the number of blocks to be sampled:&lt;br /&gt;&lt;br /&gt;Dynamic Sampling will also make use of eligible indexes if a filter predicate is applied to a table and a suitable index exists (which probably means an index whose leading columns correspond to the applied predicates, but I haven't investigated that fully).&lt;br /&gt;&lt;br /&gt;The idea behind this is probably that the index offers a very cheap way to obtain a very precise selectivity estimate for highly selective predicates. Dynamic Sampling has some built-in sanity checks that reject the Dynamic Sampling result if too few rows pass the applied filter predicates, similar to saying "not enough data found to provide a reasonable estimate". So if the filter predicates identify only a few rows out of many, a pretty high sampling level is required to prevent the Dynamic Sampling results from being rejected by these sanity checks.&lt;br /&gt;&lt;br /&gt;Things look different, however, if a suitable index is available: Dynamic Sampling will run an additional index-only query that is limited to a small number of rows (2,500 rows seems to be a common limit) and uses a WHERE clause corresponding to the filter predicates. 
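&lt;br /&gt;&lt;br /&gt;How the capped index result is presumably meant to be combined with the block sample, and what goes wrong on partitioned tables, can be sketched in Python using the figures from the trace excerpts further below. This is a hedged reconstruction, not Oracle's implementation: the function and parameter names are hypothetical, and using max() to mimic the "Increasing dynamic sampling selectivity" step seen in the trace is an assumption.&lt;br /&gt;&lt;div class="codesnippet"&gt;
```python
# Hedged reconstruction from the 10053 trace figures, not Oracle's code.
ROW_LIMIT = 2500  # ROWNUM cap of the index-only sampling query


def sampled_cardinality(sample_rows, filtered_rows, sample_blocks,
                        table_blocks, index_rows, partition_bug):
    """Return (table cardinality, selectivity, estimated rows)."""
    # Block sample: extrapolate the table cardinality and the selectivity.
    table_card = sample_rows * table_blocks / sample_blocks
    sample_sel = filtered_rows / sample_rows
    # Index-only query: matching rows, but never more than ROW_LIMIT.
    index_sel = index_rows / table_card
    if index_rows >= ROW_LIMIT:
        if partition_bug:
            # Bug 6408301 on partitioned tables: the capped count is
            # treated as if it were exact.
            selectivity = index_sel
        else:
            # The capped count is only a lower bound, so the (larger)
            # block sample selectivity should win.
            selectivity = max(sample_sel, index_sel)
    else:
        # Fewer rows than the cap: the index count is exact.
        selectivity = index_sel
    return table_card, selectivity, table_card * selectivity


# Trace figures: 2063 rows in 32 sampled blocks, 2003 of them matching,
# 19020 extrapolated table blocks, index query stopped at 2500 rows.
for bug in (False, True):
    card, sel, rows = sampled_cardinality(2063, 2003, 32, 19020, 2500, bug)
    print(f"bug={bug}: card={card}, sel={sel:.8f}, rows={round(rows)}")
```
&lt;/div&gt;&lt;br /&gt;Both runs extrapolate 1226195.625 rows for the table (the "recursive dynamic sampling card. est." in the trace). Without the bug the model keeps the block sample selectivity of about 0.970916, giving roughly 1,190,533 rows; with the bug it uses the capped index selectivity of about 0.00203883 and the estimate collapses to 2,500 rows, just like the plan in the second trace excerpt.&lt;br /&gt;&lt;br /&gt;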
If the number of rows returned by this query is less than 2,500, Dynamic Sampling knows that this corresponds exactly to the cardinality / selectivity of the filter predicates.&lt;br /&gt;&lt;br /&gt;For partitioned objects, though, there is again a silly bug: the case where the index query returns the full 2,500 rows, i.e. hits its row limit, is not handled correctly. So for any filter predicate that matches more than 2,500 rows, the cardinality / selectivity estimate will potentially be incorrect.&lt;br /&gt;&lt;br /&gt;Here again are two optimizer trace excerpts that show the bug in action:&lt;br /&gt;&lt;br /&gt;Without a suitable index, the cardinality estimate for a barely selective predicate (90% of the rows match) is in the right ballpark:&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;** Dynamic sampling initial checks returning TRUE (level = 5).&lt;br /&gt;** Dynamic sampling updated table stats.: blocks=1585&lt;br /&gt;*** 2012-01-09 09:53:13.651&lt;br /&gt;** Generated dynamic sampling query:&lt;br /&gt;    query text : &lt;br /&gt;SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAUSE */ 1 AS C1, CASE WHEN "T"."ID"&amp;gt;100000 THEN 1 ELSE 0 END AS C2 FROM ((SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(5) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(20) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(27) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(29) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(67) SAMPLE BLOCK 
(2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(79) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(82) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(97) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(101) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(108) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(133) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(140) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(152) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(157) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(175) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(176) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(178) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(204) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(205) SAMPLE BLOCK (2.018927 , 1) 
SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(248) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(256) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(259) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(262) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(264) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(272) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(276) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(308) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(338) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(340) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(341) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(358) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(367) SAMPLE BLOCK (2.018927 , 1) SEED (1))) "T") SAMPLESUB&lt;br /&gt;*** 2012-01-09 09:53:13.869&lt;br /&gt;** Executed dynamic sampling query:&lt;br /&gt;    level : 5&lt;br /&gt;    
sample pct. : 2.018927&lt;br /&gt;    total partitions : 384&lt;br /&gt;      partitions for sampling : 384&lt;br /&gt;      partitions actually sampled from : 32&lt;br /&gt;    actual sample size : 2063&lt;br /&gt;    filtered sample card. : 2003&lt;br /&gt;    orig. card. : 98028&lt;br /&gt;    block cnt. table stat. : 1585&lt;br /&gt;    block cnt. for sampling: 19020&lt;br /&gt;    partition subset block cnt. : 1585&lt;br /&gt;    max. sample block cnt. : 64&lt;br /&gt;    sample block cnt. : 32&lt;br /&gt;    min. sel. est. : 0.05000000&lt;br /&gt;** Using dynamic sampling card. : 1226196&lt;br /&gt;** Dynamic sampling updated table card.&lt;br /&gt;** Using single table dynamic sel. est. : 0.97091614&lt;br /&gt;  Table:  T  Alias: T     &lt;br /&gt;    Card: Original: 1226196  Rounded: 1190533  Computed: 1190533.13  Non Adjusted: 1190533.13&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;---------------------------------------+-----------------------------------+---------------+&lt;br /&gt;| Id  | Operation            | Name    | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |&lt;br /&gt;---------------------------------------+-----------------------------------+---------------+&lt;br /&gt;| 0   | SELECT STATEMENT     |         |       |       |   360 |           |       |       |&lt;br /&gt;| 1   |  PARTITION RANGE ALL |         | 1163K |  103M |   360 |  00:00:05 | 1     | 12    |&lt;br /&gt;| 2   |   PARTITION HASH ALL |         | 1163K |  103M |   360 |  00:00:05 | 1     | 32    |&lt;br /&gt;| 3   |    TABLE ACCESS FULL | T       | 1163K |  103M |   360 |  00:00:05 | 1     | 384   |&lt;br /&gt;---------------------------------------+-----------------------------------+---------------+&lt;br /&gt;Predicate Information:&lt;br /&gt;----------------------&lt;br /&gt;3 - filter("ID"&amp;gt;100000)&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;With a suitable index in place the cardinality is estimated at 2,500 for the same data set:&lt;br /&gt;&lt;div 
class="codesnippet"&gt;&lt;br /&gt;** Dynamic sampling initial checks returning TRUE (level = 5).&lt;br /&gt;** Dynamic sampling updated index stats.: T_IDX, blocks=3840&lt;br /&gt;** Dynamic sampling index access candidate : T_IDX&lt;br /&gt;** Dynamic sampling updated table stats.: blocks=1585&lt;br /&gt;*** 2012-01-09 10:01:32.960&lt;br /&gt;** Generated dynamic sampling query:&lt;br /&gt;    query text : &lt;br /&gt;SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAUSE */ 1 AS C1, CASE WHEN "T"."ID"&amp;gt;100000 THEN 1 ELSE 0 END AS C2 FROM ((SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(5) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(20) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(27) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(29) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(67) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(79) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(82) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(97) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(101) SAMPLE BLOCK (2.018927 , 1) SEED 
(1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(108) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(133) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(140) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(152) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(157) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(175) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(176) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(178) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(204) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(205) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(248) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(256) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(259) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(262) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION 
ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(264) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(272) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(276) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(308) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(338) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(340) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(341) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(358) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(367) SAMPLE BLOCK (2.018927 , 1) SEED (1))) "T") SAMPLESUB&lt;br /&gt;*** 2012-01-09 10:01:33.100&lt;br /&gt;** Executed dynamic sampling query:&lt;br /&gt;    level : 5&lt;br /&gt;    sample pct. : 2.018927&lt;br /&gt;    total partitions : 384&lt;br /&gt;      partitions for sampling : 384&lt;br /&gt;      partitions actually sampled from : 32&lt;br /&gt;    actual sample size : 2063&lt;br /&gt;    filtered sample card. : 2003&lt;br /&gt;    orig. card. : 98028&lt;br /&gt;    block cnt. table stat. : 1585&lt;br /&gt;    block cnt. for sampling: 19020&lt;br /&gt;    partition subset block cnt. : 1585&lt;br /&gt;    max. sample block cnt. : 64&lt;br /&gt;    sample block cnt. : 32&lt;br /&gt;    min. sel. est. 
: 0.05000000&lt;br /&gt;** Using recursive dynamic sampling card. est. : 1226195.625000&lt;br /&gt;*** 2012-01-09 10:01:33.163&lt;br /&gt;** Generated dynamic sampling query:&lt;br /&gt;    query text : &lt;br /&gt;SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB) NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0), NVL(SUM(C3),0) FROM (SELECT /*+ NO_PARALLEL("T") INDEX("T" T_IDX) NO_PARALLEL_INDEX("T") */ 1 AS C1, 1 AS C2, 1 AS C3  FROM "T" "T" WHERE "T"."ID"&amp;gt;100000 AND ROWNUM &amp;lt;= 2500) SAMPLESUB&lt;br /&gt;*** 2012-01-09 10:01:33.179&lt;br /&gt;** Executed dynamic sampling query:&lt;br /&gt;    level : 5&lt;br /&gt;    sample pct. : 100.000000&lt;br /&gt;    total partitions : 384&lt;br /&gt;      partitions for sampling : 384&lt;br /&gt;    actual sample size : 1226196&lt;br /&gt;    filtered sample card. : 2500&lt;br /&gt;    filtered sample card. (index T_IDX): 2500&lt;br /&gt;    orig. card. : 1226196&lt;br /&gt;    block cnt. table stat. : 1585&lt;br /&gt;    block cnt. for sampling: 1585&lt;br /&gt;    max. sample block cnt. : 4294967295&lt;br /&gt;    sample block cnt. : 1585&lt;br /&gt;    min. sel. est. : 0.05000000&lt;br /&gt;** Increasing dynamic sampling selectivity&lt;br /&gt;   for predicate 0 from 0.002039 to 0.970916.&lt;br /&gt;** Increasing dynamic sampling selectivity&lt;br /&gt;   for predicate 1 from 0.002039 to 0.970916.&lt;br /&gt;    index T_IDX selectivity est.: 0.00203883&lt;br /&gt;** Using dynamic sampling card. : 1226196&lt;br /&gt;** Dynamic sampling updated table card.&lt;br /&gt;** Using single table dynamic sel. est. 
: 0.00203883&lt;br /&gt;  Table:  T  Alias: T     &lt;br /&gt;    Card: Original: 1226196  Rounded: 2500  Computed: 2500.00  Non Adjusted: 2500.00&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;-------------------------------------------------------+-----------------------------------+---------------+&lt;br /&gt;| Id  | Operation                            | Name    | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |&lt;br /&gt;-------------------------------------------------------+-----------------------------------+---------------+&lt;br /&gt;| 0   | SELECT STATEMENT                     |         |       |       |    55 |           |       |       |&lt;br /&gt;| 1   |  PARTITION RANGE ALL                 |         |  2500 |  222K |    55 |  00:00:01 | 1     | 12    |&lt;br /&gt;| 2   |   PARTITION HASH ALL                 |         |  2500 |  222K |    55 |  00:00:01 | 1     | 32    |&lt;br /&gt;| 3   |    TABLE ACCESS BY LOCAL INDEX ROWID | T       |  2500 |  222K |    55 |  00:00:01 | 1     | 384   |&lt;br /&gt;| 4   |     INDEX RANGE SCAN                 | T_IDX   |  2500 |       |    20 |  00:00:01 | 1     | 384   |&lt;br /&gt;-------------------------------------------------------+-----------------------------------+---------------+&lt;br /&gt;Predicate Information:&lt;br /&gt;----------------------&lt;br /&gt;4 - access("ID"&amp;gt;100000)&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Again, it can be seen from these lines:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;** Increasing dynamic sampling selectivity&lt;br /&gt;   for predicate 0 from 0.002039 to 0.970916.&lt;br /&gt;** Increasing dynamic sampling selectivity&lt;br /&gt;   for predicate 1 from 0.002039 to 0.970916.&lt;br /&gt;    index T_IDX selectivity est.: 0.00203883&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;that, in principle, the selectivity estimate from the table-level sample (0.970916) is supposed to be used, but in the end the wrong selectivity (the capped index-based 0.00203883) gets copied over, which is 
then reflected in the final execution plan.&lt;br /&gt;&lt;br /&gt;This bug is tracked as bug "6408301: Bad cardinality estimate from dynamic sampling for indexes on partitioned table", and patches are available. The issue is fixed in 11.2.0.2, but the "wrong number of table blocks" issue is only fixed in 11.2.0.3. I don't have a bug number at hand for that bug, though.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you plan to use Dynamic Sampling on partitioned objects with many partitions, where the number of partitions to sample cannot be significantly limited by partition pruning, the results of Dynamic Sampling at the lower levels might be questionable.&lt;br /&gt;&lt;br /&gt;In addition, there is a bug that leads to wrong cost estimates for full segment scan operations and that is only fixed in the most recent release (11.2.0.3).&lt;br /&gt;&lt;br /&gt;It probably makes sense to use higher Dynamic Sampling levels in such cases. Not only does this produce more reasonable sampling results, it might also avoid the bug mentioned above, provided the number of blocks sampled is greater than the number of partitions to sample.&lt;br /&gt;&lt;br /&gt;Also be aware of cases where Dynamic Sampling can additionally use an index: for partitioned objects, a bug might lead to dramatic cardinality underestimates.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Testcase Script&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The issues described here can easily be reproduced using the following simple test case:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;drop table t;&lt;br /&gt;&lt;br /&gt;purge table t;&lt;br /&gt;&lt;br /&gt;create table t&lt;br /&gt;partition by range (pkey)&lt;br /&gt;subpartition by hash (hash_id) subpartitions 32&lt;br /&gt;(&lt;br /&gt;  partition pkey_1 values less than (2)&lt;br /&gt;, partition pkey_2 values less than (3)&lt;br /&gt;, partition pkey_3 values less 
than (4)&lt;br /&gt;, partition pkey_4 values less than (5)&lt;br /&gt;, partition pkey_5 values less than (6)&lt;br /&gt;, partition pkey_6 values less than (7)&lt;br /&gt;, partition pkey_7 values less than (8)&lt;br /&gt;, partition pkey_8 values less than (9)&lt;br /&gt;, partition pkey_9 values less than (10)&lt;br /&gt;, partition pkey_10 values less than (11)&lt;br /&gt;, partition pkey_11 values less than (12)&lt;br /&gt;, partition pkey_12 values less than (13)&lt;br /&gt;)&lt;br /&gt;storage (initial 64k)&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        rownum as id&lt;br /&gt;      , mod(rownum, 12) + 1 as pkey&lt;br /&gt;      --, 12 as pkey&lt;br /&gt;      --, 1 as hash_id&lt;br /&gt;      , rownum as hash_id&lt;br /&gt;      , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 1000000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;set echo on time on&lt;br /&gt;&lt;br /&gt;alter session set optimizer_dynamic_sampling = 5;&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'composite_part_dyn_samp';&lt;br /&gt;&lt;br /&gt;alter session set events '10053 trace name context forever, level 1';&lt;br /&gt;&lt;br /&gt;explain plan&lt;br /&gt;for&lt;br /&gt;select * from t&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;explain plan&lt;br /&gt;for&lt;br /&gt;select /*+ dynamic_sampling(t 5) */ * from t&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'composite_part_dyn_samp_where';&lt;br /&gt;&lt;br /&gt;explain plan&lt;br /&gt;for&lt;br /&gt;select /*+ dynamic_sampling(t 5) */ * from t&lt;br /&gt;where id &amp;gt; 100000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'dummy';&lt;br /&gt;&lt;br /&gt;create index t_idx on t (id) local;&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'composite_part_dyn_samp_index';&lt;br /&gt;&lt;br /&gt;explain plan&lt;br /&gt;for&lt;br /&gt;select /*+ dynamic_sampling(t 5) */ * from 
t&lt;br /&gt;where id &amp;gt; 100000&lt;br /&gt;;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-7067380006645715334?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/7067380006645715334/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=7067380006645715334' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/7067380006645715334'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/7067380006645715334'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2012/01/dynamic-sampling-on-multiple-partitions.html' title='Dynamic Sampling On Multiple Partitions - Bugs'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-1787988376923164386</id><published>2011-12-23T00:54:00.005+01:00</published><updated>2011-12-23T09:57:28.781+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Rowsource Profiling'/><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='DBMS_XPLAN'/><category scheme='http://www.blogger.com/atom/ns#' term='performance tuning'/><category scheme='http://www.blogger.com/atom/ns#' term='10gR2'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL statement analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='Cool 
Stuff'/><title type='text'>Extended DISPLAY_CURSOR With Rowsource Statistics</title><content type='html'>&lt;span style="font-weight:bold;"&gt;Introduction&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So this will be my Oracle-related Christmas present for you: a prototype implementation that extends the DBMS_XPLAN.DISPLAY_CURSOR output, hopefully making it more meaningful and easier to interpret. It is a simple standalone SQL*Plus script whose main functionality is performed by a single SQL query. I also demoed it during my recent "optimizer hacking sessions".&lt;br /&gt;&lt;br /&gt;Since Oracle 10g, DBMS_XPLAN.DISPLAY_CURSOR together with the Rowsource Statistics feature (enabled via SQL_TRACE, the GATHER_PLAN_STATISTICS hint, STATISTICS_LEVEL set to ALL, or controlled via the corresponding hidden parameters "_rowsource_execution_statistics" and "_rowsource_statistics_sampfreq") allows a sophisticated analysis of the work performed by a single SQL statement.&lt;br /&gt;&lt;br /&gt;Of course it doesn't go as far as the Real-Time SQL Monitoring feature added in Oracle 11g (only available with Enterprise Edition plus the Diagnostic and Tuning Packs), which is "always on", provides similar (and much more) information while a statement is still executing, and doesn't require reproducing the execution with the corresponding hints or parameters set. 
&lt;br /&gt;&lt;br /&gt;Without the Tuning Pack it's usually necessary to reproduce the execution, because the overhead of the Rowsource Statistics is significant and it therefore doesn't make sense to have them always enabled. Unfortunately Oracle 11g gathers the same information "always on", but you're only allowed to access it if you have the Tuning Pack license.&lt;br /&gt;&lt;br /&gt;For users without the corresponding licenses, DBMS_XPLAN.DISPLAY_CURSOR together with Rowsource Statistics is still a very valuable tool.&lt;br /&gt;&lt;br /&gt;However, during my seminars and consulting at client sites I've realized that people quite often struggle to interpret the output provided, for several reasons:&lt;br /&gt;&lt;br /&gt;1. They have general problems interpreting the execution plan - here I refer in particular to the flow of execution and the underlying execution mechanics&lt;br /&gt;&lt;br /&gt;2. They have problems identifying the operations responsible for the majority of the work, due to the cumulative nature of the work-related figures provided, like Elapsed Time, Logical I/O, Physical Reads etc.&lt;br /&gt;&lt;br /&gt;3. They are potentially misled when trying to identify the steps in the execution plan that are subject to cardinality mis-estimates by the optimizer (the single most common reason for inefficient execution plans), because of the way the optimizer shows the number of estimated rows for operations that are executed multiple times (for example the inner row source of a Nested Loop join).&lt;br /&gt;&lt;br /&gt;I've tried to address all of the above points (and even more) with this prototype implementation. 
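&lt;br /&gt;&lt;br /&gt;For the common case, the flow of execution behind point 1 can be derived mechanically from the Id and Parent Id of each plan line. The Python sketch below assumes that the order of execution is a depth-first, post-order walk of the plan tree (all children in Id order first, then the parent). This rule reproduces the "Ord" values of the example plans in this post, but as noted further down it does not cater for the exceptions to the general rules. The function name execution_order and its dictionary input are hypothetical and not part of the script.&lt;br /&gt;&lt;br /&gt;

```python
"""Hypothetical sketch: derive the "Ord" column from Id / Pid pairs.

Assumption: for the common case the order of execution is a depth-first,
post-order walk of the plan tree (all children in Id order first, then
the parent). Plans that violate the general rules are not handled.
"""
from __future__ import annotations

from collections import defaultdict


def execution_order(parents: dict[int, int | None]) -> dict[int, int]:
    """Map every plan line Id to its Ord value.

    `parents` maps Id to Pid, with None marking the root (Id 0).
    """
    children: dict[int, list[int]] = defaultdict(list)
    roots = []
    for op_id in sorted(parents):
        pid = parents[op_id]
        if pid is None:
            roots.append(op_id)
        else:
            children[pid].append(op_id)

    order: dict[int, int] = {}
    # Iterative post-order walk, so deep plans cannot hit the recursion limit.
    stack = [(root, False) for root in reversed(roots)]
    while stack:
        op_id, children_done = stack.pop()
        if children_done:
            # All children have been numbered, so the parent comes next.
            order[op_id] = len(order) + 1
            continue
        stack.append((op_id, True))
        # Push children in reverse so the lowest Id is visited first.
        for child in reversed(children[op_id]):
            stack.append((child, False))
    return order


if __name__ == "__main__":
    # Id -> Pid of the first example (SELECT STATEMENT, SORT AGGREGATE, FULL SCAN)
    print(execution_order({0: None, 1: 0, 2: 1}))
```

&lt;br /&gt;&lt;br /&gt;Given the Id/Pid pairs of the star transformation plan further down in this post, it returns Ord 1 for Id 8 (the first full scan of D) and Ord 30 for Id 0, matching the "Ord" column printed by the script.&lt;br /&gt;&lt;br /&gt;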
In fact point 1 above has already been addressed by &lt;a href="http://www.oracle-developer.net/utilities.php"&gt;Adrian Billington's&lt;/a&gt; &lt;a href="http://www.oracle-developer.net/content/utilities/xplan.zip"&gt;XPLAN wrapper&lt;/a&gt; utility, which adds the Parent ID and Order of Execution to the DBMS_XPLAN output. I've picked up his idea of injecting additional information into the output for this prototype, so kudos to Adrian for his great idea and implementation.&lt;br /&gt;&lt;br /&gt;Apart from any home-grown scripts there have probably been numerous attempts to address points 2 and 3 above, the latest one I know of being Kyle Hailey's &lt;a href="http://dboptimizer.com/2011/09/20/display_cursor/"&gt;DISPLAY_CURSOR post&lt;/a&gt; and his "TCF query" provided in the same article. I've included his TCF-GRAPH and LIO-RATIO information, so thanks also to Kyle for posting this.&lt;br /&gt;&lt;br /&gt;I plan to eventually turn this into a SQL statement analysis "Swiss-army knife" for non-Tuning Pack users, with more sophisticated formatting options (for example specifying which columns to show and in which order) and the ability to combine the information with the ASH data available with the Diagnostic Pack license (similar to the output provided by the Real-Time SQL Monitoring text mode).&lt;br /&gt;&lt;br /&gt;However, I believe that this prototype is already quite helpful and therefore decided to publish it as it is.&lt;br /&gt;&lt;br /&gt;Let's have a look at what the extended output has to offer by performing a couple of sample Rowsource Profiles.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Examples&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first example is deliberately kept as simple as possible to explain the basic functionality by performing a full table scan.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&amp;gt; alter session set statistics_level = all;&lt;br /&gt;&lt;br /&gt;Session altered.&lt;br /&gt;&lt;br
/&gt;Elapsed: 00:00:00.00&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select count(*) from t1;&lt;br /&gt;&lt;br /&gt;  COUNT(*)&lt;br /&gt;----------&lt;br /&gt;   1000000&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:03.01&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; @xplan_extended_display_cursor&lt;br /&gt;SQL&amp;gt; set echo off verify off termout off&lt;br /&gt;SQL_ID  5bc0v4my7dvr5, child number 0&lt;br /&gt;-------------------------------------&lt;br /&gt;select count(*) from t1&lt;br /&gt;&lt;br /&gt;Plan hash value: 3724264953&lt;br /&gt;&lt;br /&gt;-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Pid | Ord | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|&lt;br /&gt;-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 |     |   3 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:02.98 |   15390 |  15386 | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|   1 |   0 |   2 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:02.98 |   15390 |  15386 | 00:00:01.39 |        0 |        0 | @@@@@@      |             |             |        0 |          |        1 |&lt;br /&gt;|   2 |   1 |   1 |   TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:00:01.58 |   15390 |  15386 | 00:00:01.58 |    15390 |    15386 | @@@@@@      | @@@@@@@@@@@@| @@@@@@@@@@@@|        0 |          |     1000K|&lt;br 
/&gt;-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;14 rows selected.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The first thing that becomes obvious is the fact that you need a veeery wide display setting to see all the columns provided :-)&lt;br /&gt;&lt;br /&gt;As you can see, if you call the script without any parameters, it tries to pick up the last statement executed by the current session and calls DBMS_XPLAN.DISPLAY_CURSOR with the ALLSTATS LAST formatting option. Further options can be found in the documentation provided with the script.&lt;br /&gt;&lt;br /&gt;To the left you can see the "Pid" and "Ord" columns that Adrian added in his original XPLAN wrapper script - these define the Parent Id and the Order of Execution. Note that this Order of Execution is only correct for the common cases: it doesn't cater for the various exceptions to the general rules and can therefore be misleading. An example that demonstrates this follows below.&lt;br /&gt;&lt;br /&gt;In addition to what DBMS_XPLAN.DISPLAY_CURSOR provides out of the box with the ALLSTATS LAST option, you can see the following columns:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;A-Time Self&lt;/span&gt;: This is the time spent on the operation itself. For leaf operations this corresponds to the A-Time; for all non-leaf operations it is obtained by subtracting the time spent on all direct child operations from the time shown for the parent operation. Please note that if you use a lower rowsource sample frequency (for example as set by the GATHER_PLAN_STATISTICS hint) the A-Time information can be quite wrong and misleading. 
You need to set the sample frequency to 1 to get stable timing information reported, which of course means that the overhead of the rowsource sampling is maximized.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Bufs Self/Reads Self/Write Self&lt;/span&gt;: The corresponding self-operation statistics, obtained in the same way as just described.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Graphs&lt;/span&gt;: The self-operation work shown relative to the total work performed. Note that the "total" is defined by querying the MAX value found in the statistics rather than picking the top-most cumulative value. This is because for queries that are cancelled or performed using Parallel Execution the top-most value may either not be populated at all or may differ from the values accumulated by the Parallel Slaves. So there are cases where the Graphs may be wrong and misleading - treat them carefully.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;LIO Ratio&lt;/span&gt;: This is simply the ratio between the number of logical I/Os recorded for the particular operation itself and the number of rows generated by the row source, i.e. the average number of LIOs required to generate a single row. As usual, care should be taken when interpreting a ratio, but in general a high value here might indicate that there are more efficient ways to generate the data, like a more selective access path. This can be very misleading for aggregation operations, for example: a COUNT(*) will potentially show a huge LIO ratio but doesn't indicate a problem by itself.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;TCF Graph&lt;/span&gt;: "Tuning by Cardinality Feedback" - this is a graph in a different style: it shows either plus or minus signs, and each plus / minus corresponds to one order of magnitude difference between the estimated and the actual rows. Plus stands for underestimates, minus for overestimates. 
So two plus signs indicate that the actual number of rows was 100 times greater than the estimated number, and similarly two minus signs indicate an overestimate by a factor of 100. Note that this information will be partially misleading with Parallel Execution, because an operation that is only started once with serial execution might be started several times to obtain the complete result set when executed in parallel. Cancelled queries might also show misleading information here; see the "E-Rows*Sta" column description for an explanation why.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;E-Rows*Sta&lt;/span&gt;: This is the estimated number of rows multiplied by the Starts column. It tries to address point 3 above: a simple comparison of E-Rows and A-Rows can be very misleading, because a large difference doesn't indicate a problem at all if the operation has been started a corresponding number of times. If a query gets cancelled, there might still be a difference between this value and A-Rows simply because the operation wasn't run to completion. Parallel Execution also requires care here, because an operation that is executed only once serially will be started many times when executed in parallel.&lt;br /&gt;&lt;br /&gt;Looking at the example above, it becomes obvious that all of the logical and physical I/O has of course been caused by the full table scan, but with the increased STATISTICS_LEVEL setting you can see that the SORT AGGREGATE operation also required some time (presumably CPU time due to instrumentation overhead), whereas the top-most operation didn't account for any work at all. 
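&lt;br /&gt;&lt;br /&gt;All of these derived columns are simple arithmetic on the cumulative figures that DBMS_XPLAN.DISPLAY_CURSOR already shows. The following Python sketch reproduces them. It is not the code of the script itself, and the rules it encodes were inferred from the sample output in this post: self values subtract the direct children, graphs are 12 characters wide and scaled against the MAX of the cumulative column with half-up rounding, the LIO Ratio is Bufs Self divided by A-Rows truncated to an integer, and the TCF Graph compares A-Rows with E-Rows*Sta. All function names are hypothetical.&lt;br /&gt;&lt;br /&gt;

```python
"""Hypothetical sketch of the derived columns (not the script's own code).

Assumptions, inferred from the sample output in this post:
- a "self" value is the cumulative value minus that of all direct children
- graphs are 12 characters wide, scaled against the MAX value of the
  cumulative statistic and rounded half up
- LIO Ratio is Bufs Self divided by A-Rows, truncated to an integer
- TCF Graph compares A-Rows with E-Rows * Starts and prints one sign per
  full order of magnitude; zero or missing values produce no graph
"""
from __future__ import annotations

GRAPH_WIDTH = 12  # width of the graphs shown in the examples


def self_values(cumulative: dict[int, float],
                parents: dict[int, int | None]) -> dict[int, float]:
    """Subtract each operation's cumulative value from its direct parent."""
    result = dict(cumulative)
    for op_id, pid in parents.items():
        if pid is not None:
            result[pid] -= cumulative[op_id]
    return result


def s_graph(self_value: float, column_max: float) -> str:
    """Self work relative to the largest cumulative value of the column."""
    if column_max == 0:
        return ""
    return "@" * int(self_value / column_max * GRAPH_WIDTH + 0.5)


def lio_ratio(bufs_self: int, a_rows: int) -> int | None:
    """Average logical I/Os per row produced; None when no rows came back."""
    if a_rows == 0:
        return None
    return bufs_self // a_rows


def e_rows_times_starts(e_rows: int | None, starts: int) -> int | None:
    """Estimated rows per start multiplied by the number of starts."""
    return None if e_rows is None else e_rows * starts


def tcf_graph(e_rows: int | None, starts: int, a_rows: int) -> str:
    """One '+' per order of magnitude underestimated, '-' per order overestimated."""
    expected = e_rows_times_starts(e_rows, starts)
    if not expected or not a_rows:
        return ""
    if a_rows >= expected:
        big, small, sign = a_rows, expected, "+"
    else:
        big, small, sign = expected, a_rows, "-"
    magnitudes = 0
    # Count full factors of 10 between the larger and the smaller figure.
    while big >= small * 10 ** (magnitudes + 1):
        magnitudes += 1
    return sign * magnitudes


if __name__ == "__main__":
    # First example: SELECT STATEMENT (0), SORT AGGREGATE (1), FULL SCAN (2)
    parents = {0: None, 1: 0, 2: 1}
    buffers = {0: 15390, 1: 15390, 2: 15390}
    bufs_self = self_values(buffers, parents)
    for op_id in sorted(parents):
        print(op_id, bufs_self[op_id], s_graph(bufs_self[op_id], max(buffers.values())))
```

&lt;br /&gt;&lt;br /&gt;Given the Buffers column of the first example (15390 for all three operations), self_values returns 0, 0 and 15390, which matches the Bufs Self column above, and tcf_graph(16, 1, 1466000) returns "++++", the TCF Graph of the VIEW operation in the next example.&lt;br /&gt;&lt;br /&gt;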
The cardinality estimate is also spot on.&lt;br /&gt;&lt;br /&gt;The next example shows a different shape:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&amp;gt; select&lt;br /&gt;  2          sum(row_num)&lt;br /&gt;  3  from&lt;br /&gt;  4          (&lt;br /&gt;  5          select&lt;br /&gt;  6                  row_number() over (partition by object_type order by object_name) as row_num&lt;br /&gt;  7                , t.*&lt;br /&gt;  8          from&lt;br /&gt;  9                  t&lt;br /&gt; 10          where&lt;br /&gt; 11                  object_id &amp;gt; :x&lt;br /&gt; 12          );&lt;br /&gt;  2.7338E+11&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:41.83&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; @xplan_extended_display_cursor&lt;br /&gt;SQL&amp;gt; set echo off verify off termout off&lt;br /&gt;SQL_ID  fmbq5ytmh0hng, child number 0&lt;br /&gt;-------------------------------------&lt;br /&gt;select         sum(row_num) from         (         select&lt;br /&gt;  row_number() over (partition by object_type order by object_name) as&lt;br /&gt;row_num               , t.*         from                 t&lt;br /&gt;where                 object_id &amp;gt; :x         )&lt;br /&gt;&lt;br /&gt;Plan hash value: 1399240396&lt;br /&gt;&lt;br /&gt;------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Pid | Ord | Operation                      | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp| A-Time Self |Bufs Self |Reads Self|Write Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|Write S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|&lt;br 
/&gt;------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 |     |   6 | SELECT STATEMENT               |      |      1 |        |      1 |00:00:41.84 |    1469K|  49356 |  10578 |       |       |          |         | 00:00:00.00 |        0 |        0 |        0 |             |             |             |             |        0 |          |          |&lt;br /&gt;|   1 |   0 |   5 |  SORT AGGREGATE                |      |      1 |      1 |      1 |00:00:41.84 |    1469K|  49356 |  10578 |       |       |          |         | 00:00:02.19 |        0 |        0 |        0 | @           |             |             |             |        0 |          |        1 |&lt;br /&gt;|   2 |   1 |   4 |   VIEW                         |      |      1 |     16 |   1466K|00:00:39.64 |    1469K|  49356 |  10578 |       |       |          |         | 00:00:04.22 |        0 |        0 |        0 | @           |             |             |             |        0 | ++++     |       16 |&lt;br /&gt;|   3 |   2 |   3 |    WINDOW SORT                 |      |      1 |     16 |   1466K|00:00:35.42 |    1469K|  49356 |  10578 |    93M|  3312K|   55M (1)|   84992 | 00:00:11.85 |        6 |    24410 |    10578 | @@@         |             | @@@@@@      | @@@@@@@@@@@@|        0 | ++++     |       16 |&lt;br /&gt;|   4 |   3 |   2 |     TABLE ACCESS BY INDEX ROWID| T    |      1 |     16 |   1466K|00:00:23.58 |    1469K|  24946 |      0 |       |       |          |         | 00:00:19.73 |     1466K|    21707 |        0 | @@@@@@      | @@@@@@@@@@@@| @@@@@       |             |        1 | ++++     |       16 |&lt;br /&gt;|*  5 |   4 |   1 |      INDEX RANGE SCAN          | I    |      1 |     16 |   1466K|00:00:03.85 |    3240 |   3239 |     
 0 |       |       |          |         | 00:00:03.85 |     3240 |     3239 |        0 | @           |             | @           |             |        0 | ++++     |       16 |&lt;br /&gt;------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   5 - access("OBJECT_ID"&amp;gt;:X)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;25 rows selected.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Have I already mentioned that you need a veery wide display setting :-) ??&lt;br /&gt;&lt;br /&gt;Anyway here we can see a couple of interesting points:&lt;br /&gt;&lt;br /&gt;- An example of a parent operation requiring a significant amount of time - in this case a WINDOW SORT operation that spills to disk (see the Used-Tmp and Writes columns)&lt;br /&gt;&lt;br /&gt;- A problem with the cardinality estimates as indicated by the TCF Graph. In this case it is the reason for an inefficient index-based access path. 
Note that the LIO Ratio isn't indicating this problem here very clearly&lt;br /&gt;&lt;br /&gt;- The majority of the logical I/O (and time and work) is caused by the random access to the table, again caused by the bad choice of the optimizer due to the wrong cardinality estimates&lt;br /&gt;&lt;br /&gt;Here is another example of a more complex execution plan:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&amp;gt; alter session set statistics_level = all;&lt;br /&gt;&lt;br /&gt;Session altered.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.00&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; alter session set star_transformation_enabled = temp_disable;&lt;br /&gt;&lt;br /&gt;Session altered.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.00&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select * from (&lt;br /&gt;  2  select t1.id as t1_id, t1.filler, s.id as s_id from t1, (&lt;br /&gt;  3  select&lt;br /&gt;  4         f.id&lt;br /&gt;  5  from&lt;br /&gt;  6         t f&lt;br /&gt;  7       , (select * from d where is_flag_d1 = 'Y') d1&lt;br /&gt;  8       , (select * from d where is_flag_d2 = 'Y') d2&lt;br /&gt;  9       , (select * from d where is_flag_d3 = 'Y') d3&lt;br /&gt; 10  where&lt;br /&gt; 11         f.fk1 = d1.id&lt;br /&gt; 12  and    f.fk2 = d2.id&lt;br /&gt; 13  and    f.fk3 = d3.id&lt;br /&gt; 14  ) s&lt;br /&gt; 15  where t1.id = s.id&lt;br /&gt; 16  )&lt;br /&gt; 17  where rownum &amp;gt; 1&lt;br /&gt; 18  ;&lt;br /&gt;&lt;br /&gt;no rows selected&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:21.26&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; @xplan_extended_display_cursor&lt;br /&gt;SQL&amp;gt; set echo off verify off termout off&lt;br /&gt;SQL_ID  5u3x96k4s5zt6, child number 0&lt;br /&gt;-------------------------------------&lt;br /&gt;select * from ( select t1.id as t1_id, t1.filler, s.id as s_id from t1,&lt;br /&gt;( select        f.id from        t f      , (select * from d where&lt;br /&gt;is_flag_d1 = 'Y') d1      , (select * from d where 
is_flag_d2 = 'Y') d2&lt;br /&gt;     , (select * from d where is_flag_d3 = 'Y') d3 where        f.fk1 =&lt;br /&gt;d1.id and    f.fk2 = d2.id and    f.fk3 = d3.id ) s where t1.id = s.id&lt;br /&gt;) where rownum &amp;gt; 1&lt;br /&gt;&lt;br /&gt;Plan hash value: 42027304&lt;br /&gt;&lt;br /&gt;--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Pid | Ord | Operation                            | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|&lt;br /&gt;--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 |     |  30 | SELECT STATEMENT                     |                |      1 |        |      0 |00:00:21.23 |    2161K|  43798 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |          |          |          |&lt;br /&gt;|   1 |   0 |  29 |  COUNT                               |                |      1 |        |      0 |00:00:21.23 |    2161K|  43798 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |          |          |          |&lt;br /&gt;|*  2 |   1 |  28 |   FILTER                             |                |      1 |        |      0 |00:00:21.23 |    2161K|  43798 |       |       |          | 00:00:00.44 |        0 |        0 |             |             |             |          |          |          |&lt;br 
/&gt;|   3 |   2 |  27 |    NESTED LOOPS                      |                |      1 |        |   1000K|00:00:20.79 |    2161K|  43798 |       |       |          | 00:00:00.44 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|   4 |   3 |  25 |     NESTED LOOPS                     |                |      1 |      9 |   1000K|00:00:11.10 |    2131K|  21150 |       |       |          | 00:00:00.41 |        0 |        0 |             |             |             |        0 | +++++    |        9 |&lt;br /&gt;|*  5 |   4 |  23 |      HASH JOIN                       |                |      1 |      9 |   1000K|00:00:06.12 |   19549 |  17970 |    33M|  6589K|   65M (0)| 00:00:00.59 |        0 |        0 |             |             |             |        0 | +++++    |        9 |&lt;br /&gt;|*  6 |   5 |  21 |       HASH JOIN                      |                |      1 |      9 |   1000K|00:00:05.53 |   19385 |  17970 |    37M|  6044K|   69M (0)| 00:00:00.57 |        0 |        0 |             |             |             |        0 | +++++    |        9 |&lt;br /&gt;|*  7 |   6 |  19 |        HASH JOIN                     |                |      1 |     10 |   1000K|00:00:04.95 |   19221 |  17970 |  1452K|  1452K| 1002K (0)| 00:00:00.53 |        0 |        0 |             |             |             |        0 | +++++    |       10 |&lt;br /&gt;|*  8 |   7 |   1 |         TABLE ACCESS FULL            | D              |      1 |     10 |     10 |00:00:00.01 |     164 |      0 |       |       |          | 00:00:00.00 |      164 |        0 |             |             |             |       16 |          |       10 |&lt;br /&gt;|   9 |   7 |  18 |         VIEW                         | VW_ST_84A34AF1 |      1 |     10 |   1000K|00:00:04.42 |   19057 |  17970 |       |       |          | 00:00:00.18 |        0 |        0 |             |             |             |        0 | +++++    |       10 |&lt;br /&gt;|  10 
|   9 |  17 |          NESTED LOOPS                |                |      1 |     10 |   1000K|00:00:04.24 |   19057 |  17970 |       |       |          | 00:00:00.37 |        0 |        0 |             |             |             |        0 | +++++    |       10 |&lt;br /&gt;|  11 |  10 |  15 |           BITMAP CONVERSION TO ROWIDS|                |      1 |     10 |   1000K|00:00:00.41 |    2107 |   1020 |       |       |          | 00:00:00.11 |        0 |        0 |             |             |             |        0 | +++++    |       10 |&lt;br /&gt;|  12 |  11 |  14 |            BITMAP AND                |                |      1 |        |     11 |00:00:00.30 |    2107 |   1020 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|  13 |  12 |   5 |             BITMAP MERGE             |                |      1 |        |     11 |00:00:00.12 |     863 |    400 |  1024K|   512K| 2804K (0)| 00:00:00.02 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|  14 |  13 |   4 |              BITMAP KEY ITERATION    |                |      1 |        |    800 |00:00:00.10 |     863 |    400 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|* 15 |  14 |   2 |               TABLE ACCESS FULL      | D              |      1 |    100 |    100 |00:00:00.01 |     164 |      0 |       |       |          | 00:00:00.00 |      164 |        0 |             |             |             |        1 |          |      100 |&lt;br /&gt;|* 16 |  14 |   3 |               BITMAP INDEX RANGE SCAN| T_FK1          |    100 |        |    800 |00:00:00.10 |     699 |    400 |       |       |          | 00:00:00.10 |      699 |      400 |             |             |             |        0 |          |          |&lt;br /&gt;|  17 |  12 |   9 
|             BITMAP MERGE             |                |      1 |        |     11 |00:00:00.12 |     847 |    400 |  2802K|   512K| 2804K (0)| 00:00:00.02 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|  18 |  17 |   8 |              BITMAP KEY ITERATION    |                |      1 |        |    800 |00:00:00.10 |     847 |    400 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|* 19 |  18 |   6 |               TABLE ACCESS FULL      | D              |      1 |    100 |    100 |00:00:00.01 |     164 |      0 |       |       |          | 00:00:00.00 |      164 |        0 |             |             |             |        1 |          |      100 |&lt;br /&gt;|* 20 |  18 |   7 |               BITMAP INDEX RANGE SCAN| T_FK3          |    100 |        |    800 |00:00:00.10 |     683 |    400 |       |       |          | 00:00:00.10 |      683 |      400 |             |             |             |        0 |          |          |&lt;br /&gt;|  21 |  12 |  13 |             BITMAP MERGE             |                |      1 |        |     11 |00:00:00.06 |     397 |    220 |  1024K|   512K| 1581K (0)| 00:00:00.01 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|  22 |  21 |  12 |              BITMAP KEY ITERATION    |                |      1 |        |    440 |00:00:00.05 |     397 |    220 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|* 23 |  22 |  10 |               TABLE ACCESS FULL      | D              |      1 |     10 |     10 |00:00:00.01 |     164 |      0 |       |       |          | 00:00:00.00 |      164 |        0 |             |             |             |       16 |          |       10 |&lt;br /&gt;|* 24 |  22 |  11 |           
    BITMAP INDEX RANGE SCAN| T_FK2          |     10 |        |    440 |00:00:00.05 |     233 |    220 |       |       |          | 00:00:00.05 |      233 |      220 |             |             |             |        0 |          |          |&lt;br /&gt;|  25 |  10 |  16 |           TABLE ACCESS BY USER ROWID | T              |   1000K|      1 |   1000K|00:00:03.46 |   16950 |  16950 |       |       |          | 00:00:03.46 |    16950 |    16950 | @@          |             | @@@@@       |        0 |          |     1000K|&lt;br /&gt;|* 26 |   6 |  20 |        TABLE ACCESS FULL             | D              |      1 |    100 |    100 |00:00:00.01 |     164 |      0 |       |       |          | 00:00:00.00 |      164 |        0 |             |             |             |        1 |          |      100 |&lt;br /&gt;|* 27 |   5 |  22 |       TABLE ACCESS FULL              | D              |      1 |    100 |    100 |00:00:00.01 |     164 |      0 |       |       |          | 00:00:00.00 |      164 |        0 |             |             |             |        1 |          |      100 |&lt;br /&gt;|* 28 |   4 |  24 |      INDEX RANGE SCAN                | T1_IDX         |   1000K|      1 |   1000K|00:00:04.57 |    2111K|   3180 |       |       |          | 00:00:04.57 |     2112K|     3180 | @@@         | @@@@@@@@@@@@| @           |        2 |          |     1000K|&lt;br /&gt;|  29 |   3 |  26 |     TABLE ACCESS BY INDEX ROWID      | T1             |   1000K|      1 |   1000K|00:00:09.25 |   29628 |  22648 |       |       |          | 00:00:09.25 |    29628 |    22648 | @@@@@       |             | @@@@@@      |        0 |          |     1000K|&lt;br /&gt;--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information 
(identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM&amp;gt;1)&lt;br /&gt;   5 - access("ITEM_1"="D"."ID")&lt;br /&gt;   6 - access("ITEM_3"="D"."ID")&lt;br /&gt;   7 - access("ITEM_2"="D"."ID")&lt;br /&gt;   8 - filter("IS_FLAG_D2"='Y')&lt;br /&gt;  15 - filter("IS_FLAG_D1"='Y')&lt;br /&gt;  16 - access("F"."FK1"="D"."ID")&lt;br /&gt;  19 - filter("IS_FLAG_D3"='Y')&lt;br /&gt;  20 - access("F"."FK3"="D"."ID")&lt;br /&gt;  23 - filter("IS_FLAG_D2"='Y')&lt;br /&gt;  24 - access("F"."FK2"="D"."ID")&lt;br /&gt;  26 - filter("IS_FLAG_D1"='Y')&lt;br /&gt;  27 - filter("IS_FLAG_D3"='Y')&lt;br /&gt;  28 - access("T1"."ID"="ITEM_4")&lt;br /&gt;&lt;br /&gt;Note&lt;br /&gt;-----&lt;br /&gt;   - star transformation used for this statement&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;68 rows selected.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This is another case where a more efficient execution plan could be found if the cardinality estimate were in the right ballpark - you can see this pretty clearly in the "TCF Graph" column. Due to the strong underestimation several bad choices were made: reading all rows from T by ROWID rather than simply performing a full table scan, and again an index-driven random access to T1, which drives up the logical I/O unnecessarily. This is a crafted example that minimizes the logical and physical I/O thanks to the good clustering of T1 in relation to the data returned by the driving row source - a more realistic bad clustering together with larger table sizes would have turned this into a more or less never-ending query.&lt;br /&gt;&lt;br /&gt;It is also an example of how simply comparing E-Rows and A-Rows can be misleading. Check operations 28 and 29: A-Rows is 1000K but E-Rows is 1, so should this be worrying? 
Not at all if you look at the "E-Rows*Sta" column: the operation has been started 1000K times, hence the estimate is spot on.&lt;br /&gt;&lt;br /&gt;The "LIO Ratio" for operation 23 is 16 - this means it took 16 LIOs on average to generate a single row, which might indicate that there are more efficient ways to generate those rows than a full table scan.&lt;br /&gt;&lt;br /&gt;By the way, the &lt;a href="http://oracle-randolf.blogspot.com/2011/08/logical-io-evolution-part-3-11g.html"&gt;11g buffer pinning optimization&lt;/a&gt; also helped to minimize the logical I/O on the T1 table.&lt;br /&gt;&lt;br /&gt;Here is the same query, but this time with a bad clustering of T1. I cancelled it after 40 seconds to show that you can use DBMS_XPLAN.DISPLAY_CURSOR without having to run a statement to completion.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&amp;gt; select * from (&lt;br /&gt;  2  select t1.id as t1_id, t1.filler, s.id as s_id from t1, (&lt;br /&gt;  3  select&lt;br /&gt;  4         f.id&lt;br /&gt;  5  from&lt;br /&gt;  6         t f&lt;br /&gt;  7       , (select * from d where is_flag_d1 = 'Y') d1&lt;br /&gt;  8       , (select * from d where is_flag_d2 = 'Y') d2&lt;br /&gt;  9       , (select * from d where is_flag_d3 = 'Y') d3&lt;br /&gt; 10  where&lt;br /&gt; 11         f.fk1 = d1.id&lt;br /&gt; 12  and    f.fk2 = d2.id&lt;br /&gt; 13  and    f.fk3 = d3.id&lt;br /&gt; 14  ) s&lt;br /&gt; 15  where t1.id = s.id&lt;br /&gt; 16  )&lt;br /&gt; 17  where rownum &amp;gt; 1&lt;br /&gt; 18  ;&lt;br /&gt;select t1.id as t1_id, t1.filler, s.id as s_id from t1, (&lt;br /&gt;                                                    *&lt;br /&gt;ERROR at line 2:&lt;br /&gt;ORA-01013: user requested cancel of current operation&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:40.71&lt;br /&gt;SQL&amp;gt; @xplan_extended_display_cursor&lt;br /&gt;SQL_ID  5u3x96k4s5zt6, child number 0&lt;br 
/&gt;-------------------------------------&lt;br /&gt;select * from ( select t1.id as t1_id, t1.filler, s.id as s_id from t1,&lt;br /&gt;( select        f.id from        t f      , (select * from d where&lt;br /&gt;is_flag_d1 = 'Y') d1      , (select * from d where is_flag_d2 = 'Y') d2&lt;br /&gt;     , (select * from d where is_flag_d3 = 'Y') d3 where        f.fk1 =&lt;br /&gt;d1.id and    f.fk2 = d2.id and    f.fk3 = d3.id ) s where t1.id = s.id&lt;br /&gt;) where rownum &amp;gt; 1&lt;br /&gt;&lt;br /&gt;Plan hash value: 42027304&lt;br /&gt;&lt;br /&gt;--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Pid | Ord | Operation                            | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|&lt;br /&gt;--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 |     |  30 | SELECT STATEMENT                     |                |      1 |        |      0 |00:00:00.01 |       0 |      0 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |          |          |          |&lt;br /&gt;|   1 |   0 |  29 |  COUNT                               |                |      1 |        |      0 |00:00:00.01 |       0 |      0 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |          |          |          |&lt;br /&gt;|*  2 |   1 |  28 
|   FILTER                             |                |      1 |        |      0 |00:00:00.01 |       0 |      0 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |          |          |          |&lt;br /&gt;|   3 |   2 |  27 |    NESTED LOOPS                      |                |      1 |        |    102K|00:00:40.42 |     337K|    143K|       |       |          | 00:00:00.16 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|   4 |   3 |  25 |     NESTED LOOPS                     |                |      1 |      9 |    102K|00:00:13.08 |     235K|  40794 |       |       |          | 00:00:00.13 |        0 |        0 |             |             |             |        0 | ++++     |        9 |&lt;br /&gt;|*  5 |   4 |  23 |      HASH JOIN                       |                |      1 |      9 |    102K|00:00:05.70 |   19420 |  17970 |    33M|  6589K|   65M (0)| 00:00:00.40 |        0 |        0 |             |             |             |        0 | ++++     |        9 |&lt;br /&gt;|*  6 |   5 |  21 |       HASH JOIN                      |                |      1 |      9 |   1000K|00:00:05.30 |   19385 |  17970 |    37M|  6044K|   69M (0)| 00:00:00.57 |        0 |        0 |             |             |             |        0 | +++++    |        9 |&lt;br /&gt;|*  7 |   6 |  19 |        HASH JOIN                     |                |      1 |     10 |   1000K|00:00:04.73 |   19221 |  17970 |  1452K|  1452K| 1010K (0)| 00:00:00.52 |        0 |        0 |             |             |             |        0 | +++++    |       10 |&lt;br /&gt;|*  8 |   7 |   1 |         TABLE ACCESS FULL            | D              |      1 |     10 |     10 |00:00:00.01 |     164 |      0 |       |       |          | 00:00:00.00 |      164 |        0 |             |             |             |       16 |          |       10 |&lt;br /&gt;|   9 |   7 |  18 |         
VIEW                         | VW_ST_84A34AF1 |      1 |     10 |   1000K|00:00:04.21 |   19057 |  17970 |       |       |          | 00:00:00.18 |        0 |        0 |             |             |             |        0 | +++++    |       10 |&lt;br /&gt;|  10 |   9 |  17 |          NESTED LOOPS                |                |      1 |     10 |   1000K|00:00:04.03 |   19057 |  17970 |       |       |          | 00:00:00.37 |        0 |        0 |             |             |             |        0 | +++++    |       10 |&lt;br /&gt;|  11 |  10 |  15 |           BITMAP CONVERSION TO ROWIDS|                |      1 |     10 |   1000K|00:00:00.42 |    2107 |   1020 |       |       |          | 00:00:00.10 |        0 |        0 |             |             |             |        0 | +++++    |       10 |&lt;br /&gt;|  12 |  11 |  14 |            BITMAP AND                |                |      1 |        |     11 |00:00:00.32 |    2107 |   1020 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|  13 |  12 |   5 |             BITMAP MERGE             |                |      1 |        |     11 |00:00:00.13 |     863 |    400 |  1024K|   512K|          | 00:00:00.02 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|  14 |  13 |   4 |              BITMAP KEY ITERATION    |                |      1 |        |    800 |00:00:00.11 |     863 |    400 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|* 15 |  14 |   2 |               TABLE ACCESS FULL      | D              |      1 |    100 |    100 |00:00:00.01 |     164 |      0 |       |       |          | 00:00:00.00 |      164 |        0 |             |             |             |        1 |          |      100 |&lt;br /&gt;|* 16 |  14 |   3 |               
BITMAP INDEX RANGE SCAN| T_FK1          |    100 |        |    800 |00:00:00.11 |     699 |    400 |       |       |          | 00:00:00.11 |      699 |      400 |             |             |             |        0 |          |          |&lt;br /&gt;|  17 |  12 |   9 |             BITMAP MERGE             |                |      1 |        |     11 |00:00:00.13 |     847 |    400 |  2802K|   512K| 2804K (0)| 00:00:00.02 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|  18 |  17 |   8 |              BITMAP KEY ITERATION    |                |      1 |        |    800 |00:00:00.11 |     847 |    400 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|* 19 |  18 |   6 |               TABLE ACCESS FULL      | D              |      1 |    100 |    100 |00:00:00.01 |     164 |      0 |       |       |          | 00:00:00.00 |      164 |        0 |             |             |             |        1 |          |      100 |&lt;br /&gt;|* 20 |  18 |   7 |               BITMAP INDEX RANGE SCAN| T_FK3          |    100 |        |    800 |00:00:00.11 |     683 |    400 |       |       |          | 00:00:00.11 |      683 |      400 |             |             |             |        0 |          |          |&lt;br /&gt;|  21 |  12 |  13 |             BITMAP MERGE             |                |      1 |        |     11 |00:00:00.06 |     397 |    220 |  1024K|   512K|          | 00:00:00.01 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|  22 |  21 |  12 |              BITMAP KEY ITERATION    |                |      1 |        |    440 |00:00:00.05 |     397 |    220 |       |       |          | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|* 23 |  22 |  10 |               TABLE 
ACCESS FULL      | D              |      1 |     10 |     10 |00:00:00.01 |     164 |      0 |       |       |          | 00:00:00.00 |      164 |        0 |             |             |             |       16 |          |       10 |&lt;br /&gt;|* 24 |  22 |  11 |               BITMAP INDEX RANGE SCAN| T_FK2          |     10 |        |    440 |00:00:00.05 |     233 |    220 |       |       |          | 00:00:00.05 |      233 |      220 |             |             |             |        0 |          |          |&lt;br /&gt;|  25 |  10 |  16 |           TABLE ACCESS BY USER ROWID | T              |   1000K|      1 |   1000K|00:00:03.24 |   16950 |  16950 |       |       |          | 00:00:03.24 |    16950 |    16950 | @           | @           | @           |        0 |          |     1000K|&lt;br /&gt;|* 26 |   6 |  20 |        TABLE ACCESS FULL             | D              |      1 |    100 |    100 |00:00:00.01 |     164 |      0 |       |       |          | 00:00:00.00 |      164 |        0 |             |             |             |        1 |          |      100 |&lt;br /&gt;|* 27 |   5 |  22 |       TABLE ACCESS FULL              | D              |      1 |    100 |     11 |00:00:00.01 |      35 |      0 |       |       |          | 00:00:00.00 |       35 |        0 |             |             |             |        3 |          |      100 |&lt;br /&gt;|* 28 |   4 |  24 |      INDEX RANGE SCAN                | T1_IDX         |    102K|      1 |    102K|00:00:07.26 |     216K|  22824 |       |       |          | 00:00:07.26 |      216K|    22824 | @@          | @@@@@@@@    | @@          |        2 |          |      102K|&lt;br /&gt;|  29 |   3 |  26 |     TABLE ACCESS BY INDEX ROWID      | T1             |    102K|      1 |    102K|00:00:27.19 |     102K|    102K|       |       |          | 00:00:27.19 |      102K|      102K| @@@@@@@@    | @@@@        | @@@@@@@@@   |        1 |          |      102K|&lt;br 
/&gt;--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM&amp;gt;1)&lt;br /&gt;   5 - access("ITEM_1"="D"."ID")&lt;br /&gt;   6 - access("ITEM_3"="D"."ID")&lt;br /&gt;   7 - access("ITEM_2"="D"."ID")&lt;br /&gt;   8 - filter("IS_FLAG_D2"='Y')&lt;br /&gt;  15 - filter("IS_FLAG_D1"='Y')&lt;br /&gt;  16 - access("F"."FK1"="D"."ID")&lt;br /&gt;  19 - filter("IS_FLAG_D3"='Y')&lt;br /&gt;  20 - access("F"."FK3"="D"."ID")&lt;br /&gt;  23 - filter("IS_FLAG_D2"='Y')&lt;br /&gt;  24 - access("F"."FK2"="D"."ID")&lt;br /&gt;  26 - filter("IS_FLAG_D1"='Y')&lt;br /&gt;  27 - filter("IS_FLAG_D3"='Y')&lt;br /&gt;  28 - access("T1"."ID"="ITEM_4")&lt;br /&gt;&lt;br /&gt;Note&lt;br /&gt;-----&lt;br /&gt;   - star transformation used for this statement&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;68 rows selected.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The result is similar to the previous, but you can see that the increased number of physical reads on the T1 table segment slowed down the execution significantly.&lt;br /&gt;&lt;br /&gt;Here is an example of Parallel Execution. 
Note that I strongly recommend the Real-Time SQL Monitoring feature if you have to deal a lot with Parallel Execution, because it is offering much more insight and information than DBMS_XPLAN.DISPLAY_CURSOR.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&amp;gt; alter session set statistics_level = all;&lt;br /&gt;&lt;br /&gt;Session altered.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.01&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select /*+ parallel(t1, 4) */ count(*) from t1;&lt;br /&gt;&lt;br /&gt;  COUNT(*)&lt;br /&gt;----------&lt;br /&gt;   1000000&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:01.59&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; @xplan_extended_display_cursor "" "" ALLSTATS&lt;br /&gt;SQL&amp;gt; set echo off verify off termout off&lt;br /&gt;SQL_ID  92661sht5tyw1, child number 0&lt;br /&gt;-------------------------------------&lt;br /&gt;select /*+ parallel(t1, 4) */ count(*) from t1&lt;br /&gt;&lt;br /&gt;Plan hash value: 3110199320&lt;br /&gt;&lt;br /&gt;---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Pid | Ord | Operation              | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|&lt;br /&gt;---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 |     |   7 | SELECT STATEMENT       |          |      1 |        |      1 |00:00:01.53 |       7 |      2 | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|   1 |   0 |   6 |  SORT AGGREGATE        
|          |      1 |      1 |      1 |00:00:01.53 |       7 |      2 | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |        1 |&lt;br /&gt;|   2 |   1 |   5 |   PX COORDINATOR       |          |      1 |        |      4 |00:00:01.53 |       7 |      2 | 00:00:01.53 |        7 |        2 | @@@         |             |             |        1 |          |          |&lt;br /&gt;|   3 |   2 |   4 |    PX SEND QC (RANDOM) | :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |      0 | 00:00:00.00 |        0 |        0 |             |             |             |          |          |        0 |&lt;br /&gt;|   4 |   3 |   3 |     SORT AGGREGATE     |          |      4 |      1 |      4 |00:00:05.84 |   15541 |  15385 | 00:00:00.14 |        0 |     3841 |             |             | @@@         |        0 |          |        4 |&lt;br /&gt;|   5 |   4 |   2 |      PX BLOCK ITERATOR |          |      4 |   1000K|   1000K|00:00:05.71 |   15541 |  11544 | 00:00:00.27 |        0 |        0 | @           |             |             |        0 |          |     4000K|&lt;br /&gt;|*  6 |   5 |   1 |       TABLE ACCESS FULL| T1       |     52 |   1000K|   1000K|00:00:05.44 |   15541 |  15385 | 00:00:05.44 |    15541 |    15385 | @@@@@@@@@@@ | @@@@@@@@@@@@| @@@@@@@@@@@@|        0 | -        |       52M|&lt;br /&gt;---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   6 - access(:Z&amp;gt;=:Z AND :Z&amp;lt;=:Z)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;23 rows selected.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The important points to consider when dealing with Parallel Execution are:&lt;br 
/&gt;&lt;br /&gt;- You need to use ALLSTATS instead of ALLSTATS LAST in order to get meaningful output, because ALLSTATS LAST would only show the activity of the Query Coordinator. However, ALLSTATS shows the statistics accumulated over all executions so far, so if the query has been executed multiple times, the output will not be limited to the last execution. If you want to make sure that you see only the statistics of the last execution, you need to create a new cursor, for example by adding a simple comment that makes the SQL statement text unique&lt;br /&gt;&lt;br /&gt;- You can see in the output that the "TCF Graph" and "E-Rows*Sta" columns can be misleading for Parallel Execution - the full table scan has been divided into 52 chunks executed by four parallel slaves, hence the Starts column shows 52, but the cardinality estimate of 1000K rows was spot on rather than wrong by a factor of 52&lt;br /&gt;&lt;br /&gt;- The elapsed time information for the parts executed in parallel is not the wall clock time but the accumulated time spent by all parallel slaves, hence the graphs will be partially misleading, since they are scaled relative to the MAX value found&lt;br /&gt;&lt;br /&gt;- In this case the "Reads Self" column also seems to attribute reads to the SORT AGGREGATE operation - this looks questionable, too&lt;br /&gt;&lt;br /&gt;The last example shows that it takes just a simple scalar subquery to make the output misleading again - so be aware that there are some exceptions (like scalar / early filter subqueries, certain Parallel Execution plans etc.) 
to the rules of how to interpret execution plans, and any automated interpretation of such plans will therefore usually be misled:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&amp;gt; select count(id) from (select (select id from t1 t1_i where t1_i.id = t1.id) as id from t1);&lt;br /&gt;&lt;br /&gt; COUNT(ID)&lt;br /&gt;----------&lt;br /&gt;   1000000&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:17.50&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; @xplan_extended_display_cursor&lt;br /&gt;SQL&amp;gt; set echo off verify off termout off&lt;br /&gt;SQL_ID  af2gry2z9g7vt, child number 0&lt;br /&gt;-------------------------------------&lt;br /&gt;select count(id) from (select (select id from t1 t1_i where t1_i.id =&lt;br /&gt;t1.id) as id from t1)&lt;br /&gt;&lt;br /&gt;Plan hash value: 1144741071&lt;br /&gt;&lt;br /&gt;-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Pid | Ord | Operation          | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|&lt;br /&gt;-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 |     |   4 | SELECT STATEMENT   |        |      1 |        |      1 |00:00:17.51 |    1551K|  17617 | 00:00:00.00 |        0 |        0 |             |             |             |        0 |          |          |&lt;br /&gt;|*  1 |   0 |   1 |  INDEX RANGE SCAN  | T1_IDX |   1000K|      1 |   1000K|00:00:09.36 |    1536K|   2231 | 00:00:09.36 |     1536K|     2231 | @@@@@@      | @@@@@@@@@@@@| @@          |        1 |          |     1000K|&lt;br /&gt;|   2 
|   0 |   3 |  SORT AGGREGATE    |        |      1 |      1 |      1 |00:00:17.51 |    1551K|  17617 | 00:00:15.76 |     1536K|     2231 | @@@@@@@@@@@ | @@@@@@@@@@@@| @@          |  1536485 |          |        1 |&lt;br /&gt;|   3 |   2 |   2 |   TABLE ACCESS FULL| T1     |      1 |   1000K|   1000K|00:00:01.74 |   15390 |  15386 | 00:00:01.74 |    15390 |    15386 | @           |             | @@@@@@@@@@  |        0 |          |     1000K|&lt;br /&gt;-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   1 - access("T1_I"."ID"=:B1)&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The scalar subquery is shown as a child operation of the root node (or, in 10g, due to the missing ID = 0 operation in V$SQL_PLAN_STATISTICS(_ALL), as an independent operation with no parent at all), and according to the usual rules it would therefore be executed first (see the "Ord" column). But this is not true - the execution starts with the first leaf of the main branch of the plan (the full table scan of T1).&lt;br /&gt;&lt;br /&gt;Note that not only is the "Ord" column wrong: the fact that the SORT AGGREGATE operation includes the work performed by the scalar subquery is also not handled correctly by the rest of the logic that calculates the self statistics of each operation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;The Script&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Below you can find the current version of the script. 
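&lt;br /&gt;&lt;br /&gt;As a quick reference, here is a sketch of the typical ways to call the script. It is derived from the "Usage" section of the script header and from the examples above. The SQL_IDs shown are simply the ones from the demos in this post and will only exist in the shared pool of that test system, so substitute your own. The script sets a linesize of 400 itself, so a correspondingly wide terminal window is assumed:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&amp;gt; -- Previous statement of the current session, default format option ALLSTATS LAST&lt;br /&gt;SQL&amp;gt; @xplan_extended_display_cursor&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- Previous statement of the current session, cumulative ALLSTATS (e.g. for Parallel Execution)&lt;br /&gt;SQL&amp;gt; @xplan_extended_display_cursor "" "" ALLSTATS&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- Explicit SQL_ID, CHILD_NUMBER 0 is assumed&lt;br /&gt;SQL&amp;gt; @xplan_extended_display_cursor 5u3x96k4s5zt6&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- Explicit SQL_ID, CHILD_NUMBER and format option&lt;br /&gt;SQL&amp;gt; @xplan_extended_display_cursor 92661sht5tyw1 0 ALLSTATS&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;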
If you're too lazy to copy&amp;paste (and because I don't have a fancy "copy to clipboard" button) you can also download the script from &lt;a href="http://www.sqltools-plusplus.org:7676/media/xplan_extended_display_cursor.sql"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Of course I'm interested in feedback. This prototype is not tested much yet, so expect glitches and problems. If you get back to me with reproducible cases I'll try to address them and publish updated versions of the script.&lt;br /&gt;&lt;br /&gt;A final note: This tool comes for free but with no warranties at all. Use at your own risk.&lt;br /&gt;&lt;br /&gt;Happy rowsource profiling (and holiday season)!&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;set echo off verify off termout off&lt;br /&gt;set doc off&lt;br /&gt;doc&lt;br /&gt;-- ----------------------------------------------------------------------------------------------&lt;br /&gt;--&lt;br /&gt;-- Script:       xplan_extended_display_cursor.sql&lt;br /&gt;--&lt;br /&gt;-- Version:      0.9&lt;br /&gt;--               December 2011&lt;br /&gt;--&lt;br /&gt;-- Author:       Randolf Geist&lt;br /&gt;--               oracle-randolf.blogspot.com&lt;br /&gt;--&lt;br /&gt;-- Description:  A free-standing SQL wrapper over DBMS_XPLAN. Provides access to the&lt;br /&gt;--               DBMS_XPLAN.DISPLAY_CURSOR pipelined function for a given SQL_ID and CHILD_NUMBER&lt;br /&gt;--&lt;br /&gt;--               This is a prototype for an extended analysis of the data provided by the&lt;br /&gt;--               Runtime Profile (aka. 
Rowsource Statistics enabled via&lt;br /&gt;--               SQL_TRACE = TRUE, STATISTICS_LEVEL = ALL or GATHER_PLAN_STATISTICS hint)&lt;br /&gt;--               and reported via the ALLSTATS/MEMSTATS/IOSTATS formatting option of&lt;br /&gt;--               DBMS_XPLAN.DISPLAY_CURSOR&lt;br /&gt;--&lt;br /&gt;-- Versions:     This utility will work for all versions of 10g and upwards.&lt;br /&gt;--&lt;br /&gt;-- Required:     The same access as DBMS_XPLAN.DISPLAY_CURSOR requires. See the documentation&lt;br /&gt;--               of DISPLAY_CURSOR for your Oracle version for more information&lt;br /&gt;--&lt;br /&gt;--               The script directly queries&lt;br /&gt;--               1) V$SESSION&lt;br /&gt;--               2) V$SQL_PLAN_STATISTICS_ALL&lt;br /&gt;--&lt;br /&gt;-- Credits:      Based on the original XPLAN implementation by Adrian Billington (http://www.oracle-developer.net/utilities.php&lt;br /&gt;--               resp. http://www.oracle-developer.net/content/utilities/xplan.zip)&lt;br /&gt;--               and inspired by Kyle Hailey's TCF query (http://dboptimizer.com/2011/09/20/display_cursor/)&lt;br /&gt;--&lt;br /&gt;-- Features:     In addition to the PID (The PARENT_ID) and ORD (The order of execution, note that this doesn't account for the special cases so it might be wrong)&lt;br /&gt;--               columns added by Adrian's wrapper, the following additional columns over ALLSTATS are provided:&lt;br /&gt;--&lt;br /&gt;--               A_TIME_SELF        : The time taken by the operation itself - this is the operation's cumulative time minus the direct descendant operations' cumulative time&lt;br /&gt;--               LIO_SELF           : The LIOs done by the operation itself - this is the operation's cumulative LIOs minus the direct descendant operations' cumulative LIOs&lt;br /&gt;--               READS_SELF         : The reads performed by the operation itself - this is the operation's cumulative reads minus the direct descendant 
operations' cumulative reads&lt;br /&gt;--               WRITES_SELF        : The writes performed by the operation itself - this is the operation's cumulative writes minus the direct descendant operations' cumulative writes&lt;br /&gt;--               A_TIME_SELF_GRAPH  : A graphical representation of A_TIME_SELF relative to the total A_TIME&lt;br /&gt;--               LIO_SELF_GRAPH     : A graphical representation of LIO_SELF relative to the total LIO&lt;br /&gt;--               READS_SELF_GRAPH   : A graphical representation of READS_SELF relative to the total READS&lt;br /&gt;--               WRITES_SELF_GRAPH  : A graphical representation of WRITES_SELF relative to the total WRITES&lt;br /&gt;--               LIO_RATIO          : Ratio of LIOs per row generated by the row source - the higher this ratio the more likely there could be a more efficient way to generate those rows (be aware of aggregation steps though)&lt;br /&gt;--               TCF_GRAPH          : Each "+"/"-" sign represents one order of magnitude based on the ratio between E_ROWS_TIMES_START and A-ROWS. Note that this will be misleading with Parallel Execution (see E_ROWS_TIMES_START)&lt;br /&gt;--               E_ROWS_TIMES_START : The E_ROWS multiplied by STARTS - this is useful for understanding the actual cardinality estimate for related combine child operations getting executed multiple times. 
Note that this will be misleading with Parallel Execution&lt;br /&gt;--&lt;br /&gt;--               More information including demos can be found online at http://oracle-randolf.blogspot.com/2011/12/extended-displaycursor-with-rowsource.html&lt;br /&gt;--&lt;br /&gt;-- Usage:        @xplan_extended_display_cursor.sql [sql_id] [cursor_child_number] [format_option]&lt;br /&gt;--&lt;br /&gt;--               If both the SQL_ID and CHILD_NUMBER are omitted the previously executed SQL_ID and CHILD_NUMBER of the session will be used&lt;br /&gt;--               If the SQL_ID is specified but the CHILD_NUMBER is omitted then CHILD_NUMBER 0 is assumed&lt;br /&gt;--&lt;br /&gt;--               This prototype does not support processing multiple child cursors like DISPLAY_CURSOR is capable of&lt;br /&gt;--               when passing NULL as CHILD_NUMBER to DISPLAY_CURSOR. Hence a CHILD_NUMBER is mandatory, either&lt;br /&gt;--               implicitly generated (see above) or explicitly passed&lt;br /&gt;--&lt;br /&gt;--               The default formatting option for the call to DBMS_XPLAN.DISPLAY_CURSOR is ALLSTATS LAST - extending this output is the primary purpose of this script&lt;br /&gt;--&lt;br /&gt;-- Note:         You need a veeery wide terminal setting for this prototype, something like linesize 400 should suffice&lt;br /&gt;--&lt;br /&gt;--               This tool is free but comes with no warranty at all - use at your own risk&lt;br /&gt;--&lt;br /&gt;#&lt;br /&gt;&lt;br /&gt;col plan_table_output format a400&lt;br /&gt;set linesize 400 pagesize 0 tab off&lt;br /&gt;&lt;br /&gt;/* ALLSTATS LAST is assumed as the default formatting option for DBMS_XPLAN.DISPLAY_CURSOR */&lt;br /&gt;define default_fo = "ALLSTATS LAST"&lt;br /&gt;&lt;br /&gt;column prev_sql_id new_value prev_sql_id&lt;br /&gt;column prev_child_number new_value prev_cn&lt;br /&gt;&lt;br /&gt;/* Get the previous command as default&lt;br /&gt;   if no SQL_ID / CHILD_NUMBER is passed */&lt;br 
/&gt;select&lt;br /&gt;        prev_sql_id&lt;br /&gt;      , prev_child_number&lt;br /&gt;from&lt;br /&gt;        v$session&lt;br /&gt;where&lt;br /&gt;        sid = userenv('sid')&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;-- The following is a hack to use default&lt;br /&gt;-- values for defines&lt;br /&gt;column 1 new_value 1&lt;br /&gt;column 2 new_value 2&lt;br /&gt;column 3 new_value 3&lt;br /&gt;&lt;br /&gt;select&lt;br /&gt;        '' as "1"&lt;br /&gt;      , '' as "2"&lt;br /&gt;      , '' as "3"&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;where&lt;br /&gt;        rownum = 0;&lt;br /&gt;&lt;br /&gt;column si new_value si&lt;br /&gt;column cn new_value cn&lt;br /&gt;column fo new_value fo&lt;br /&gt;&lt;br /&gt;/* Use passed parameters else refer to previous SQL_ID / CHILD_NUMBER&lt;br /&gt;   ALLSTATS LAST is default formatting option */&lt;br /&gt;select&lt;br /&gt;        nvl('&amp;1', '&amp;prev_sql_id')       as si&lt;br /&gt;      , coalesce('&amp;2', '&amp;prev_cn', '0') as cn&lt;br /&gt;      , nvl('&amp;3', '&amp;default_fo')        as fo&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;column last new_value last&lt;br /&gt;&lt;br /&gt;/* Last or all execution */&lt;br /&gt;select&lt;br /&gt;       case&lt;br /&gt;       when instr('&amp;fo', 'LAST') &amp;gt; 0&lt;br /&gt;       then 'last_'&lt;br /&gt;       end  as last&lt;br /&gt;from&lt;br /&gt;       dual&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;set termout on&lt;br /&gt;&lt;br /&gt;with&lt;br /&gt;-- The next three queries are based on the original XPLAN wrapper by Adrian Billington&lt;br /&gt;-- to determine the PID and ORD information, only slightly modified to deal with&lt;br /&gt;-- the 10g special case that V$SQL_PLAN_STATISTICS_ALL doesn't include the ID = 0 operation&lt;br /&gt;-- and starts with 1 instead for Rowsource Statistics&lt;br /&gt;sql_plan_data as&lt;br /&gt;(&lt;br /&gt;  select&lt;br /&gt;          id&lt;br /&gt;        , parent_id&lt;br /&gt;  
from&lt;br /&gt;          v$sql_plan_statistics_all&lt;br /&gt;  where&lt;br /&gt;          sql_id = '&amp;si'&lt;br /&gt;  and     child_number = &amp;cn&lt;br /&gt;),&lt;br /&gt;hierarchy_data as&lt;br /&gt;(&lt;br /&gt;  select&lt;br /&gt;          id&lt;br /&gt;        , parent_id&lt;br /&gt;  from&lt;br /&gt;          sql_plan_data&lt;br /&gt;  start with&lt;br /&gt;          id in&lt;br /&gt;          (&lt;br /&gt;            select&lt;br /&gt;                    id&lt;br /&gt;            from&lt;br /&gt;                    sql_plan_data p1&lt;br /&gt;            where&lt;br /&gt;                    not exists&lt;br /&gt;                    (&lt;br /&gt;                      select&lt;br /&gt;                              null&lt;br /&gt;                      from&lt;br /&gt;                              sql_plan_data p2&lt;br /&gt;                      where&lt;br /&gt;                              p2.id = p1.parent_id&lt;br /&gt;                    )&lt;br /&gt;          )&lt;br /&gt;  connect by&lt;br /&gt;          prior id = parent_id&lt;br /&gt;  order siblings by&lt;br /&gt;          id desc&lt;br /&gt;),&lt;br /&gt;ordered_hierarchy_data as&lt;br /&gt;(&lt;br /&gt;  select&lt;br /&gt;          id&lt;br /&gt;        , parent_id                                as pid&lt;br /&gt;        , row_number() over (order by rownum desc) as oid&lt;br /&gt;        , max(id) over ()                          as maxid&lt;br /&gt;        , min(id) over ()                          as minid&lt;br /&gt;  from&lt;br /&gt;          hierarchy_data&lt;br /&gt;),&lt;br /&gt;-- The following query uses the MAX values&lt;br /&gt;-- rather than taking the values of PLAN OPERATION_ID = 0 (or 1 for 10g V$SQL_PLAN_STATISTICS_ALL)&lt;br /&gt;-- for determining the grand totals&lt;br /&gt;--&lt;br /&gt;-- This is because queries that get cancelled do not&lt;br /&gt;-- necessarily have yet sensible values in the root plan operation&lt;br /&gt;--&lt;br /&gt;-- Furthermore with Parallel 
Execution the elapsed time accumulated&lt;br /&gt;-- with the ALLSTATS option for operations performed in parallel&lt;br /&gt;-- will be greater than the wallclock elapsed time shown for the Query Coordinator&lt;br /&gt;--&lt;br /&gt;-- Note that if you use GATHER_PLAN_STATISTICS with the default&lt;br /&gt;-- row sampling frequency the (LAST_)ELAPSED_TIME will very likely be&lt;br /&gt;-- wrong and hence the time-based graphs and self-statistics will be misleading&lt;br /&gt;--&lt;br /&gt;-- Similar things might happen when cancelling queries&lt;br /&gt;--&lt;br /&gt;-- For queries running with STATISTICS_LEVEL = ALL (or sample frequency set to 1)&lt;br /&gt;-- the A-TIME is pretty reliable&lt;br /&gt;totals as&lt;br /&gt;(&lt;br /&gt;  select&lt;br /&gt;          max(&amp;last.cu_buffer_gets + &amp;last.cr_buffer_gets) as total_lio&lt;br /&gt;        , max(&amp;last.elapsed_time)                          as total_elapsed&lt;br /&gt;        , max(&amp;last.disk_reads)                            as total_reads&lt;br /&gt;        , max(&amp;last.disk_writes)                           as total_writes&lt;br /&gt;  from&lt;br /&gt;          v$sql_plan_statistics_all&lt;br /&gt;  where&lt;br /&gt;          sql_id = '&amp;si'&lt;br /&gt;  and     child_number = &amp;cn&lt;br /&gt;),&lt;br /&gt;-- The totals for the direct descendants of an operation&lt;br /&gt;-- These are required for calculating the work performed&lt;br /&gt;-- by a (parent) operation itself&lt;br /&gt;-- Basically this is the SUM grouped by PARENT_ID&lt;br /&gt;direct_desc_totals as&lt;br /&gt;(&lt;br /&gt;  select&lt;br /&gt;          sum(&amp;last.cu_buffer_gets + &amp;last.cr_buffer_gets) as lio&lt;br /&gt;        , sum(&amp;last.elapsed_time)                          as elapsed&lt;br /&gt;        , sum(&amp;last.disk_reads)                            as reads&lt;br /&gt;        , sum(&amp;last.disk_writes)                           as writes&lt;br /&gt;        , parent_id&lt;br /&gt;  from&lt;br 
/&gt;          v$sql_plan_statistics_all&lt;br /&gt;  where&lt;br /&gt;          sql_id = '&amp;si'&lt;br /&gt;  and     child_number = &amp;cn&lt;br /&gt;  group by&lt;br /&gt;          parent_id&lt;br /&gt;),&lt;br /&gt;-- Putting the three together&lt;br /&gt;-- The statistics, direct descendant totals plus totals&lt;br /&gt;extended_stats as&lt;br /&gt;(&lt;br /&gt;  select&lt;br /&gt;          stats.id&lt;br /&gt;        , stats.parent_id&lt;br /&gt;        , stats.&amp;last.elapsed_time                                  as elapsed&lt;br /&gt;        , (stats.&amp;last.cu_buffer_gets + stats.&amp;last.cr_buffer_gets) as lio&lt;br /&gt;        , stats.&amp;last.starts                                        as starts&lt;br /&gt;        , stats.&amp;last.output_rows                                   as a_rows&lt;br /&gt;        , stats.cardinality                                         as e_rows&lt;br /&gt;        , stats.&amp;last.disk_reads                                    as reads&lt;br /&gt;        , stats.&amp;last.disk_writes                                   as writes&lt;br /&gt;        , ddt.elapsed                                               as ddt_elapsed&lt;br /&gt;        , ddt.lio                                                   as ddt_lio&lt;br /&gt;        , ddt.reads                                                 as ddt_reads&lt;br /&gt;        , ddt.writes                                                as ddt_writes&lt;br /&gt;        , t.total_elapsed&lt;br /&gt;        , t.total_lio&lt;br /&gt;        , t.total_reads&lt;br /&gt;        , t.total_writes&lt;br /&gt;  from&lt;br /&gt;          v$sql_plan_statistics_all stats&lt;br /&gt;        , direct_desc_totals ddt&lt;br /&gt;        , totals t&lt;br /&gt;  where&lt;br /&gt;          stats.sql_id='&amp;si'&lt;br /&gt;  and     stats.child_number = &amp;cn&lt;br /&gt;  and     ddt.parent_id (+) = stats.id&lt;br /&gt;),&lt;br /&gt;-- Further information derived from above&lt;br 
/&gt;derived_stats as&lt;br /&gt;(&lt;br /&gt;  select&lt;br /&gt;          id&lt;br /&gt;        , greatest(elapsed - nvl(ddt_elapsed , 0), 0)                              as elapsed_self&lt;br /&gt;        , greatest(lio - nvl(ddt_lio, 0), 0)                                       as lio_self&lt;br /&gt;        , trunc((greatest(lio - nvl(ddt_lio, 0), 0)) / nullif(a_rows, 0))          as lio_ratio&lt;br /&gt;        , greatest(reads - nvl(ddt_reads, 0), 0)                                   as reads_self&lt;br /&gt;        , greatest(writes - nvl(ddt_writes,0) ,0)                                  as writes_self&lt;br /&gt;        , total_elapsed&lt;br /&gt;        , total_lio&lt;br /&gt;        , total_reads&lt;br /&gt;        , total_writes&lt;br /&gt;        , trunc(log(10, nullif(starts * e_rows / nullif(a_rows, 0), 0)))           as tcf_ratio&lt;br /&gt;        , starts * e_rows                                                          as e_rows_times_start&lt;br /&gt;  from&lt;br /&gt;          extended_stats&lt;br /&gt;),&lt;br /&gt;/* Format the data as required */&lt;br /&gt;formatted_data1 as&lt;br /&gt;(&lt;br /&gt;  select&lt;br /&gt;          id&lt;br /&gt;        , lio_ratio&lt;br /&gt;        , total_elapsed&lt;br /&gt;        , total_lio&lt;br /&gt;        , total_reads&lt;br /&gt;        , total_writes&lt;br /&gt;        , to_char(numtodsinterval(round(elapsed_self / 10000) * 10000 / 1000000, 'SECOND'))                         as e_time_interval&lt;br /&gt;          /* Imitate the DBMS_XPLAN number formatting */&lt;br /&gt;        , case&lt;br /&gt;          when lio_self &amp;gt;= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'&lt;br /&gt;          when lio_self &amp;gt;= 10000000000000000000 then to_char(lio_self/1000000000000000000, 'FM99999') || 'E'&lt;br /&gt;          when lio_self &amp;gt;= 10000000000000000 then to_char(lio_self/1000000000000000, 'FM99999') || 'P'&lt;br /&gt;          when 
lio_self &amp;gt;= 10000000000000 then to_char(lio_self/1000000000000, 'FM99999') || 'T'&lt;br /&gt;          when lio_self &amp;gt;= 10000000000 then to_char(lio_self/1000000000, 'FM99999') || 'G'&lt;br /&gt;          when lio_self &amp;gt;= 10000000 then to_char(lio_self/1000000, 'FM99999') || 'M'&lt;br /&gt;          when lio_self &amp;gt;= 100000 then to_char(lio_self/1000, 'FM99999') || 'K'&lt;br /&gt;          else to_char(lio_self, 'FM99999') || ' '&lt;br /&gt;          end                                                                                                       as lio_self_format&lt;br /&gt;        , case&lt;br /&gt;          when reads_self &amp;gt;= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'&lt;br /&gt;          when reads_self &amp;gt;= 10000000000000000000 then to_char(reads_self/1000000000000000000, 'FM99999') || 'E'&lt;br /&gt;          when reads_self &amp;gt;= 10000000000000000 then to_char(reads_self/1000000000000000, 'FM99999') || 'P'&lt;br /&gt;          when reads_self &amp;gt;= 10000000000000 then to_char(reads_self/1000000000000, 'FM99999') || 'T'&lt;br /&gt;          when reads_self &amp;gt;= 10000000000 then to_char(reads_self/1000000000, 'FM99999') || 'G'&lt;br /&gt;          when reads_self &amp;gt;= 10000000 then to_char(reads_self/1000000, 'FM99999') || 'M'&lt;br /&gt;          when reads_self &amp;gt;= 100000 then to_char(reads_self/1000, 'FM99999') || 'K'&lt;br /&gt;          else to_char(reads_self, 'FM99999') || ' '&lt;br /&gt;          end                                                                                                       as reads_self_format&lt;br /&gt;        , case&lt;br /&gt;          when writes_self &amp;gt;= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'&lt;br /&gt;          when writes_self &amp;gt;= 10000000000000000000 then to_char(writes_self/1000000000000000000, 'FM99999') || 'E'&lt;br /&gt;     
     when writes_self &amp;gt;= 10000000000000000 then to_char(writes_self/1000000000000000, 'FM99999') || 'P'&lt;br /&gt;          when writes_self &amp;gt;= 10000000000000 then to_char(writes_self/1000000000000, 'FM99999') || 'T'&lt;br /&gt;          when writes_self &amp;gt;= 10000000000 then to_char(writes_self/1000000000, 'FM99999') || 'G'&lt;br /&gt;          when writes_self &amp;gt;= 10000000 then to_char(writes_self/1000000, 'FM99999') || 'M'&lt;br /&gt;          when writes_self &amp;gt;= 100000 then to_char(writes_self/1000, 'FM99999') || 'K'&lt;br /&gt;          else to_char(writes_self, 'FM99999') || ' '&lt;br /&gt;          end                                                                                                       as writes_self_format&lt;br /&gt;        , case&lt;br /&gt;          when e_rows_times_start &amp;gt;= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'&lt;br /&gt;          when e_rows_times_start &amp;gt;= 10000000000000000000 then to_char(e_rows_times_start/1000000000000000000, 'FM99999') || 'E'&lt;br /&gt;          when e_rows_times_start &amp;gt;= 10000000000000000 then to_char(e_rows_times_start/1000000000000000, 'FM99999') || 'P'&lt;br /&gt;          when e_rows_times_start &amp;gt;= 10000000000000 then to_char(e_rows_times_start/1000000000000, 'FM99999') || 'T'&lt;br /&gt;          when e_rows_times_start &amp;gt;= 10000000000 then to_char(e_rows_times_start/1000000000, 'FM99999') || 'G'&lt;br /&gt;          when e_rows_times_start &amp;gt;= 10000000 then to_char(e_rows_times_start/1000000, 'FM99999') || 'M'&lt;br /&gt;          when e_rows_times_start &amp;gt;= 100000 then to_char(e_rows_times_start/1000, 'FM99999') || 'K'&lt;br /&gt;          else to_char(e_rows_times_start, 'FM99999') || ' '&lt;br /&gt;          end                                                                                                       as e_rows_times_start_format&lt;br /&gt;        , rpad(' 
', nvl(round(elapsed_self / nullif(total_elapsed, 0) * 12), 0) + 1, '@')                           as elapsed_self_graph&lt;br /&gt;        , rpad(' ', nvl(round(lio_self / nullif(total_lio, 0) * 12), 0) + 1, '@')                                   as lio_self_graph&lt;br /&gt;        , rpad(' ', nvl(round(reads_self / nullif(total_reads, 0) * 12), 0) + 1, '@')                               as reads_self_graph&lt;br /&gt;        , rpad(' ', nvl(round(writes_self / nullif(total_writes, 0) * 12), 0) + 1, '@')                             as writes_self_graph&lt;br /&gt;        , ' ' ||&lt;br /&gt;          case&lt;br /&gt;          when tcf_ratio &amp;gt; 0&lt;br /&gt;          then rpad('-', tcf_ratio, '-')&lt;br /&gt;          else rpad('+', tcf_ratio * -1, '+')&lt;br /&gt;          end                                                                                                       as tcf_graph&lt;br /&gt;  from&lt;br /&gt;          derived_stats&lt;br /&gt;),&lt;br /&gt;/* The final formatted data */&lt;br /&gt;formatted_data as&lt;br /&gt;(&lt;br /&gt;  select&lt;br /&gt;          /*+ Convert the INTERVAL representation to the A-TIME representation used by DBMS_XPLAN&lt;br /&gt;              by turning the days into hours */&lt;br /&gt;          to_char(to_number(substr(e_time_interval, 2, 9)) * 24 + to_number(substr(e_time_interval, 12, 2)), 'FM900') ||&lt;br /&gt;          substr(e_time_interval, 14, 9)&lt;br /&gt;          as a_time_self&lt;br /&gt;        , a.*&lt;br /&gt;  from&lt;br /&gt;          formatted_data1 a&lt;br /&gt;),&lt;br /&gt;/* Combine the information with the original DBMS_XPLAN output */&lt;br /&gt;xplan_data as (&lt;br /&gt;  select&lt;br /&gt;          x.plan_table_output&lt;br /&gt;        , o.id&lt;br /&gt;        , o.pid&lt;br /&gt;        , o.oid&lt;br /&gt;        , o.maxid&lt;br /&gt;        , o.minid&lt;br /&gt;        , a.a_time_self&lt;br /&gt;        , a.lio_self_format&lt;br /&gt;        , a.reads_self_format&lt;br /&gt;     
   , a.writes_self_format&lt;br /&gt;        , a.elapsed_self_graph&lt;br /&gt;        , a.lio_self_graph&lt;br /&gt;        , a.reads_self_graph&lt;br /&gt;        , a.writes_self_graph&lt;br /&gt;        , a.lio_ratio&lt;br /&gt;        , a.tcf_graph&lt;br /&gt;        , a.total_elapsed&lt;br /&gt;        , a.total_lio&lt;br /&gt;        , a.total_reads&lt;br /&gt;        , a.total_writes&lt;br /&gt;        , a.e_rows_times_start_format&lt;br /&gt;        , x.rn&lt;br /&gt;  from&lt;br /&gt;          (&lt;br /&gt;            select  /* Take advantage of 11g table function dynamic sampling */&lt;br /&gt;                    /*+ dynamic_sampling(dc, 2) */&lt;br /&gt;                    /* This ROWNUM determines the order of the output/processing */&lt;br /&gt;                    rownum as rn&lt;br /&gt;                  , plan_table_output&lt;br /&gt;            from&lt;br /&gt;                    table(dbms_xplan.display_cursor('&amp;si',&amp;cn, '&amp;fo')) dc&lt;br /&gt;          ) x&lt;br /&gt;        , ordered_hierarchy_data o&lt;br /&gt;        , formatted_data a&lt;br /&gt;  where&lt;br /&gt;          o.id (+) = case&lt;br /&gt;                     when regexp_like(x.plan_table_output, '^\|[\* 0-9]+\|')&lt;br /&gt;                     then to_number(regexp_substr(x.plan_table_output, '[0-9]+'))&lt;br /&gt;                     end&lt;br /&gt;  and     a.id (+) = case&lt;br /&gt;                     when regexp_like(x.plan_table_output, '^\|[\* 0-9]+\|')&lt;br /&gt;                     then to_number(regexp_substr(x.plan_table_output, '[0-9]+'))&lt;br /&gt;                     end&lt;br /&gt;)&lt;br /&gt;/* Inject the additional data into the original DBMS_XPLAN output&lt;br /&gt;   by using the MODEL clause */&lt;br /&gt;select&lt;br /&gt;        plan_table_output&lt;br /&gt;from&lt;br /&gt;        xplan_data&lt;br /&gt;model&lt;br /&gt;        dimension by (rn as r)&lt;br /&gt;        measures&lt;br /&gt;        (&lt;br /&gt;          cast(plan_table_output 
as varchar2(4000)) as plan_table_output&lt;br /&gt;        , id&lt;br /&gt;        , maxid&lt;br /&gt;        , minid&lt;br /&gt;        , pid&lt;br /&gt;        , oid&lt;br /&gt;        , a_time_self&lt;br /&gt;        , lio_self_format&lt;br /&gt;        , reads_self_format&lt;br /&gt;        , writes_self_format&lt;br /&gt;        , e_rows_times_start_format&lt;br /&gt;        , elapsed_self_graph&lt;br /&gt;        , lio_self_graph&lt;br /&gt;        , reads_self_graph&lt;br /&gt;        , writes_self_graph&lt;br /&gt;        , lio_ratio&lt;br /&gt;        , tcf_graph&lt;br /&gt;        , total_elapsed&lt;br /&gt;        , total_lio&lt;br /&gt;        , total_reads&lt;br /&gt;        , total_writes&lt;br /&gt;        , greatest(max(length(maxid)) over () + 3, 6) as csize&lt;br /&gt;        , cast(null as varchar2(128)) as inject&lt;br /&gt;        , cast(null as varchar2(4000)) as inject2&lt;br /&gt;        )&lt;br /&gt;        rules sequential order&lt;br /&gt;        (&lt;br /&gt;          /* Prepare the injection of the OID / PID info */&lt;br /&gt;          inject[r]  = case&lt;br /&gt;                               /* MINID/MAXID are the same for all rows&lt;br /&gt;                                  so it doesn't really matter&lt;br /&gt;                                  which offset we refer to */&lt;br /&gt;                       when    id[cv(r)+1] = minid[cv(r)+1]&lt;br /&gt;                            or id[cv(r)+3] = minid[cv(r)+3]&lt;br /&gt;                            or id[cv(r)-1] = maxid[cv(r)-1]&lt;br /&gt;                       then rpad('-', csize[cv()]*2, '-')&lt;br /&gt;                       when id[cv(r)+2] = minid[cv(r)+2]&lt;br /&gt;                       then '|' || lpad('Pid |', csize[cv()]) || lpad('Ord |', csize[cv()])&lt;br /&gt;                       when id[cv()] is not null&lt;br /&gt;                       then '|' || lpad(pid[cv()] || ' |', csize[cv()]) || lpad(oid[cv()] || ' |', csize[cv()])&lt;br /&gt;                       
end&lt;br /&gt;          /* Prepare the injection of the remaining info */&lt;br /&gt;        , inject2[r] = case&lt;br /&gt;                       when    id[cv(r)+1] = minid[cv(r)+1]&lt;br /&gt;                            or id[cv(r)+3] = minid[cv(r)+3]&lt;br /&gt;                            or id[cv(r)-1] = maxid[cv(r)-1]&lt;br /&gt;                       then rpad('-',&lt;br /&gt;                            case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) &amp;gt; 0 then&lt;br /&gt;                            14 else 0 end /* A_TIME_SELF */       +&lt;br /&gt;                            case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) &amp;gt; 0 then&lt;br /&gt;                            11 else 0 end /* LIO_SELF */          +&lt;br /&gt;                            case when coalesce(total_reads[cv(r)+1], total_reads[cv(r)+3], total_reads[cv(r)-1]) &amp;gt; 0 then&lt;br /&gt;                            11 else 0 end /* READS_SELF */        +&lt;br /&gt;                            case when coalesce(total_writes[cv(r)+1], total_writes[cv(r)+3], total_writes[cv(r)-1]) &amp;gt; 0 then&lt;br /&gt;                            11 else 0 end /* WRITES_SELF */       +&lt;br /&gt;                            case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) &amp;gt; 0 then&lt;br /&gt;                            14 else 0 end /* A_TIME_SELF_GRAPH */ +&lt;br /&gt;                            case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) &amp;gt; 0 then&lt;br /&gt;                            14 else 0 end /* LIO_SELF_GRAPH */    +&lt;br /&gt;                            case when coalesce(total_reads[cv(r)+1], total_reads[cv(r)+3], total_reads[cv(r)-1]) &amp;gt; 0 then&lt;br /&gt;                            14 else 0 end /* READS_SELF_GRAPH */  +&lt;br /&gt;                            case when coalesce(total_writes[cv(r)+1], 
total_writes[cv(r)+3], total_writes[cv(r)-1]) &amp;gt; 0 then&lt;br /&gt;                            14 else 0 end /* WRITES_SELF_GRAPH */ +&lt;br /&gt;                            case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) &amp;gt; 0 then&lt;br /&gt;                            11 else 0 end /* LIO_RATIO */         +&lt;br /&gt;                            case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) &amp;gt; 0 then&lt;br /&gt;                            11 else 0 end /* TCF_GRAPH */         +&lt;br /&gt;                            case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) &amp;gt; 0 then&lt;br /&gt;                            11 else 0 end /* E_ROWS_TIMES_START */&lt;br /&gt;                            , '-')&lt;br /&gt;                       when id[cv(r)+2] = minid[cv(r)+2]&lt;br /&gt;                       then case when total_elapsed[cv(r)+2] &amp;gt; 0 then&lt;br /&gt;                            lpad('A-Time Self |' , 14) end ||&lt;br /&gt;                            case when total_lio[cv(r)+2] &amp;gt; 0 then&lt;br /&gt;                            lpad('Bufs Self |'   , 11) end ||&lt;br /&gt;                            case when total_reads[cv(r)+2] &amp;gt; 0 then&lt;br /&gt;                            lpad('Reads Self|'   , 11) end ||&lt;br /&gt;                            case when total_writes[cv(r)+2] &amp;gt; 0 then&lt;br /&gt;                            lpad('Write Self|'   , 11) end ||&lt;br /&gt;                            case when total_elapsed[cv(r)+2] &amp;gt; 0 then&lt;br /&gt;                            lpad('A-Ti S-Graph |', 14) end ||&lt;br /&gt;                            case when total_lio[cv(r)+2] &amp;gt; 0 then&lt;br /&gt;                            lpad('Bufs S-Graph |', 14) end ||&lt;br /&gt;                            case when total_reads[cv(r)+2] &amp;gt; 0 then&lt;br /&gt;                            
lpad('Reads S-Graph|', 14) end ||&lt;br /&gt;                            case when total_writes[cv(r)+2] &amp;gt; 0 then&lt;br /&gt;                            lpad('Write S-Graph|', 14) end ||&lt;br /&gt;                            case when total_lio[cv(r)+2] &amp;gt; 0 then&lt;br /&gt;                            lpad('LIO Ratio |'   , 11) end ||&lt;br /&gt;                            case when total_elapsed[cv(r)+2] &amp;gt; 0 then&lt;br /&gt;                            lpad('TCF Graph |'   , 11) end ||&lt;br /&gt;                            case when total_elapsed[cv(r)+2] &amp;gt; 0 then&lt;br /&gt;                            lpad('E-Rows*Sta|'   , 11) end&lt;br /&gt;                       when id[cv()] is not null&lt;br /&gt;                       then case when total_elapsed[cv()] &amp;gt; 0 then&lt;br /&gt;                            lpad(a_time_self[cv()]               || ' |', 14) end ||&lt;br /&gt;                            case when total_lio[cv()] &amp;gt; 0 then&lt;br /&gt;                            lpad(lio_self_format[cv()]           ||  '|', 11) end ||&lt;br /&gt;                            case when total_reads[cv()] &amp;gt; 0 then&lt;br /&gt;                            lpad(reads_self_format[cv()]         ||  '|', 11) end ||&lt;br /&gt;                            case when total_writes[cv()] &amp;gt; 0 then&lt;br /&gt;                            lpad(writes_self_format[cv()]        ||  '|', 11) end ||&lt;br /&gt;                            case when total_elapsed[cv()] &amp;gt; 0 then&lt;br /&gt;                            rpad(elapsed_self_graph[cv()], 13)   ||  '|'      end ||&lt;br /&gt;                            case when total_lio[cv()] &amp;gt; 0 then&lt;br /&gt;                            rpad(lio_self_graph[cv()], 13)       ||  '|'      end ||&lt;br /&gt;                            case when total_reads[cv()] &amp;gt; 0 then&lt;br /&gt;                            rpad(reads_self_graph[cv()], 13)     ||  '|'      end ||&lt;br /&gt;     
                       case when total_writes[cv()] &amp;gt; 0 then&lt;br /&gt;                            rpad(writes_self_graph[cv()], 13)    ||  '|'      end ||&lt;br /&gt;                            case when total_lio[cv()] &amp;gt; 0 then&lt;br /&gt;                            lpad(lio_ratio[cv()]                 || ' |', 11) end ||&lt;br /&gt;                            case when total_elapsed[cv()] &amp;gt; 0 then&lt;br /&gt;                            rpad(tcf_graph[cv()], 9)             || ' |'      end ||&lt;br /&gt;                            case when total_elapsed[cv()] &amp;gt; 0 then&lt;br /&gt;                            lpad(e_rows_times_start_format[cv()] ||  '|', 11) end&lt;br /&gt;                       end&lt;br /&gt;          /* Putting it all together */&lt;br /&gt;        , plan_table_output[r] = case&lt;br /&gt;                                 when inject[cv()] like '---%'&lt;br /&gt;                                 then inject[cv()] || plan_table_output[cv()] || inject2[cv()]&lt;br /&gt;                                 when inject[cv()] is present&lt;br /&gt;                                 then regexp_replace(plan_table_output[cv()], '\|', inject[cv()], 1, 2) || inject2[cv()]&lt;br /&gt;                                 else plan_table_output[cv()]&lt;br /&gt;                                 end&lt;br /&gt;        )&lt;br /&gt;order by&lt;br /&gt;        r&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;undefine default_fo&lt;br /&gt;undefine prev_sql_id&lt;br /&gt;undefine prev_cn&lt;br /&gt;undefine last&lt;br /&gt;undefine si&lt;br /&gt;undefine cn&lt;br /&gt;undefine fo&lt;br /&gt;undefine 1&lt;br /&gt;undefine 2&lt;br /&gt;undefine 3&lt;br /&gt;&lt;br /&gt;col plan_table_output clear&lt;br /&gt;col prev_sql_id clear&lt;br /&gt;col prev_child_number clear&lt;br /&gt;col si clear&lt;br /&gt;col cn clear&lt;br /&gt;col fo clear&lt;br /&gt;col last clear&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/5124641802818980374-1787988376923164386?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/1787988376923164386/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=1787988376923164386' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/1787988376923164386'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/1787988376923164386'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/12/extended-displaycursor-with-rowsource.html' title='Extended DISPLAY_CURSOR With Rowsource Statistics'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-6323642714954491992</id><published>2011-12-07T23:02:00.004+01:00</published><updated>2011-12-07T23:11:10.146+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='Join'/><category scheme='http://www.blogger.com/atom/ns#' term='CBO'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.1'/><category scheme='http://www.blogger.com/atom/ns#' term='11.1.0.7'/><category scheme='http://www.blogger.com/atom/ns#' term='Table Functions'/><category scheme='http://www.blogger.com/atom/ns#' term='cardinality'/><title type='text'>Table Functions And Join Cardinality Estimates</title><content type='html'>If you 
consider using Table Functions, you should be aware of some limitations of the optimizer calculations, in particular when joining a Table Function to other row sources.&lt;br /&gt;&lt;br /&gt;As outlined in one of my &lt;a href="http://oracle-randolf.blogspot.com/2011/11/doag-2011-unconference-wrap-up.html"&gt;previous posts&lt;/a&gt;, you can and should help the optimizer arrive at a reasonable cardinality estimate when dealing with table functions. However, doing so doesn't provide all the inputs to the join cardinality calculation that are useful and available from the statistics when dealing with regular tables.&lt;br /&gt;&lt;br /&gt;Therefore, even when following the recommended practice regarding cardinality estimates, it is possible to end up with some inaccuracies. This post explains why.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Join Cardinality Basics&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To appreciate the problem, let's take a quick walk through the basic information the optimizer uses to calculate a join cardinality. Here is a very simplified version of the join selectivity formula:&lt;br /&gt;&lt;br /&gt;Join Selectivity = 1 / greatest(num_distinct(t1.c1), num_distinct(t2.c2))&lt;br /&gt;&lt;br /&gt;I've omitted the NULL (and histogram) case to simplify the formula further. I'll also restrict the showcase here to a single join column.&lt;br /&gt;&lt;br /&gt;There is another piece of information that is evaluated but is not obvious from the above formula: the low and high values of the join columns.
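&lt;br /&gt;&lt;br /&gt;As a rough illustration only (a sketch, not how the optimizer itself implements the calculation), the statistics that feed this simplified formula can be read from the data dictionary once they have been gathered, for example for the tables T1 and T2 created in the test case below. The query assumes NUMBER join columns, since UTL_RAW.CAST_TO_NUMBER only decodes the raw LOW_VALUE/HIGH_VALUE of numeric columns. It ignores NULLs, histograms and filter predicates, multiplies the selectivity by the NUM_ROWS of both tables, and returns 1 if the low/high ranges of the join columns do not overlap:&lt;br /&gt;&lt;br /&gt;
&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Sketch only: simplified join cardinality from dictionary statistics&lt;br /&gt;-- Assumes NUMBER join columns T1.FK and T2.ID with gathered statistics,&lt;br /&gt;-- ignores NULLs, histograms and filter predicates&lt;br /&gt;select&lt;br /&gt;        t1.num_rows                                        as t1_rows&lt;br /&gt;      , t2.num_rows                                        as t2_rows&lt;br /&gt;      , c1.num_distinct                                    as t1_fk_ndv&lt;br /&gt;      , c2.num_distinct                                    as t2_id_ndv&lt;br /&gt;      , utl_raw.cast_to_number(c1.low_value)               as t1_fk_low&lt;br /&gt;      , utl_raw.cast_to_number(c1.high_value)              as t1_fk_high&lt;br /&gt;      , utl_raw.cast_to_number(c2.low_value)               as t2_id_low&lt;br /&gt;      , utl_raw.cast_to_number(c2.high_value)              as t2_id_high&lt;br /&gt;        -- No overlap of the low/high ranges: cardinality 1&lt;br /&gt;      , case&lt;br /&gt;        when utl_raw.cast_to_number(c1.high_value) &amp;lt; utl_raw.cast_to_number(c2.low_value)&lt;br /&gt;        or   utl_raw.cast_to_number(c2.high_value) &amp;lt; utl_raw.cast_to_number(c1.low_value)&lt;br /&gt;        then 1&lt;br /&gt;        -- Otherwise 1 / greatest(NDV t1, NDV t2) * num_rows t1 * num_rows t2&lt;br /&gt;        else round(t1.num_rows * t2.num_rows / greatest(c1.num_distinct, c2.num_distinct))&lt;br /&gt;        end                                                as join_card&lt;br /&gt;from&lt;br /&gt;        user_tables t1&lt;br /&gt;      , user_tables t2&lt;br /&gt;      , user_tab_col_statistics c1&lt;br /&gt;      , user_tab_col_statistics c2&lt;br /&gt;where&lt;br /&gt;        t1.table_name  = 'T1'&lt;br /&gt;and     t2.table_name  = 'T2'&lt;br /&gt;and     c1.table_name  = 'T1'&lt;br /&gt;and     c1.column_name = 'FK'&lt;br /&gt;and     c2.table_name  = 'T2'&lt;br /&gt;and     c2.column_name = 'ID'&lt;br /&gt;;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;
Run after each of the T2 variants below, the JOIN_CARD column is intended to line up with the HASH JOIN row estimate of the corresponding plan, but it remains a simplification of the real calculation.&lt;br /&gt;&lt;br /&gt;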
 If the join columns do not overlap at all, the join cardinality will be calculated as 1.&lt;br /&gt;&lt;br /&gt;Finally, this join selectivity is multiplied by the (filtered) cardinalities of the two row sources to arrive at the join cardinality:&lt;br /&gt;&lt;br /&gt;Join Cardinality = Join Selectivity * cardinality t1 * cardinality t2&lt;br /&gt;&lt;br /&gt;So for this simplified basic join cardinality formula the following information is required from the statistics (if available):&lt;br /&gt;&lt;br /&gt;- num_rows of the (filtered) row sources&lt;br /&gt;- num_distinct of the join columns&lt;br /&gt;- low/high values of the join columns&lt;br /&gt;&lt;br /&gt;Here is an example of this calculation in action, using real tables with table and basic column statistics gathered, so all of the information just mentioned is available:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;drop table t1;&lt;br /&gt;&lt;br /&gt;purge table t1;&lt;br /&gt;&lt;br /&gt;drop table t2;&lt;br /&gt;&lt;br /&gt;purge table t2;&lt;br /&gt;&lt;br /&gt;create table t1&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        rownum as id&lt;br /&gt;        -- 10 distinct values 1..10&lt;br /&gt;      , mod(rownum, 10) + 1 as fk&lt;br /&gt;      , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 10000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't1')&lt;br /&gt;&lt;br /&gt;create table t2&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        -- 20 distinct values 1..20&lt;br /&gt;        rownum as id&lt;br /&gt;      , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 20&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't2')&lt;br /&gt;&lt;br /&gt;explain plan for&lt;br /&gt;select&lt;br /&gt;        count(*)&lt;br /&gt;from&lt;br /&gt;        t1&lt;br /&gt;      , t2&lt;br /&gt;where&lt;br /&gt;   
     t1.fk = t2.id&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;set linesize 200 pagesize 0 tab off&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));&lt;br /&gt;&lt;br /&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br /&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      | 10000 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T2   |    20 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T1   | 10000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So we have:&lt;br /&gt;&lt;br /&gt;Join Selectivity = 1 / greatest(10, 20) = 1 / 20&lt;br /&gt;&lt;br /&gt;Join Cardinality = 1 / 20 * 20 * 10000 = 10000&lt;br /&gt;&lt;br /&gt;The join column values do overlap and there is no filter on the two row sources, so the result is as expected.&lt;br /&gt;&lt;br /&gt;Now let's modify the simple test case slightly, for example like this:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;drop table t2;&lt;br /&gt;&lt;br /&gt;purge table t2;&lt;br /&gt;&lt;br /&gt;create table t2&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        -- 1 distinct value&lt;br /&gt;        1 as id&lt;br /&gt;      , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 20&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't2')&lt;br /&gt;&lt;br /&gt;explain plan for&lt;br /&gt;select&lt;br /&gt;        count(*)&lt;br /&gt;from&lt;br /&gt;        t1&lt;br /&gt;      , t2&lt;br /&gt;where&lt;br /&gt;        t1.fk = t2.id&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;set linesize 200 pagesize 0 tab off&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));&lt;br /&gt;&lt;br 
/&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br /&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      | 20000 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T2   |    20 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T1   | 10000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Join Selectivity = 1 / greatest(10, 1) = 1 / 10&lt;br /&gt;&lt;br /&gt;Join Cardinality = 1 / 10 * 20 * 10000 = 20000&lt;br /&gt;&lt;br /&gt;So we can see that the number of distinct values is one of the crucial inputs to the join cardinality calculation (again, I deliberately keep things simple here and, for example, omit NULLs). The low and high values of the join columns are another input, which can be seen by modifying the example slightly once more:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;drop table t2;&lt;br /&gt;&lt;br /&gt;purge table t2;&lt;br /&gt;&lt;br /&gt;create table t2&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        -- 20 distinct values 21..40&lt;br /&gt;        rownum + 20 as id&lt;br /&gt;      , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 20&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't2')&lt;br /&gt;&lt;br /&gt;explain plan for&lt;br /&gt;select&lt;br /&gt;        count(*)&lt;br /&gt;from&lt;br /&gt;        t1&lt;br /&gt;      , t2&lt;br /&gt;where&lt;br /&gt;        t1.fk = t2.id&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;set linesize 200 pagesize 0 tab off&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));&lt;br /&gt;&lt;br /&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br 
/&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      |     1 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T2   |    20 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T1   | 10000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;We can immediately see that the optimizer detected that the join columns do not overlap and hence set the join cardinality to 1.&lt;br /&gt;&lt;br /&gt;Now let's move on towards our Table Function case and see what happens if the information is missing from the table statistics and gets amended by dynamic sampling.&lt;br /&gt;&lt;br /&gt;First let's start over again with the initial example data set of T2:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;drop table t2;&lt;br /&gt;&lt;br /&gt;purge table t2;&lt;br /&gt;&lt;br /&gt;create table t2&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        -- 20 distinct values 1..20&lt;br /&gt;        rownum as id&lt;br /&gt;      , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 20&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;-- Do not gather statistics but use dynamic sampling instead&lt;br /&gt;alter session set optimizer_dynamic_sampling = 2;&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'dyn_sample_1';&lt;br /&gt;&lt;br /&gt;alter session set events '10053 trace name context forever, level 1';&lt;br /&gt;&lt;br /&gt;explain plan for&lt;br /&gt;select&lt;br /&gt;        count(*)&lt;br /&gt;from&lt;br /&gt;        t1&lt;br /&gt;      , t2&lt;br /&gt;where&lt;br /&gt;        t1.fk = t2.id&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;set linesize 200 pagesize 0 tab off&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));&lt;br /&gt;&lt;br 
/&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br /&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      | 10000 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T2   |    20 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T1   | 10000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt; &lt;br /&gt;Note&lt;br /&gt;-----&lt;br /&gt;   - dynamic sampling used for this statement (level=2)&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So dynamic sampling was used, and the join cardinality estimate is correct. Let's check in the CBO trace file which dynamic sampling query was executed:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SELECT /* OPT_DYN_SAMP */ &lt;br /&gt;       /*+ &lt;br /&gt;          ALL_ROWS &lt;br /&gt;          IGNORE_WHERE_CLAUSE &lt;br /&gt;          NO_PARALLEL(SAMPLESUB) &lt;br /&gt;          opt_param('parallel_execution_enabled', 'false') &lt;br /&gt;          NO_PARALLEL_INDEX(SAMPLESUB) &lt;br /&gt;          NO_SQL_TUNE &lt;br /&gt;       */ &lt;br /&gt;          NVL(SUM(C1),0)&lt;br /&gt;        , NVL(SUM(C2),0)&lt;br /&gt;        , COUNT(DISTINCT C3)&lt;br /&gt;        , NVL(SUM(CASE WHEN C3 IS NULL THEN 1 ELSE 0 END),0) &lt;br /&gt;FROM &lt;br /&gt;          (&lt;br /&gt;            SELECT /*+ &lt;br /&gt;                      NO_PARALLEL("T2") &lt;br /&gt;                      FULL("T2") &lt;br /&gt;                      NO_PARALLEL_INDEX("T2") &lt;br /&gt;                   */ &lt;br /&gt;                      1 AS C1&lt;br /&gt;                    , 1 AS C2&lt;br /&gt;                    , "T2"."ID" AS C3 &lt;br /&gt;            FROM &lt;br /&gt;                      "T2" "T2"&lt;br /&gt;          ) SAMPLESUB&lt;br /&gt;;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So 
it's interesting to see that the dynamic sampling code detected that the column T2.ID is used as part of a join and therefore not only determined the cardinality of the table but also the num_distinct and num_nulls information of the join column.&lt;br /&gt;&lt;br /&gt;Consequently, the trace file shows the following information:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;    ndv C3 : 20&lt;br /&gt;        scaled : 20.00&lt;br /&gt;    nulls C4 : 0&lt;br /&gt;        scaled : 0.00&lt;br /&gt;    min. sel. est. : -1.00000000&lt;br /&gt;** Dynamic sampling col. stats.:&lt;br /&gt;  Column (#1): ID(  Part#: 0&lt;br /&gt;    AvgLen: 22 NDV: 20 Nulls: 0 Density: 0.050000&lt;br /&gt;** Using dynamic sampling NULLs estimates.&lt;br /&gt;** Using dynamic sampling NDV estimates.&lt;br /&gt;   Scaled NDVs using cardinality = 20.&lt;br /&gt;** Using dynamic sampling card. : 20&lt;br /&gt;** Dynamic sampling updated table card.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;If you've followed the description carefully so far, you'll notice that something usually available from the dictionary statistics is missing from the dynamic sampling information: the sampling query doesn't determine the low and high values (minimum and maximum) of the join column.&lt;br /&gt;&lt;br /&gt;Let's repeat the exercise and use the example data set of T2 where the join columns do not overlap:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;drop table t2;&lt;br /&gt;&lt;br /&gt;purge table t2;&lt;br /&gt;&lt;br /&gt;create table t2&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        -- 20 distinct values 21..40&lt;br /&gt;        rownum + 20 as id&lt;br /&gt;      , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 20&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;-- Do not gather statistics but use dynamic sampling instead&lt;br /&gt;alter session set optimizer_dynamic_sampling = 2;&lt;br /&gt;&lt;br 
/&gt;alter session set tracefile_identifier = 'dyn_sample_2';&lt;br /&gt;&lt;br /&gt;alter session set events '10053 trace name context forever, level 1';&lt;br /&gt;&lt;br /&gt;explain plan for&lt;br /&gt;select&lt;br /&gt;        count(*)&lt;br /&gt;from&lt;br /&gt;        t1&lt;br /&gt;      , t2&lt;br /&gt;where&lt;br /&gt;        t1.fk = t2.id&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;set linesize 200 pagesize 0 tab off&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));&lt;br /&gt;&lt;br /&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br /&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      | 10000 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T2   |    20 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T1   | 10000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt; &lt;br /&gt;Note&lt;br /&gt;-----&lt;br /&gt;   - dynamic sampling used for this statement (level=2)&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So this is a case where, even with dynamic sampling, the join cardinality estimate is incorrect and differs from the one based on actually gathered statistics. Without the low and high values, the optimizer cannot detect that the join column values don't overlap, so it ends up with 1 / greater(10, 20) * 20 * 10000 = 10000 instead of 1. I don't know whether this is a deliberate design decision to keep the footprint of the dynamic sampling query as low as possible or an omission that could be fixed by a bug/enhancement request. But if you were asking yourself whether dynamic sampling is a reasonable replacement for actual statistics, here is a case where it doesn't produce the same result as basic table and column statistics.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Joins With Table Functions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now that the basics have been clarified, let's move on to table functions. 
As outlined in the past and in one of my previous posts, you shouldn't use table functions without helping the optimizer come up with a reasonable cardinality estimate.&lt;br /&gt;&lt;br /&gt;But as you have just seen, a proper join cardinality estimate requires more than a reasonable cardinality estimate for each row source. Let's see what this means when attempting to join table functions with other row sources.&lt;br /&gt;&lt;br /&gt;For that purpose I create the following simple table function, which generates a simple data set controlled by its input parameters:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;drop type t_num_list;&lt;br /&gt;&lt;br /&gt;drop function f_tab_pipelined;&lt;br /&gt;&lt;br /&gt;create or replace type t_num_list as table of number;&lt;br /&gt;&lt;br /&gt;create or replace function&lt;br /&gt;f_tab_pipelined(&lt;br /&gt;  in_start in number default 1&lt;br /&gt;, in_end   in number default 10&lt;br /&gt;, in_val   in number default null&lt;br /&gt;) return t_num_list pipelined&lt;br /&gt;is&lt;br /&gt;begin&lt;br /&gt;  for i in in_start..in_end loop&lt;br /&gt;    pipe row(coalesce(in_val, i));&lt;br /&gt;  end loop;&lt;br /&gt;  return;&lt;br /&gt;end f_tab_pipelined;&lt;br /&gt;/&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So I can define the start and end of the range of values returned, with a step size of one. 
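&lt;br /&gt;&lt;br /&gt;A quick way to verify what the function generates is to aggregate its output directly. This is only a sketch, and it assumes that the type T_NUM_LIST and the function F_TAB_PIPELINED above have been created in the current schema. It shows the row count, the number of distinct values and the low and high values, which are exactly the inputs of the join cardinality calculation discussed above. The second query passes the third parameter, which is explained in the next paragraph:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Sketch: assumes T_NUM_LIST and F_TAB_PIPELINED from above exist&lt;br /&gt;-- Start and end only: one row per loop value&lt;br /&gt;select&lt;br /&gt;        count(*)                     as num_rows&lt;br /&gt;      , count(distinct column_value) as num_distinct&lt;br /&gt;      , min(column_value)            as min_val&lt;br /&gt;      , max(column_value)            as max_val&lt;br /&gt;from&lt;br /&gt;        table(f_tab_pipelined(1, 20))&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;-- Expected: NUM_ROWS = 20, NUM_DISTINCT = 20, MIN_VAL = 1, MAX_VAL = 20&lt;br /&gt;&lt;br /&gt;-- Third parameter provided: the same value for every row&lt;br /&gt;select&lt;br /&gt;        count(*)                     as num_rows&lt;br /&gt;      , count(distinct column_value) as num_distinct&lt;br /&gt;      , min(column_value)            as min_val&lt;br /&gt;      , max(column_value)            as max_val&lt;br /&gt;from&lt;br /&gt;        table(f_tab_pipelined(1, 20, 1))&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;-- Expected: NUM_ROWS = 20, NUM_DISTINCT = 1, MIN_VAL = 1, MAX_VAL = 1&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;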
If the third parameter is provided, its value is returned instead of the current loop value, resulting in a single distinct value repeated end - start + 1 times. Otherwise, the number of distinct values equals the number of rows generated.&lt;br /&gt;&lt;br /&gt;Dynamic sampling of table functions is supported from 11.1 on, so let's replay the cases I've just demonstrated with real tables, this time using the table function.&lt;br /&gt;&lt;br /&gt;First the case with 20 distinct values in T2 and overlapping join column values:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;alter session set tracefile_identifier = 'dyn_sample_table_func';&lt;br /&gt;&lt;br /&gt;alter session set events '10053 trace name context lifetime 1, level 1';&lt;br /&gt;&lt;br /&gt;explain plan for&lt;br /&gt;select  /*+ dynamic_sampling(t2, 2) */&lt;br /&gt;        count(*)&lt;br /&gt;from&lt;br /&gt;        t1&lt;br /&gt;      , (&lt;br /&gt;          select&lt;br /&gt;                  column_value as id&lt;br /&gt;          from&lt;br /&gt;                  table(f_tab_pipelined(1, 20, null))&lt;br /&gt;        ) t2&lt;br /&gt;where&lt;br /&gt;        t1.fk = t2.id&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;set linesize 200 pagesize 0 tab off&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));&lt;br /&gt;&lt;br /&gt;-----------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                           | Name            | Rows  |&lt;br /&gt;-----------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT                    |                 |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE                     |                 |     1 |&lt;br /&gt;|   2 |   HASH JOIN                         |                 | 20000 |&lt;br /&gt;|   3 |    COLLECTION ITERATOR PICKLER FETCH| F_TAB_PIPELINED |    20 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL                | T1  
            | 10000 |&lt;br /&gt;-----------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Note&lt;br /&gt;-----&lt;br /&gt;   - dynamic sampling used for this statement (level=2)&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So something is already going wrong here: the join cardinality is estimated at 20,000 rows, whereas with the real table and dynamic sampling we ended up with the correct estimate of 10,000 rows.&lt;br /&gt;&lt;br /&gt;Let's have a look at the generated dynamic sampling query:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SELECT &lt;br /&gt;       /* OPT_DYN_SAMP */ &lt;br /&gt;       /*+ &lt;br /&gt;          ALL_ROWS &lt;br /&gt;          IGNORE_WHERE_CLAUSE &lt;br /&gt;          NO_PARALLEL(SAMPLESUB) &lt;br /&gt;          opt_param('parallel_execution_enabled', 'false') &lt;br /&gt;          NO_PARALLEL_INDEX(SAMPLESUB) &lt;br /&gt;          NO_SQL_TUNE &lt;br /&gt;       */ &lt;br /&gt;          NVL(SUM(C1),0)&lt;br /&gt;        , NVL(SUM(C2),0) &lt;br /&gt;FROM &lt;br /&gt;          (&lt;br /&gt;            SELECT &lt;br /&gt;                   /*+ &lt;br /&gt;                      NO_PARALLEL("KOKBF$") &lt;br /&gt;                      FULL("KOKBF$") &lt;br /&gt;                      NO_PARALLEL_INDEX("KOKBF$") &lt;br /&gt;                   */ &lt;br /&gt;                      1 AS C1&lt;br /&gt;                    , 1 AS C2 &lt;br /&gt;            FROM &lt;br /&gt;                      TABLE("CBO_TEST"."F_TAB_PIPELINED"(1,20,NULL)) "KOKBF$"&lt;br /&gt;          ) SAMPLESUB&lt;br /&gt;;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So that's a pity: the dynamic sampling code for table functions currently doesn't recognize the need to gather join column statistics, so the optimizer simply doesn't know the num_distinct / num_nulls figures of the join column generated by the table function.&lt;br /&gt;&lt;br /&gt;It certainly would be nice if Oracle enhanced the dynamic sampling code so that we have at least the 
same level of information available as with regular tables.&lt;br /&gt;&lt;br /&gt;So what does that mean for the optimizer's Join Cardinality estimate? It looks like the Join Selectivity formula, when dealing with Table Functions, could be extended like this:&lt;br /&gt;&lt;br /&gt;Join Selectivity = 1 / coalesce(greater(num_distinct(t1.c1), num_distinct(t2.c2)), num_distinct(t1.c1), num_distinct(t2.c2), 100)&lt;br /&gt;&lt;br /&gt;The "greater" function returns NULL if any of its operands is NULL. In that case the optimizer seems to use whichever num_distinct is non-null, and if neither is defined, it resorts to a hard-coded default of 100, resulting in a default selectivity of 1/100.&lt;br /&gt;&lt;br /&gt;In our case:&lt;br /&gt;&lt;br /&gt;Join Selectivity = 1 / coalesce(greater(10, null), 10, null, 100) = 1 / 10&lt;br /&gt;&lt;br /&gt;Join Cardinality = 1 / 10 * 20 * 10000 = 20000&lt;br /&gt;&lt;br /&gt;Of course the same applies to Table Functions with regard to non-overlapping join columns: the dynamic sampling performed doesn't provide the low and high values, hence the optimizer cannot detect such a case.&lt;br /&gt;&lt;br /&gt;If you'd like to see the default case of the above formula in action, all it takes is a join of two table functions:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;explain plan for&lt;br /&gt;select  /*+ dynamic_sampling(t1, 2) dynamic_sampling(t2, 2) */&lt;br /&gt;        count(*)&lt;br /&gt;from&lt;br /&gt;        (&lt;br /&gt;          select&lt;br /&gt;                  column_value as fk&lt;br /&gt;          from&lt;br /&gt;                  table(f_tab_pipelined(1, 10000, null))&lt;br /&gt;        ) t1&lt;br /&gt;      , (&lt;br /&gt;          select&lt;br /&gt;                  column_value as id&lt;br /&gt;          from&lt;br /&gt;                  table(f_tab_pipelined(1, 20, null))&lt;br /&gt;        ) t2&lt;br /&gt;where&lt;br /&gt;        t1.fk = 
t2.id&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;set linesize 200 pagesize 0 tab off&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC ROWS NOTE'));&lt;br /&gt;&lt;br /&gt;-----------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                           | Name            | Rows  |&lt;br /&gt;-----------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT                    |                 |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE                     |                 |     1 |&lt;br /&gt;|   2 |   HASH JOIN                         |                 |  2000 |&lt;br /&gt;|   3 |    COLLECTION ITERATOR PICKLER FETCH| F_TAB_PIPELINED |    20 |&lt;br /&gt;|   4 |    COLLECTION ITERATOR PICKLER FETCH| F_TAB_PIPELINED | 10000 |&lt;br /&gt;-----------------------------------------------------------------------&lt;br /&gt; &lt;br /&gt;Note&lt;br /&gt;-----&lt;br /&gt;   - dynamic sampling used for this statement (level=2)&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So dynamic sampling gets the row source cardinalities right, but the join cardinality is way off. Since num_distinct is unknown for both row sources, the default applies:&lt;br /&gt;&lt;br /&gt;Join Selectivity = 1 / coalesce(greater(null, null), null, null, 100) = 1 / 100&lt;br /&gt;&lt;br /&gt;Join Cardinality = 1 / 100 * 20 * 10000 = 2000&lt;br /&gt;&lt;br /&gt;If you repeat the same exercise with regular tables and dynamic sampling, the correct join cardinality of 20 (1 / greater(10000, 20) * 20 * 10000) will be estimated, because Oracle detects the 10000 distinct values of T1.FK.&lt;br /&gt;&lt;br /&gt;If you really need to take corrective action with regular tables, you can resort to the special hints that Oracle uses in SQL Profiles. These are of course undocumented, so you'll have to use them at your own risk. 
Christian Antognini published a &lt;a href="http://antognini.ch/papers/SQLProfiles_20060622.pdf"&gt;very nice paper&lt;/a&gt; about SQL Profiles some time ago that also covers the internal details including the hints introduced for that purpose.&lt;br /&gt;&lt;br /&gt;In our case the hint that would allow to provide the missing information would be in particular COLUMN_STATS. &lt;br /&gt;&lt;br /&gt;So some variation of the following would be helpful if it worked:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;select  /*+ dynamic_sampling(t2, 2) column_stats(t2, id, scale, min=1 max=20 nulls=0 distinct=20) */&lt;br /&gt;        count(*)&lt;br /&gt;from&lt;br /&gt;        t1&lt;br /&gt;      , (&lt;br /&gt;          select&lt;br /&gt;                  column_value as id&lt;br /&gt;          from&lt;br /&gt;                  table(f_tab_pipelined(1, 20, null))&lt;br /&gt;        ) t2&lt;br /&gt;where&lt;br /&gt;        t1.fk = t2.id&lt;br /&gt;;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Unfortunately the optimizer obviously refuses to apply those hints to table functions - they are only working with regular tables, so this doesn't help either.&lt;br /&gt;&lt;br /&gt;This might also explain why the dynamic sampling code doesn't bother to query that kind of information for Table Functions - may be the optimizer at present cannot process such information and therefore it would be useless to gather it anyway.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you would like to use Table Functions and also join them to other row sources you'll have to carefully check the join cardinality estimates of the optimizer, because some crucial information required for a proper join cardinality calculation is not available when dealing with Table Functions. 
This is also true when using 11g features like Table Function dynamic sampling.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-6323642714954491992?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/6323642714954491992/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=6323642714954491992' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/6323642714954491992'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/6323642714954491992'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/12/table-functions-and-join-cardinality.html' title='Table Functions And Join Cardinality Estimates'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-8310391258888769726</id><published>2011-11-24T22:54:00.004+01:00</published><updated>2011-11-25T22:45:43.361+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='troubleshooting'/><category scheme='http://www.blogger.com/atom/ns#' term='events'/><category scheme='http://www.blogger.com/atom/ns#' term='undocumented'/><category scheme='http://www.blogger.com/atom/ns#' term='Cool Stuff'/><title type='text'>How To Cancel A Query Running In Another Session</title><content type='html'>This is not really anything new - in fact Tanel Poder has already &lt;a 
href="http://blog.tanelpoder.com/2010/02/17/how-to-cancel-a-query-running-in-another-session/"&gt;blogged about it&lt;/a&gt; a while ago. Tanel has specifically covered the handling of "urgent" TCP packets and how this could be used to signal a "cancel" to another process, however this only works on Unix environments and not with Windows SQL*Plus clients. In Tanel's article it is also mentioned that there is an officially documented way of doing this via the Resource Manager if you happen to have an Enterprise Edition license.&lt;br /&gt;&lt;br /&gt;In my quick tests however the call to DBMS_RESOURCE_MANAGER.SWITCH_CONSUMER_GROUP_FOR_SESS using "CANCEL_SQL" as consumer group only errors out with ORA-29366 saying that the specified consumer group is invalid.&lt;br /&gt;&lt;br /&gt;So ideally there should be an approach that is independent from client or server O/S or license details, and indeed there is one, however it is using an undocumented event and therefore is unsupported and can only be used at your own risk.&lt;br /&gt;&lt;br /&gt;If you set event 10237 ("ORA-10237: simulate ^C (for testing purposes)") in a session to any level greater 0 then any currently running and future execution will be "cancelled", so once the cancellation was successful the event needs to be unset otherwise the session will be in an unusable state cancelling any further attempts (applies even if the "lifetime 1" clause is used instead of "forever" when using ORADEBUG to set the event).&lt;br /&gt;&lt;br /&gt;So a simple script like the following should be sufficient to cancel a current execution in another session without the need to kill the session.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;--------------------------------------------------------&lt;br /&gt;--&lt;br /&gt;-- simulate_control_c.sql&lt;br /&gt;--&lt;br /&gt;-- Purpose:&lt;br /&gt;--&lt;br /&gt;-- Sets event 10237 in a session to simulate&lt;br /&gt;-- pressing CONTROL+C for that session&lt;br 
/&gt;--&lt;br /&gt;-- Allows cancelling a running SQL statement from&lt;br /&gt;-- a remote session without killing the session&lt;br /&gt;--&lt;br /&gt;-- If the session is stuck on the server side&lt;br /&gt;-- which means that it can't be killed this&lt;br /&gt;-- probably won't help either&lt;br /&gt;--&lt;br /&gt;-- Requirements:&lt;br /&gt;--&lt;br /&gt;-- EXECUTE privilege on SYS.DBMS_SYSTEM&lt;br /&gt;-- EXECUTE privilege on DBMS_LOCK (used for DBMS_LOCK.SLEEP)&lt;br /&gt;-- SELECT privilege on V$SESSION&lt;br /&gt;--&lt;br /&gt;-- Usage:&lt;br /&gt;--&lt;br /&gt;--            @simulate_control_c &amp;lt;SID&amp;gt;&lt;br /&gt;--&lt;br /&gt;-- Note:&lt;br /&gt;--&lt;br /&gt;-- The usage of that event is undocumented&lt;br /&gt;-- Therefore use at your own risk!&lt;br /&gt;-- Provided for free, without any warranties -&lt;br /&gt;-- test this before using it on anything important&lt;br /&gt;--&lt;br /&gt;-- Other implementation ideas:&lt;br /&gt;--&lt;br /&gt;-- The following code is supposed to achieve the same on Enterprise Edition&lt;br /&gt;-- and enabled Resource Manager in a documented way&lt;br /&gt;-- In all versions tested (10.2.0.4, 11.1.0.7, 11.2.0.1, 11.2.0.2) I get however&lt;br /&gt;-- ORA-29366 and it doesn't work as described&lt;br /&gt;-- Note that the official documentation doesn't explicitly mention CANCEL_SQL as &lt;br /&gt;-- valid consumer group for this call&lt;br /&gt;&lt;br /&gt;-- begin&lt;br /&gt;--   sys.dbms_resource_manager.switch_consumer_group_for_sess(&lt;br /&gt;--     &amp;lt;sid&amp;gt;,&amp;lt;serial#&amp;gt;,'CANCEL_SQL'&lt;br /&gt;--   );&lt;br /&gt;-- end;&lt;br /&gt;--&lt;br /&gt;-- When running on Unix KILL -URG sent to the server process&lt;br /&gt;-- should also simulate a Control-C&lt;br /&gt;-- This doesn't work with Windows SQL*Plus clients though&lt;br /&gt;--&lt;br /&gt;-- See Tanel Poder's blog post for more info&lt;br /&gt;-- http://blog.tanelpoder.com/2010/02/17/how-to-cancel-a-query-running-in-another-session/&lt;br /&gt;--&lt;br /&gt;-- Author:&lt;br /&gt;--&lt;br /&gt;-- 
Randolf Geist&lt;br /&gt;-- http://oracle-randolf.blogspot.com&lt;br /&gt;--&lt;br /&gt;-- Versions tested:&lt;br /&gt;--&lt;br /&gt;-- 11.2.0.1 Server+Client&lt;br /&gt;-- 10.2.0.4 Server&lt;br /&gt;-- 11.2.0.2 Server&lt;br /&gt;--&lt;br /&gt;--------------------------------------------------------&lt;br /&gt;&lt;br /&gt;set echo off verify off feedback off&lt;br /&gt;&lt;br /&gt;column sid new_value v_sid noprint&lt;br /&gt;column serial# new_value v_serial noprint&lt;br /&gt;&lt;br /&gt;-- Get details from V$SESSION&lt;br /&gt;select&lt;br /&gt;        sid&lt;br /&gt;      , serial#&lt;br /&gt;from&lt;br /&gt;        v$session&lt;br /&gt;where&lt;br /&gt;        sid = to_number('&amp;1')&lt;br /&gt;and     status = 'ACTIVE'&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;declare&lt;br /&gt;  -- Avoid compilation errors in case of SID not found&lt;br /&gt;  v_sid     number  := to_number('&amp;v_sid');&lt;br /&gt;  v_serial  number  := to_number('&amp;v_serial');&lt;br /&gt;  v_status  varchar2(100);&lt;br /&gt;  -- 60 seconds default timeout&lt;br /&gt;  n_timeout number  := 60;&lt;br /&gt;  dt_start  date    := sysdate;&lt;br /&gt;begin&lt;br /&gt;  -- SID not found&lt;br /&gt;  if v_sid is null then&lt;br /&gt;    raise_application_error(-20001, 'SID: &amp;1 cannot be found or is not in STATUS=ACTIVE');&lt;br /&gt;  else&lt;br /&gt;    -- Set event 10237 to level 1 in session to simulate CONTROL+C&lt;br /&gt;    sys.dbms_system.set_ev(v_sid, v_serial, 10237, 1, '');&lt;br /&gt;    -- Check session state&lt;br /&gt;    loop&lt;br /&gt;      begin&lt;br /&gt;        select&lt;br /&gt;                status&lt;br /&gt;        into&lt;br /&gt;                v_status&lt;br /&gt;        from&lt;br /&gt;                v$session&lt;br /&gt;        where&lt;br /&gt;                sid = v_sid;&lt;br /&gt;      exception&lt;br /&gt;      -- SID no longer found&lt;br /&gt;      when NO_DATA_FOUND then&lt;br /&gt;        raise_application_error(-20001, 'SID: ' || v_sid || ' no 
longer found after cancelling');&lt;br /&gt;      end;&lt;br /&gt;&lt;br /&gt;      -- Status no longer active&lt;br /&gt;      -- then set event level to 0 to avoid further cancels&lt;br /&gt;      if v_status != 'ACTIVE' then&lt;br /&gt;        sys.dbms_system.set_ev(v_sid, v_serial, 10237, 0, '');&lt;br /&gt;        exit;&lt;br /&gt;      end if;&lt;br /&gt;&lt;br /&gt;      -- Session still active after timeout exceeded&lt;br /&gt;      -- Give up&lt;br /&gt;      if dt_start + (n_timeout / 86400) &amp;lt; sysdate then&lt;br /&gt;        sys.dbms_system.set_ev(v_sid, v_serial, 10237, 0, '');&lt;br /&gt;        raise_application_error(-20001, 'SID: ' || v_sid || ' still active after ' || n_timeout || ' seconds');&lt;br /&gt;      end if;&lt;br /&gt;&lt;br /&gt;      -- Back off after 5 seconds&lt;br /&gt;      -- Check only every second from then on&lt;br /&gt;      -- Avoids burning CPU and potential contention by this loop&lt;br /&gt;      -- However this means that more than a single statement potentially&lt;br /&gt;      -- gets cancelled during this second&lt;br /&gt;      if dt_start + (5 / 86400) &amp;lt; sysdate then&lt;br /&gt;        dbms_lock.sleep(1);&lt;br /&gt;      end if;&lt;br /&gt;    end loop;&lt;br /&gt;  end if;&lt;br /&gt;end;&lt;br /&gt;/&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;It is particularly useful in Windows environments where, by default, the SQL*Plus executable doesn't allow cancelling a running execution by pressing Control+C: it only works while fetching, and pressing it a second time terminates the whole SQL*Plus client.&lt;br /&gt;&lt;br /&gt;Note that Tanel's method can probably cancel queries that this approach cannot: under Unix the URGENT signal effectively interrupts the running process and makes it execute the corresponding handler code, whereas the event set here has to be actively checked by the code of the running process.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' 
height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-8310391258888769726?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/8310391258888769726/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=8310391258888769726' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/8310391258888769726'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/8310391258888769726'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/11/how-to-cancel-query-running-in-another.html' title='How To Cancel A Query Running In Another Session'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-6601205046391354589</id><published>2011-11-19T23:20:00.009+01:00</published><updated>2011-11-20T20:21:51.975+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Public Appearance'/><title type='text'>DOAG 2011 Unconference Wrap-Up</title><content type='html'>My sessions at DOAG all went well, and I particularly liked the Unconference ones. As promised, I held two of them, and they were rather different. &lt;br /&gt;&lt;br /&gt;The first one only had a couple of attendees (including OakTable fellow &lt;a href="http://antognini.ch"&gt;Christian Antognini&lt;/a&gt; from Switzerland), so we could gather around my laptop and do a real "Optimizer Hacking Session". 
Actually we had to do that because the projector &lt;a href="http://www.doag.org/events/konferenzen/doag-2011/das-programm/doag-2011-unconference.html"&gt;promised by DOAG&lt;/a&gt; wasn't there. I talked mainly about some common traps in SQL statement troubleshooting and how to avoid them, but also showed some cool stuff that I'll blog about separately soon.&lt;br /&gt;&lt;br /&gt;The second session, two hours later, was attended by many more people than I expected, so there was no chance at all to gather around my laptop. Originally I planned to talk about some common reasons why the optimizer's estimates can go wrong (even with 100% computed statistics) and what to do about them, but then I realized that I should at least give a short summary of what I had shown in the first session to those who hadn't attended (which was most of the attendees anyway). This started an interesting discussion with many questions; a surprising number of them revolved around the use of user-defined PL/SQL functions.&lt;br /&gt;&lt;br /&gt;Since even the WiFi connection didn't work properly, I could only briefly mention some important articles worth reading if you want a good understanding of how the optimizer can (actually should!) be helped when dealing with PL/SQL functions.&lt;br /&gt;&lt;br /&gt;So for reference, here is a summary of the relevant articles:&lt;br /&gt;&lt;br /&gt;1. Expert Oracle Practices (the book I co-authored), Chapter 7, "PL/SQL and the CBO" by OakTable fellow &lt;a href="http://www.dbprof.com/"&gt;Joze Senegacnik&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;2. OakTable member Adrian Billington on &lt;a href="http://www.oracle-developer.net/display.php?id=426"&gt;pl/sql functions and cbo costing&lt;/a&gt;, which is inspired by Joze's work above&lt;br /&gt;&lt;br /&gt;3. 
Adrian Billington on &lt;a href="http://www.oracle-developer.net/display.php?id=427"&gt;"setting cardinality for pipelined and table functions"&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;4. Adrian Billington's generic &lt;a href="http://www.oracle-developer.net/content/utilities/ccard.sql"&gt;"Collection Cardinality Utility"&lt;/a&gt;, for background info see the article above, the source code includes a description when and how to use it&lt;br /&gt;&lt;br /&gt;5. The problem of the CBO ignoring the projection for costing, described on my &lt;a href="http://oracle-randolf.blogspot.com/2010/01/when-your-projection-is-not-cost-free.html"&gt;blog&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;There were some other questions in the same context, for example, what would a Real Time SQL Monitoring report look like with user-defined PL/SQL functions - here is an example &lt;a href="http://www.sqltools-plusplus.org:7676/media/projection_udf_monitor_report.html"&gt;you can view / download (remember that the file can be shared since it is self-contained)&lt;/a&gt; that is taken from a query quite similar to the one described in my blog post. By the way, it's a query that gives the approach a try discussed during the session if there a chance of getting the "projection" cost right by playing clever tricks that combine projection and selection. I didn't get it to work as desired, but at least the "selection" cost showed up, which however doesn't address the actual problem of the "projection" cost in my particular test case. 
So Oracle still attempts to merge the views and then has to run the user-defined function many more times than necessary - in fact the query represents the worst case because the user-defined function is called for both selection and projection.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.sqltools-plusplus.org:7676/media/projection_udf_monitor_report.html"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 1024px; height: 384px;" src="http://www.sqltools-plusplus.org:7676/media/projection_udf_monitor_report.jpg" border="0" alt="Sample Real Time SQL Monitoring Report" id="BLOGGER_PHOTO_ID_5676838786110503010" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;You can see from the report that at least some PL/SQL time shows up, but it also reveals a deficiency of the report - it doesn't show detailed logical I/O values, only a summary of Buffer Gets.&lt;br /&gt;&lt;br /&gt;And interestingly, although nothing about this is visible in the end-user report, the report file itself (in the XML data section) contains at least a couple of optimizer parameter settings - which addresses another question raised during the session.&lt;br /&gt;&lt;br /&gt;All in all I really liked these sessions, and I hope the attendees enjoyed them as much as I did.&lt;br /&gt;&lt;br /&gt;Last, but not least: It's not definitive yet, but very likely I'll give a few free "Optimizer Hacking Sessions" on the internet in the future, so stay tuned!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-6601205046391354589?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/6601205046391354589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=6601205046391354589' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/6601205046391354589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/6601205046391354589'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/11/doag-2011-unconference-wrap-up.html' title='DOAG 2011 Unconference Wrap-Up'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-2766024401207981097</id><published>2011-11-14T23:26:00.003+01:00</published><updated>2011-11-14T23:39:24.499+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Bitmap Index'/><category scheme='http://www.blogger.com/atom/ns#' term='troubleshooting'/><category scheme='http://www.blogger.com/atom/ns#' term='Fundamentals'/><category scheme='http://www.blogger.com/atom/ns#' term='CBO'/><category scheme='http://www.blogger.com/atom/ns#' term='star transformation'/><category scheme='http://www.blogger.com/atom/ns#' term='wrong estimates'/><category scheme='http://www.blogger.com/atom/ns#' term='cardinality'/><title type='text'>Star Transformation And Cardinality Estimates</title><content type='html'>If you want to make use of Oracle's cunning Star Transformation feature then you need to be aware of the fact that the star transformation logic - as the name implies - assumes that you are using a proper star schema.&lt;br /&gt;&lt;br /&gt;Here is a nice example of what can happen if you attempt to use star transformation but your model obviously doesn't really correspond to what Oracle expects:&lt;br /&gt;&lt;br 
/&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;drop table d;&lt;br /&gt;&lt;br /&gt;purge table d;&lt;br /&gt;&lt;br /&gt;drop table t;&lt;br /&gt;&lt;br /&gt;purge table t;&lt;br /&gt;&lt;br /&gt;create table t&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;         rownum as id&lt;br /&gt;       , mod(rownum, 100) + 1 as fk1&lt;br /&gt;       , 1000 + mod(rownum, 10) + 1 as fk2&lt;br /&gt;       , 2000 + mod(rownum, 100) + 1 as fk3&lt;br /&gt;       , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;         dual&lt;br /&gt;connect by&lt;br /&gt;         level &lt;= 1000000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't')&lt;br /&gt;&lt;br /&gt;create bitmap index t_fk1 on t (fk1);&lt;br /&gt;&lt;br /&gt;create bitmap index t_fk2 on t (fk2);&lt;br /&gt;&lt;br /&gt;create bitmap index t_fk3 on t (fk3);&lt;br /&gt;&lt;br /&gt;create table d&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;         rownum as id&lt;br /&gt;       , case when rownum between 1 and 100 then 'Y' else 'N' end as is_flag_d1&lt;br /&gt;       , case when rownum between 1001 and 1010 then 'Y' else 'N' end as is_flag_d2&lt;br /&gt;       , case when rownum between 2001 and 2100 then 'Y' else 'N' end as is_flag_d3&lt;br /&gt;       , rpad('x', 100) as vc1&lt;br /&gt;from&lt;br /&gt;         dual&lt;br /&gt;connect by&lt;br /&gt;         level &lt;= 10000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 'd', method_opt =&gt; 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 IS_FLAG_D1, IS_FLAG_D2, IS_FLAG_D3');&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This is a simplified example of a model where multiple, potentially small, dimensions are stored in a single physical table and the separate dimensions are represented by views that filter the corresponding dimension data from the base table.&lt;br /&gt;&lt;br /&gt;So we have a fact table with one million rows and a "collection" dimension table that holds three dimensions, among others.&lt;br 
/&gt;&lt;br /&gt;To enable the star transformation, bitmap indexes are created on the foreign keys of the fact table.&lt;br /&gt;&lt;br /&gt;The dimension table has histograms on the flag columns to tell the optimizer about the non-uniform distribution of the column data.&lt;br /&gt;&lt;br /&gt;Now imagine a query on the fact table (possibly with some filtering on the fact table by other means, like other dimensions or direct filters on the fact table) that needs to join these three dimensions just for display purposes - the dimensions themselves are not filtered, so the join will not filter out any data.&lt;br /&gt;&lt;br /&gt;Let's first have a look at the execution plan of such a simple query with star transformation disabled:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;select /*+ no_star_transformation */&lt;br /&gt;       count(*)&lt;br /&gt;from&lt;br /&gt;       t f&lt;br /&gt;     , (select * from d where is_flag_d1 = 'Y') d1&lt;br /&gt;     , (select * from d where is_flag_d2 = 'Y') d2&lt;br /&gt;     , (select * from d where is_flag_d3 = 'Y') d3&lt;br /&gt;where&lt;br /&gt;       f.fk1 = d1.id&lt;br /&gt;and    f.fk2 = d2.id&lt;br /&gt;and    f.fk3 = d3.id&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;SQL&gt; explain plan for&lt;br /&gt;  2  select /*+ no_star_transformation */&lt;br /&gt;  3         count(*)&lt;br /&gt;  4  from&lt;br /&gt;  5         t f&lt;br /&gt;  6       , (select * from d where is_flag_d1 = 'Y') d1&lt;br /&gt;  7       , (select * from d where is_flag_d2 = 'Y') d2&lt;br /&gt;  8       , (select * from d where is_flag_d3 = 'Y') d3&lt;br /&gt;  9  where&lt;br /&gt; 10         f.fk1 = d1.id&lt;br /&gt; 11  and    f.fk2 = d2.id&lt;br /&gt; 12  and    f.fk3 = d3.id&lt;br /&gt; 13  ;&lt;br /&gt;&lt;br /&gt;Explained.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; select * from table(dbms_xplan.display(format =&gt; 'BASIC +ROWS +PREDICATE'));&lt;br /&gt;Plan hash value: 77569906&lt;br /&gt;&lt;br 
/&gt;----------------------------------------------&lt;br /&gt;| Id  | Operation             | Name | Rows  |&lt;br /&gt;----------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT      |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE       |      |     1 |&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;|*  2 |   HASH JOIN           |      |   940K|&lt;/span&gt;&lt;br /&gt;|*  3 |    TABLE ACCESS FULL  | D    |   100 |&lt;br /&gt;|*  4 |    HASH JOIN          |      |   945K|&lt;br /&gt;|*  5 |     TABLE ACCESS FULL | D    |   100 |&lt;br /&gt;|*  6 |     HASH JOIN         |      |   950K|&lt;br /&gt;|*  7 |      TABLE ACCESS FULL| D    |    10 |&lt;br /&gt;|   8 |      TABLE ACCESS FULL| T    |  1000K|&lt;br /&gt;----------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - access("F"."FK3"="D"."ID")&lt;br /&gt;   3 - filter("IS_FLAG_D3"='Y')&lt;br /&gt;   4 - access("F"."FK1"="D"."ID")&lt;br /&gt;   5 - filter("IS_FLAG_D1"='Y')&lt;br /&gt;   6 - access("F"."FK2"="D"."ID")&lt;br /&gt;   7 - filter("IS_FLAG_D2"='Y')&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So clearly the optimizer got it quite right: the join to the dimensions is not going to filter out a significant number of rows, and the slight reduction in rows comes from the calculations based on the generated histograms.&lt;br /&gt;&lt;br /&gt;But now try the same again with star transformation enabled:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&gt; explain plan for&lt;br /&gt;  2  select /*+ star_transformation opt_param('star_transformation_enabled', 'temp_disable') */&lt;br /&gt;  3         count(*)&lt;br /&gt;  4  from&lt;br /&gt;  5         t f&lt;br /&gt;  6       , (select * from d where is_flag_d1 = 'Y') d1&lt;br /&gt;  7       , (select * from d where is_flag_d2 = 'Y') d2&lt;br /&gt;  8       , (select * 
from d where is_flag_d3 = 'Y') d3&lt;br /&gt;  9  where&lt;br /&gt; 10         f.fk1 = d1.id&lt;br /&gt; 11  and    f.fk2 = d2.id&lt;br /&gt; 12  and    f.fk3 = d3.id&lt;br /&gt; 13  ;&lt;br /&gt;&lt;br /&gt;Explained.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; select * from table(dbms_xplan.display(format =&gt; 'BASIC +ROWS +PREDICATE'));&lt;br /&gt;Plan hash value: 459231705&lt;br /&gt;&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;| Id  | Operation                        | Name  | Rows  |&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT                 |       |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE                  |       |     1 |&lt;br /&gt;|*  2 |   HASH JOIN                      |       |     9 |&lt;br /&gt;|*  3 |    HASH JOIN                     |       |     9 |&lt;br /&gt;|*  4 |     HASH JOIN                    |       |    10 |&lt;br /&gt;|*  5 |      TABLE ACCESS FULL           | D     |    10 |&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;|   6 |      TABLE ACCESS BY INDEX ROWID | T     |    10 |&lt;/span&gt;&lt;br /&gt;|   7 |       BITMAP CONVERSION TO ROWIDS|       |       |&lt;br /&gt;|   8 |        BITMAP AND                |       |       |&lt;br /&gt;|   9 |         BITMAP MERGE             |       |       |&lt;br /&gt;|  10 |          BITMAP KEY ITERATION    |       |       |&lt;br /&gt;|* 11 |           TABLE ACCESS FULL      | D     |   100 |&lt;br /&gt;|* 12 |           BITMAP INDEX RANGE SCAN| T_FK1 |       |&lt;br /&gt;|  13 |         BITMAP MERGE             |       |       |&lt;br /&gt;|  14 |          BITMAP KEY ITERATION    |       |       |&lt;br /&gt;|* 15 |           TABLE ACCESS FULL      | D     |   100 |&lt;br /&gt;|* 16 |           BITMAP INDEX RANGE SCAN| T_FK3 |       |&lt;br /&gt;|  17 |         BITMAP MERGE             |       |       |&lt;br /&gt;|  18 |          BITMAP KEY ITERATION    |       |       |&lt;br 
/&gt;|* 19 |           TABLE ACCESS FULL      | D     |    10 |&lt;br /&gt;|* 20 |           BITMAP INDEX RANGE SCAN| T_FK2 |       |&lt;br /&gt;|* 21 |     TABLE ACCESS FULL            | D     |   100 |&lt;br /&gt;|* 22 |    TABLE ACCESS FULL             | D     |   100 |&lt;br /&gt;----------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - access("F"."FK3"="D"."ID")&lt;br /&gt;   3 - access("F"."FK1"="D"."ID")&lt;br /&gt;   4 - access("F"."FK2"="D"."ID")&lt;br /&gt;   5 - filter("IS_FLAG_D2"='Y')&lt;br /&gt;  11 - filter("IS_FLAG_D1"='Y')&lt;br /&gt;  12 - access("F"."FK1"="D"."ID")&lt;br /&gt;  15 - filter("IS_FLAG_D3"='Y')&lt;br /&gt;  16 - access("F"."FK3"="D"."ID")&lt;br /&gt;  19 - filter("IS_FLAG_D2"='Y')&lt;br /&gt;  20 - access("F"."FK2"="D"."ID")&lt;br /&gt;  21 - filter("IS_FLAG_D1"='Y')&lt;br /&gt;  22 - filter("IS_FLAG_D3"='Y')&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;What an astonishing result: Not only will Oracle now try to access all rows of the fact table by single-block random I/O, which by itself can be a disaster for larger real-life fact tables (in particular when dealing with Exadata features like Smart Scans, which are only possible with multi-block direct-path reads), but also, if this were part of a more complex execution plan, look at the cardinality estimates: They are off by five orders of magnitude (9 rows estimated, although every one of the one million fact rows finds a match in all three dimensions) - very likely a recipe for disaster for any subsequent step.&lt;br /&gt;&lt;br /&gt;The point here is simple: The Star Transformation calculation model obviously doesn't cope very well with the "collection" of dimensions in a single table, but assumes a dimensional model where each dimension is stored in separate table(s). 
If you don't adhere to that model, the calculation will be badly wrong and the results possibly disastrous.&lt;br /&gt;&lt;br /&gt;Here the Star Transformation treats the filters on the dimension data as if they reduced the number of fact rows, although effectively they don't filter anything, and the current calculation model is not aware of this. If you put the three dimensions in separate tables, no "artificial" filter is required and hence the calculation won't be misled.&lt;br /&gt;&lt;br /&gt;Of course one could argue that the star transformation optimization does a poor job here, since the normal optimization based on the same input data produces a much better estimate, but at least for the time being that's the way this transformation works, and your data model had better reflect this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-2766024401207981097?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/2766024401207981097/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=2766024401207981097' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/2766024401207981097'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/2766024401207981097'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/11/star-transformation-and-cardinality.html' title='Star Transformation And Cardinality Estimates'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-6054944258582229571</id><published>2011-11-10T23:10:00.006+01:00</published><updated>2011-11-11T16:14:09.166+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Public Appearance'/><category scheme='http://www.blogger.com/atom/ns#' term='Advert'/><title type='text'>Public Appearances</title><content type='html'>I've just received confirmation that I have been accepted as a speaker for the HotSOS Symposium 2012 in March next year, so here is a quick update on my upcoming public appearances:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;1. DOAG&lt;/span&gt;&lt;br /&gt;Next week the &lt;a href="http://www.doag.org/events/konferenzen/doag-2011.html"&gt;DOAG conference 2011&lt;/a&gt; (Nürnberg, Germany) will begin. It's an impressive conference with a large number of tracks, although a lot of them are not database-centred.&lt;br /&gt;&lt;br /&gt;Nevertheless it's certainly one of the conferences to go to if you speak German and are interested in Oracle database technology (or the ever-increasing range of Oracle technology in general).&lt;br /&gt;&lt;br /&gt;Since I again missed out on negotiating a training day this year (something I hope to be able to do next year), I've decided to compensate for this a little: I plan to give at least two additional sessions at the &lt;a href="http://www.doag.org/events/konferenzen/doag-2011/das-programm/doag-2011-unconference.html"&gt;DOAG "Unconference"&lt;/a&gt;. These will be so-called "Optimizer hacking sessions" where I simply start up a SQL prompt and explore some of the most important aspects of the Cost Based Optimizer. The session's title is: "Optimizer issues - How to detect and prevent suboptimal execution plans". 
I hope this will be fun and educational for all of us - it is meant to be an interactive session where you're encouraged to participate by asking questions that we will try to answer by exploring the database live. I've got plenty of material to talk about, starting with basic SQL statement performance troubleshooting and extending to more advanced topics like histograms, cardinality estimates, virtual columns, extended statistics, clustering factor etc. Furthermore I plan to demonstrate some cool things that you probably haven't seen yet, so I believe it's going to be a lot of fun.&lt;br /&gt;&lt;br /&gt;If you've already received the printout with the schedule (I never understand why DOAG publishes these printouts so early, given all the possible changes to the schedule before the actual start of the conference): when we met at this year's OOW, Heli Helskyaho from Finland and I agreed to swap our presentation slots at DOAG, so my presentation about the Cost-Based Optimizer called "Query Transformations" will &lt;span style="font-weight:bold;"&gt;not&lt;/span&gt; take place on Wednesday, 9 am, but on &lt;span style="font-weight:bold;"&gt;Thursday, 3 pm, room "Kiew"&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Unfortunately DOAG didn't accept the second paper I submitted, which would have been the "introductory" session about how to read and understand execution plans, so I'll only give this more advanced session about the most important transformations that the optimizer applies to a query, why you should care, and how to control them if necessary.&lt;br /&gt;&lt;br /&gt;So this is what my preliminary schedule for DOAG looks like:&lt;br /&gt;&lt;br /&gt;Thursday, November 17&lt;br /&gt;12 pm: DOAG Unconference - Optimizer hacking session&lt;br /&gt;2  pm: DOAG Unconference - Optimizer hacking session&lt;br /&gt;3  pm: DOAG Conference - Query Transformations, room "Kiew"&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;2. 
UKOUG&lt;/span&gt;&lt;br /&gt;I &lt;span style="font-weight:bold;"&gt;won't&lt;/span&gt; be at the &lt;a href="http://techandebs.ukoug.org/"&gt;UKOUG conference&lt;/a&gt; (December 5-7, UK, Birmingham) this year but this doesn't mean that I don't recommend going there. In fact I believe this year's conference has one of the most impressive list of speakers ever, including a number of top speakers from Oracle USA that you won't meet anywhere else across Europe, so if you can, get there - it certainly will be a top experience.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;3. HotSOS&lt;/span&gt;&lt;br /&gt;In March 2012 the 10th &lt;a href="http://www.hotsos.com/sym12.html"&gt;HotSOS Symposium&lt;/a&gt; (Dallas, Texas) will be held - and I'll be speaking there for the first time (about the CBO and how it calculates joins over histograms, by the way). It's a conference that I particularly look forward to for several reasons. &lt;br /&gt;This year is the 10th anniversary of the conference so some special activities are planned and since this is the only Oracle conference in the world dedicated to performance naturally the OakTable member "density" will be extremely high.&lt;br /&gt;&lt;br /&gt;I hope to meet some of you at one of those events (well except UKOUG, that is) !&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-6054944258582229571?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/6054944258582229571/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=6054944258582229571' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/6054944258582229571'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/6054944258582229571'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/11/public-appearances.html' title='Public Appearances'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-1461480296861504740</id><published>2011-10-30T18:22:00.001+01:00</published><updated>2011-10-30T18:25:01.410+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Insert'/><category scheme='http://www.blogger.com/atom/ns#' term='Parallel Execution'/><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='Auto-DOP'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='direct path'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.1'/><category scheme='http://www.blogger.com/atom/ns#' term='Parallel DML'/><title type='text'>Auto DOP And Direct-Path Inserts</title><content type='html'>This is just a short note about one of the potential side-effects of the new Auto Degree Of Parallelism (DOP) feature introduced in 11.2.&lt;br /&gt;&lt;br /&gt;If you happen to have Parallel DML enabled in your session along with Auto DOP (and here I refer to the PARALLEL_DEGREE_POLICY = AUTO setting, not LIMITED) then it might take you by surprise that INSERT statements that are neither decorated with a parallel hint nor use any parallel enabled objects can be turned into direct-path inserts.&lt;br /&gt;&lt;br /&gt;Now don't get me wrong - I think this is reasonable and in-line with the behaviour so far because you have enabled parallel DML and Auto DOP therefore is 
eligible to make use of that feature. According to the documentation, the default mode of parallel inserts is direct-path, so Auto DOP simply follows the documented behaviour when it decides to use parallel DML. Note that, depending on the data volume to insert, you can even end up with a serial direct-path insert combined with a parallel query part.&lt;br /&gt;&lt;br /&gt;You just need to be aware that a simple INSERT INTO ... SELECT FROM on serial objects might turn into a direct-path insert.&lt;br /&gt;&lt;br /&gt;The main caveat is that the direct-path insert won't re-use any space available in the existing blocks of the segment, but will always allocate blocks above the current High Water Mark (HWM).&lt;br /&gt;&lt;br /&gt;So if you use this feature along with application logic that deletes rows from a segment, enabling Auto DOP might lead to unreasonable segment growth that can have all kinds of nasty side effects.&lt;br /&gt;&lt;br /&gt;Another side effect is more obvious: existing application logic might break because it attempts to re-access the object after the (now direct-path) insert within the same transaction, which fails with "ORA-12838: cannot read/modify an object after modifying it in parallel".&lt;br /&gt;&lt;br /&gt;If you still want to make use of parallel DML but need to re-use available space in existing blocks, you can try to explicitly specify the &lt;span style="font-weight:bold;"&gt;NOAPPEND hint&lt;/span&gt;, which still allows Auto DOP to be used but prevents the direct-path insert mode for both serial and parallel inserts - 11g introduced the &lt;a href="http://oracle-randolf.blogspot.com/2011/02/parallel-dml-conventional-non-direct.html"&gt;parallel conventional insert&lt;/a&gt;, by the way.&lt;br /&gt;&lt;br /&gt;Here is a small test case to demonstrate the behaviour:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;set 
echo on linesize 200 pagesize 0 trimspool on tab off&lt;br /&gt;&lt;br /&gt;drop table t;&lt;br /&gt;&lt;br /&gt;purge table t;&lt;br /&gt;&lt;br /&gt;create table t&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        rownum as id&lt;br /&gt;      , rpad('x', 100) as vc1&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by level &lt;= 1000000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 'T')&lt;br /&gt;&lt;br /&gt;alter session enable parallel dml;&lt;br /&gt;&lt;br /&gt;alter session set parallel_degree_policy = manual;&lt;br /&gt;&lt;br /&gt;insert into t select * from t where rownum &lt;= 1000;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null));&lt;br /&gt;&lt;br /&gt;select count(*) from t;&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;br /&gt;alter session set parallel_degree_policy = auto;&lt;br /&gt;&lt;br /&gt;insert into t select * from t where rownum &lt;= 1000;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null));&lt;br /&gt;&lt;br /&gt;select count(*) from t;&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;br /&gt;insert /*+ noappend */ into t select * from t where rownum &lt;= 1000;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null));&lt;br /&gt;&lt;br /&gt;select count(*) from t;&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;And here is the output I get from 11.2.0.1:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;Connected to:&lt;br /&gt;Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production&lt;br /&gt;With the Partitioning, OLAP, Data Mining and Real Application Testing options&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; drop table t;&lt;br /&gt;&lt;br /&gt;Table dropped.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; purge table t;&lt;br /&gt;&lt;br /&gt;Table purged.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; create table t&lt;br 
/&gt;  2  as&lt;br /&gt;  3  select&lt;br /&gt;  4          rownum as id&lt;br /&gt;  5        , rpad('x', 100) as vc1&lt;br /&gt;  6  from&lt;br /&gt;  7          dual&lt;br /&gt;  8  connect by level &lt;= 1000000&lt;br /&gt;  9  ;&lt;br /&gt;&lt;br /&gt;Table created.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; exec dbms_stats.gather_table_stats(null, 'T')&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; alter session enable parallel dml;&lt;br /&gt;&lt;br /&gt;Session altered.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; alter session set parallel_degree_policy = manual;&lt;br /&gt;&lt;br /&gt;Session altered.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; insert into t select * from t where rownum &lt;= 1000;&lt;br /&gt;&lt;br /&gt;1000 rows created.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; select * from table(dbms_xplan.display_cursor(null, null));&lt;br /&gt;SQL_ID  95x60r5k6mhka, child number 0&lt;br /&gt;-------------------------------------&lt;br /&gt;insert into t select * from t where rownum &lt;= 1000&lt;br /&gt;&lt;br /&gt;Plan hash value: 508354683&lt;br /&gt;&lt;br /&gt;---------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     |&lt;br /&gt;---------------------------------------------------------------------------------&lt;br /&gt;|   0 | INSERT STATEMENT         |      |       |       |  4200 (100)|          |&lt;br /&gt;|   1 |  LOAD TABLE CONVENTIONAL |      |       |       |            |          |&lt;br /&gt;|*  2 |   COUNT STOPKEY          |      |       |       |            |          |&lt;br /&gt;|   3 |    TABLE ACCESS FULL     | T    |  1000K|   101M|  4200   (1)| 00:00:51 |&lt;br /&gt;---------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation 
id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM&lt;=1000)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;20 rows selected.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; select count(*) from t;&lt;br /&gt;   1001000&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; commit;&lt;br /&gt;&lt;br /&gt;Commit complete.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; alter session set parallel_degree_policy = auto;&lt;br /&gt;&lt;br /&gt;Session altered.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; insert into t select * from t where rownum &lt;= 1000;&lt;br /&gt;&lt;br /&gt;1000 rows created.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; select * from table(dbms_xplan.display_cursor(null, null));&lt;br /&gt;SQL_ID  95x60r5k6mhka, child number 2&lt;br /&gt;-------------------------------------&lt;br /&gt;insert into t select * from t where rownum &lt;= 1000&lt;br /&gt;&lt;br /&gt;Plan hash value: 482288532&lt;br /&gt;&lt;br /&gt;-----------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |&lt;br /&gt;-----------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | INSERT STATEMENT        |          |       |       |  2326 (100)|          |        |      |            |&lt;br /&gt;|   1 |  LOAD AS SELECT         |          |       |       |            |          |        |      |            |&lt;br /&gt;|*  2 |   COUNT STOPKEY         |          |       |       |            |          |        |      |            |&lt;br /&gt;|   3 |    PX COORDINATOR       |          |       |       |            |          |        |      |            |&lt;br /&gt;|   4 |     PX SEND QC (RANDOM) | :TQ10000 |  1000K|   101M|  2326   (1)| 00:00:28 |  Q1,00 | P-&gt;S | QC (RAND)  |&lt;br 
/&gt;|*  5 |      COUNT STOPKEY      |          |       |       |            |          |  Q1,00 | PCWC |            |&lt;br /&gt;|   6 |       PX BLOCK ITERATOR |          |  1000K|   101M|  2326   (1)| 00:00:28 |  Q1,00 | PCWC |            |&lt;br /&gt;|*  7 |        TABLE ACCESS FULL| T        |  1000K|   101M|  2326   (1)| 00:00:28 |  Q1,00 | PCWP |            |&lt;br /&gt;-----------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM&lt;=1000)&lt;br /&gt;   5 - filter(ROWNUM&lt;=1000)&lt;br /&gt;   7 - access(:Z&gt;=:Z AND :Z&lt;=:Z)&lt;br /&gt;&lt;br /&gt;Note&lt;br /&gt;-----&lt;br /&gt;   - automatic DOP: Computed Degree of Parallelism is 2&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;30 rows selected.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; select count(*) from t;&lt;br /&gt;select count(*) from t&lt;br /&gt;                     *&lt;br /&gt;ERROR at line 1:&lt;br /&gt;ORA-12838: cannot read/modify an object after modifying it in parallel&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; commit;&lt;br /&gt;&lt;br /&gt;Commit complete.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; insert /*+ noappend */ into t select * from t where rownum &lt;= 1000;&lt;br /&gt;&lt;br /&gt;1000 rows created.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; select * from table(dbms_xplan.display_cursor(null, null));&lt;br /&gt;SQL_ID  dv79tpggm6q4k, child number 1&lt;br /&gt;-------------------------------------&lt;br /&gt;insert /*+ noappend */ into t select * from t where rownum &lt;= 1000&lt;br /&gt;&lt;br /&gt;Plan hash value: 2717876046&lt;br /&gt;&lt;br /&gt;------------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                | Name    
 | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |&lt;br /&gt;------------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | INSERT STATEMENT         |          |       |       |  2326 (100)|          |        |      |            |&lt;br /&gt;|   1 |  LOAD TABLE CONVENTIONAL |          |       |       |            |          |        |      |            |&lt;br /&gt;|*  2 |   COUNT STOPKEY          |          |       |       |            |          |        |      |            |&lt;br /&gt;|   3 |    PX COORDINATOR        |          |       |       |            |          |        |      |            |&lt;br /&gt;|   4 |     PX SEND QC (RANDOM)  | :TQ10000 |  1000K|   101M|  2326   (1)| 00:00:28 |  Q1,00 | P-&gt;S | QC (RAND)  |&lt;br /&gt;|*  5 |      COUNT STOPKEY       |          |       |       |            |          |  Q1,00 | PCWC |            |&lt;br /&gt;|   6 |       PX BLOCK ITERATOR  |          |  1000K|   101M|  2326   (1)| 00:00:28 |  Q1,00 | PCWC |            |&lt;br /&gt;|*  7 |        TABLE ACCESS FULL | T        |  1000K|   101M|  2326   (1)| 00:00:28 |  Q1,00 | PCWP |            |&lt;br /&gt;------------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM&lt;=1000)&lt;br /&gt;   5 - filter(ROWNUM&lt;=1000)&lt;br /&gt;   7 - access(:Z&gt;=:Z AND :Z&lt;=:Z)&lt;br /&gt;&lt;br /&gt;Note&lt;br /&gt;-----&lt;br /&gt;   - automatic DOP: Computed Degree of Parallelism is 2&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;30 rows selected.&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; select count(*) from t;&lt;br /&gt;   1003000&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; commit;&lt;br /&gt;&lt;br /&gt;Commit complete.&lt;br /&gt;&lt;br 
/&gt;SQL&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Notice how the insert turns into a direct-path insert with Auto DOP and how the subsequent query in the same transaction fails with ORA-12838.&lt;br /&gt;&lt;br /&gt;As already mentioned, the automatic conversion to a direct-path insert with Auto DOP can only be seen when Parallel DML is enabled in the session.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-1461480296861504740?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/1461480296861504740/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=1461480296861504740' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/1461480296861504740'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/1461480296861504740'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/10/auto-dop-and-direct-path-inserts.html' title='Auto DOP And Direct-Path Inserts'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-5174672501196913692</id><published>2011-10-17T20:39:00.004+02:00</published><updated>2011-10-18T20:44:37.235+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Global Temporary Tables'/><category scheme='http://www.blogger.com/atom/ns#' term='Cursor Sharing'/><category scheme='http://www.blogger.com/atom/ns#' term='Cardinality Feedback'/><category 
scheme='http://www.blogger.com/atom/ns#' term='FGAC'/><category scheme='http://www.blogger.com/atom/ns#' term='hard parse'/><category scheme='http://www.blogger.com/atom/ns#' term='RLS'/><category scheme='http://www.blogger.com/atom/ns#' term='Adaptive Cursor Sharing'/><category scheme='http://www.blogger.com/atom/ns#' term='VPD'/><category scheme='http://www.blogger.com/atom/ns#' term='dynamic sampling'/><title type='text'>Volatile Data, Dynamic Sampling And Shared Cursors</title><content type='html'>For the next couple of weeks I'll be picking up various random notes I made during the sessions I attended at OOW. This particular topic was also a problem discussed recently at one of my clients, so it's certainly worth publishing here.&lt;br /&gt;&lt;br /&gt;In one of the optimizer-related sessions it was mentioned that for highly volatile data, as often found in Global Temporary Tables (GTT), it's recommended to use Dynamic Sampling rather than attempting to gather statistics. For GTTs in particular, gathering statistics is problematic because the statistics are global and shared across all sessions. But a GTT can hold a completely different data volume and distribution in each session, so sharing the statistics doesn't make sense in such scenarios.&lt;br /&gt;&lt;br /&gt;So using Dynamic Sampling sounds like reasonable advice, and it probably is in many such cases.&lt;br /&gt;&lt;br /&gt;However, there is still a potential problem even when resorting to Dynamic Sampling. If the cursors based on Dynamic Sampling get shared between sessions, then they won't be re-optimized, even if the GTT data in one session is completely different from that of the session that originally created the shared cursor.&lt;br /&gt;&lt;br /&gt;So you can still end up with shared cursors and execution plans that are inappropriate to share across the different sessions. Using Dynamic Sampling doesn't address this issue. 
It only addresses the issue if the cursors do not get shared, for example if they use literals that differ, so that different cursors are generated based on text matching.&lt;br /&gt;&lt;br /&gt;Here is a simple test case that demonstrates the problem:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;drop view v_gtt_dyn;&lt;br /&gt;&lt;br /&gt;drop table gtt_dyn;&lt;br /&gt;&lt;br /&gt;-- Create a Global Temporary Table with an index on it&lt;br /&gt;create global temporary table gtt_dyn (&lt;br /&gt;  id     number not null&lt;br /&gt;, vc1    varchar2(100)&lt;br /&gt;, filler varchar2(255)&lt;br /&gt;)&lt;br /&gt;on commit preserve rows&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;create index gtt_dyn_idx on gtt_dyn (id);&lt;br /&gt;&lt;br /&gt;-- Create a simple view - it will become obvious later&lt;br /&gt;-- why this has been used&lt;br /&gt;create or replace view v_gtt_dyn as select * from gtt_dyn;&lt;br /&gt;&lt;br /&gt;-- Run in Session 1&lt;br /&gt;set echo on timing on&lt;br /&gt;&lt;br /&gt;-- Unique value in ID column&lt;br /&gt;insert into gtt_dyn&lt;br /&gt;select&lt;br /&gt;        rownum as id&lt;br /&gt;      , rpad('x', 100) as vc1&lt;br /&gt;      , rpad('y', 255) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by level &lt;= 10000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;br /&gt;alter session set statistics_level = all;&lt;br /&gt;&lt;br /&gt;alter session set optimizer_dynamic_sampling = 2;&lt;br /&gt;&lt;br /&gt;select * from v_gtt_dyn where id = 10 and rownum &gt; 1;&lt;br /&gt;&lt;br /&gt;set linesize 200 pagesize 0 trimspool on tab off&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));&lt;br /&gt;&lt;br /&gt;-- Run in Session 2&lt;br /&gt;set echo on timing on&lt;br /&gt;&lt;br /&gt;-- Single value in ID column&lt;br /&gt;insert into gtt_dyn&lt;br /&gt;select&lt;br /&gt;        10 as id&lt;br /&gt;      , rpad('x', 100) as 
vc1&lt;br /&gt;      , rpad('y', 255) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by level &lt;= 10000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;br /&gt;alter session set statistics_level = all;&lt;br /&gt;&lt;br /&gt;alter session set optimizer_dynamic_sampling = 2;&lt;br /&gt;&lt;br /&gt;select * from v_gtt_dyn where id = 10 and rownum &gt; 1;&lt;br /&gt;&lt;br /&gt;set linesize 200 pagesize 0 trimspool on tab off&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This is probably an extreme case of data distribution differences, but if you run it the point becomes obvious: in the second session the data distribution of the GTT is completely different, and although no statistics have been gathered on the GTT and hence Dynamic Sampling is used to arrive at an execution plan, the plan gets shared in the second session (there is only a child number 0). Yet the plan is completely inappropriate for the data distribution of the GTT in that session, as the E-Rows and A-Rows columns of the runtime profile show:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&gt; select * from v_gtt_dyn where id = 10 and rownum &gt; 1;&lt;br /&gt;&lt;br /&gt;no rows selected&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.07&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; set linesize 200 pagesize 0 trimspool on tab off&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));&lt;br /&gt;SQL_ID  bjax3mksw1uza, child number 0&lt;br /&gt;-------------------------------------&lt;br /&gt;select * from v_gtt_dyn where id = 10 and rownum &gt; 1&lt;br /&gt;&lt;br /&gt;Plan hash value: 471827990&lt;br /&gt;&lt;br /&gt;-------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation               
      | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |&lt;br /&gt;-------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |             |      1 |        |      0 |00:00:00.08 |    1117 |&lt;br /&gt;|   1 |  COUNT                        |             |      1 |        |      0 |00:00:00.08 |    1117 |&lt;br /&gt;|*  2 |   FILTER                      |             |      1 |        |      0 |00:00:00.08 |    1117 |&lt;br /&gt;|   3 |    TABLE ACCESS BY INDEX ROWID| GTT_DYN     |      1 |      1 |  10000 |00:00:00.06 |    1117 |&lt;br /&gt;|*  4 |     INDEX RANGE SCAN          | GTT_DYN_IDX |      1 |      1 |  10000 |00:00:00.02 |      63 |&lt;br /&gt;-------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM&gt;1)&lt;br /&gt;   4 - access("ID"=10)&lt;br /&gt;&lt;br /&gt;Note&lt;br /&gt;-----&lt;br /&gt;   - dynamic sampling used for this statement (level=2)&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Imagine a more complex plan with joins and a larger data volume: this is a recipe for disaster.&lt;br /&gt;&lt;br /&gt;If this problem cannot be addressed from the application side by helping the database generate different cursors for the different data distributions (for example by simply adding different predicates that don't change the result, like 1 = 1, 2 = 2 etc.), then you might be able to handle the issue by using Virtual Private Database (VPD, a.k.a. Row Level Security / RLS, Fine Grained Access Control / FGAC). 
I've already demonstrated the general approach in the past &lt;a href="http://oracle-randolf.blogspot.com/2009/02/how-to-force-hard-parse.html"&gt;here&lt;/a&gt;, but in this case a slightly more sophisticated approach could make sense.&lt;br /&gt;&lt;br /&gt;By adding the following code and RLS policy I can drive Oracle to perform a re-optimization only in those cases where it is appropriate. This limits the damage that the general approach does to the Shared Pool by generating potentially numerous child cursors unconditionally.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;create or replace package pkg_rls_force_hard_parse is&lt;br /&gt;  function force_hard_parse (in_schema varchar2, in_object varchar2) return varchar2;&lt;br /&gt;end pkg_rls_force_hard_parse;&lt;br /&gt;/&lt;br /&gt;&lt;br /&gt;create or replace package body pkg_rls_force_hard_parse is&lt;br /&gt;  -- Cache the count in session state&lt;br /&gt;  g_cnt number;&lt;br /&gt;&lt;br /&gt;  function force_hard_parse (in_schema varchar2, in_object varchar2) return varchar2&lt;br /&gt;  is&lt;br /&gt;    s_predicate varchar2(100);&lt;br /&gt;  begin&lt;br /&gt;    -- Only execute query once in session&lt;br /&gt;    -- Change if re-evaluation is desired&lt;br /&gt;    if g_cnt is null then&lt;br /&gt;      select&lt;br /&gt;              count(*)&lt;br /&gt;      into&lt;br /&gt;              g_cnt&lt;br /&gt;      from&lt;br /&gt;              gtt_dyn&lt;br /&gt;      where&lt;br /&gt;              id = 10&lt;br /&gt;      and     rownum &lt;= 10;&lt;br /&gt;    end if;&lt;br /&gt;&lt;br /&gt;    -- We end up with exactly two child cursors&lt;br /&gt;    -- with the desired different plans&lt;br /&gt;    -- These child cursors will be shared accordingly&lt;br /&gt;    if g_cnt &gt; 1 then&lt;br /&gt;      s_predicate := '1 = 1';&lt;br /&gt;    else&lt;br /&gt;      s_predicate := '2 = 2';&lt;br /&gt;    end if;&lt;br /&gt;&lt;br /&gt;    return s_predicate;&lt;br /&gt;  end 
force_hard_parse;&lt;br /&gt;end pkg_rls_force_hard_parse;&lt;br /&gt;/&lt;br /&gt;&lt;br /&gt;-- CONTEXT_SENSITIVE avoids re-evaluation of policy function at execution time&lt;br /&gt;-- Note however that it doesn't avoid re-evaluation at parse time&lt;br /&gt;exec DBMS_RLS.ADD_POLICY (USER, 'v_gtt_dyn', 'hard_parse_policy', USER, 'pkg_rls_force_hard_parse.force_hard_parse', 'select', policy_type =&gt; DBMS_RLS.CONTEXT_SENSITIVE);&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Now if you repeat the above exercise (ideally with SQL trace enabled to see the additional activity caused by the RLS policy), you'll notice that the different sessions end up with different child cursors and execution plans, based on the check made.&lt;br /&gt;&lt;br /&gt;Now the reason why the view is in place might become obvious: an RLS policy on the base table would have led to an infinite recursive execution of the RLS policy function, due to the query performed within the function. There are other options for dealing with that; for example, storing the RLS policy function in a separate schema that has the EXEMPT ACCESS POLICY privilege should also work.&lt;br /&gt;&lt;br /&gt;This is now the result in the second session:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;SQL&gt; select * from v_gtt_dyn where id = 10 and rownum &gt; 1;&lt;br /&gt;&lt;br /&gt;no rows selected&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:00.12&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; set linesize 200 pagesize 0 trimspool on tab off&lt;br /&gt;SQL&gt;&lt;br /&gt;SQL&gt; select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));&lt;br /&gt;SQL_ID  bjax3mksw1uza, child number 1&lt;br /&gt;-------------------------------------&lt;br /&gt;select * from v_gtt_dyn where id = 10 and rownum &gt; 1&lt;br /&gt;&lt;br /&gt;Plan hash value: 424976618&lt;br /&gt;&lt;br /&gt;-----------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation  
         | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |&lt;br /&gt;-----------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |         |      1 |        |      0 |00:00:00.04 |    1003 |&lt;br /&gt;|   1 |  COUNT              |         |      1 |        |      0 |00:00:00.04 |    1003 |&lt;br /&gt;|*  2 |   FILTER            |         |      1 |        |      0 |00:00:00.04 |    1003 |&lt;br /&gt;|*  3 |    TABLE ACCESS FULL| GTT_DYN |      1 |   9288 |  10000 |00:00:00.03 |    1003 |&lt;br /&gt;-----------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;   2 - filter(ROWNUM&gt;1)&lt;br /&gt;   3 - filter("ID"=10)&lt;br /&gt;&lt;br /&gt;Note&lt;br /&gt;-----&lt;br /&gt;   - dynamic sampling used for this statement (level=2)&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Notice how a second child cursor has been generated, and that the cardinality estimate (9288 estimated vs. 10000 actual rows) is now much closer to reality, so the full table scan replaces the index access.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Adaptive Cursor Sharing / Cardinality Feedback&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I was curious to see whether recent features like Adaptive Cursor Sharing or Cardinality Feedback would be able to solve the issue when using the 11g releases.&lt;br /&gt;&lt;br /&gt;Cardinality Feedback (introduced in 11.2) unfortunately doesn't get used in the scenario described here, because Dynamic Sampling disables Cardinality Feedback in the current implementation.&lt;br /&gt;&lt;br /&gt;Note that the use of bind variables also disables Cardinality Feedback for those parts of the plan affected by the bind variables, as described in the Optimizer blog post that can be found &lt;a 
href="http://blogs.oracle.com/optimizer/entry/cardinality_feedback"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So maybe Adaptive Cursor Sharing (ACS, introduced in 11.1) can come to the rescue when bind variables are used.&lt;br /&gt;&lt;br /&gt;Of course, the use of bind variables increases the probability of cursor sharing in the above scenario. As already outlined in a &lt;a href="http://oracle-randolf.blogspot.com/2011/01/adaptive-cursor-sharing.html"&gt;previous note&lt;/a&gt;, ACS is a "reactive" and "non-persistent" feature, so it will only be able to correct things that have already gone wrong at least once. Furthermore, if the ACS information gets aged out of the Shared Pool, the mistakes have to be repeated before ACS recognizes them again.&lt;br /&gt;&lt;br /&gt;However, it is interesting to note that I wasn't able to get ACS working in a slightly modified scenario like this one (without the RLS policy in place, of course):&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Session 1&lt;br /&gt;set echo on timing on&lt;br /&gt;&lt;br /&gt;set linesize 200 pagesize 0 trimspool on tab off&lt;br /&gt;&lt;br /&gt;insert into gtt_dyn&lt;br /&gt;select&lt;br /&gt;        rownum as id&lt;br /&gt;      , rpad('x', 100) as vc1&lt;br /&gt;      , rpad('y', 255) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by level &lt;= 10000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;br /&gt;alter session set statistics_level = all;&lt;br /&gt;&lt;br /&gt;variable n1 number&lt;br /&gt;&lt;br /&gt;exec :n1 := 10&lt;br /&gt;&lt;br /&gt;select * from v_gtt_dyn where id &lt;= :n1 and rownum &gt; 1;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));&lt;br /&gt;&lt;br /&gt;-- Session 2&lt;br /&gt;set echo on timing on&lt;br /&gt;&lt;br /&gt;set linesize 200 pagesize 0 trimspool on tab off&lt;br /&gt;&lt;br /&gt;insert into gtt_dyn&lt;br /&gt;select&lt;br /&gt;        10 as id&lt;br /&gt;  
    , rpad('x', 100) as vc1&lt;br /&gt;      , rpad('y', 255) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by level &lt;= 10000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;br /&gt;alter session set statistics_level = all;&lt;br /&gt;&lt;br /&gt;variable n1 number&lt;br /&gt;&lt;br /&gt;exec :n1 := 10&lt;br /&gt;&lt;br /&gt;select * from v_gtt_dyn where id &lt;= :n1 and rownum &gt; 1;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));&lt;br /&gt;&lt;br /&gt;-- Second execution to give ACS a chance to kick in&lt;br /&gt;select * from v_gtt_dyn where id &lt;= :n1 and rownum &gt; 1;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;There are some interesting points to notice when running this example:&lt;br /&gt;&lt;br /&gt;1. A cursor that uses non-equality operators (like the less-or-equal above) together with bind variables usually gets marked as "bind-sensitive" and is then monitored by ACS. But in the above case the cursor was not marked as bind-sensitive, and hence ACS didn't even bother to monitor it.&lt;br /&gt;&lt;br /&gt;2. Consequently, the two sessions share the single child cursor, and the problem is not addressed by ACS even in subsequent executions.&lt;br /&gt;&lt;br /&gt;3. 
It looks like, again, the use of Dynamic Sampling disables ACS.&lt;br /&gt;&lt;br /&gt;Looking at the way ACS manages the Cursor Sharing criteria (check V$SQL_CS_SELECTIVITY, for example), I suspect that ACS probably couldn't cope with the same bind value resulting in a completely different selectivity range.&lt;br /&gt;&lt;br /&gt;Maybe this explains why ACS is not activated for cursors that use Dynamic Sampling: ACS may only be able to cope with different bind value ranges that lead to different selectivities.&lt;br /&gt;&lt;br /&gt;So even when using bind variables and 11g with ACS, it looks like only the RLS policy approach allows addressing this issue purely from the database side. Ideally, the application should be "data-aware" in such cases and help the database arrive at reasonable execution plans by actively unsharing the cursors.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-5174672501196913692?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/5174672501196913692/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=5174672501196913692' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/5174672501196913692'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/5174672501196913692'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/10/volatile-data-dynamic-sampling-and.html' title='Volatile Data, Dynamic Sampling And Shared Cursors'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-2855013164077339061</id><published>2011-10-11T20:43:00.003+02:00</published><updated>2011-10-11T22:10:27.693+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='troubleshooting'/><category scheme='http://www.blogger.com/atom/ns#' term='Parallel Execution'/><category scheme='http://www.blogger.com/atom/ns#' term='Fundamentals'/><title type='text'>Parallel Downgrade</title><content type='html'>There are many reasons why a parallel execution might not run with the expected degree of parallelism (DOP), beginning with running out of parallel slaves (PARALLEL_MAX_SERVERS or PROCESSES reached), PARALLEL_ADAPTIVE_MULTI_USER, downgrades at execution time via the Resource Manager, or more recent features like PARALLEL_DEGREE_LIMIT or the Auto DOP introduced in Oracle 11.2.&lt;br /&gt;&lt;br /&gt;However, what do you do if you've already checked all these possibilities but still see a downgrade occurring? You can always enable the parallel execution tracing facility in the session via the "_px_trace" parameter (see for example MOS document ID 444164.1 "Tracing Parallel Execution with _px_trace. Part I" for details on how to use it). If the trace shows that parallel slaves are acquired but released again immediately, possibly followed by an error message, then you might want to have a look at the ancient Profile setting SESSIONS_PER_USER. This setting is probably best known for limiting the number of concurrent sessions a particular user can open, but it is often forgotten, or simply unknown, that Parallel Execution respects this profile setting as well: each parallel slave started counts towards this limit. 
In fact, up to Oracle 9.2.0.7 you could end up with an ORA-12805 (parallel query server died unexpectedly) error in such a case rather than a downgrade, as described in bug 4041253.&lt;br /&gt;&lt;br /&gt;So the next time you see an otherwise unexplained downgrade, or think about using the SESSIONS_PER_USER Profile limit for a user who is supposed to make use of Parallel Execution, keep these implications in mind.&lt;br /&gt;&lt;br /&gt;Sample px_trace snippet from 10.2.0.5 when downgrading to serial due to the SESSIONS_PER_USER Profile setting:&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;kxfrSysInfo&lt;br /&gt;        DOP trace -- compute default DOP from system info&lt;br /&gt;        # instance alive  = 1 (kxfrsnins)&lt;br /&gt;kxfrDefaultDOP&lt;br /&gt;        DOP Trace -- compute default DOP&lt;br /&gt;            # CPU       = 4&lt;br /&gt;            Threads/CPU = 2 ("parallel_threads_per_cpu")&lt;br /&gt;            default DOP = 8 (# CPU * Threads/CPU)&lt;br /&gt;            default DOP = 8 (DOP * # instance)&lt;br /&gt;kxfrSysInfo&lt;br /&gt;        system default DOP = 8 (from kxfrDefaultDOP())&lt;br /&gt;kxfralo &lt;br /&gt;        DOP trace -- requested thread from best ref obj = 8 (from kxfrIsBestRef(&lt;br /&gt;        ))&lt;br /&gt;kxfralo &lt;br /&gt;        threads requested = 8 (from kxfrComputeThread())&lt;br /&gt;kxfralo &lt;br /&gt;        adjusted no. 
threads = 8 (from kxfrAdjustDOP())&lt;br /&gt;kxfralo &lt;br /&gt;        about to allocate 8 slaves&lt;br /&gt;kxfrAllocSlaves&lt;br /&gt;        DOP trace -- call kxfpgsg to get 8 slaves&lt;br /&gt;kxfpgsg &lt;br /&gt;        num server requested = 8&lt;br /&gt;        num server requested = 8 KXFPLDBL/KXFPADPT/ load balancing:on adaptive:o&lt;br /&gt;        n&lt;br /&gt;kxfpiinfo&lt;br /&gt;        inst[cpus:mxslv]&lt;br /&gt;        1[4:80] &lt;br /&gt;kxfpclinfo&lt;br /&gt;        inst(load:user:pct:fact)aff &lt;br /&gt;        1(1:0:100:400) &lt;br /&gt;kxfpAdaptDOP&lt;br /&gt;        Requested=8 Granted=8 Target=32 Load=1 Default=8 users=0 sets=1&lt;br /&gt;        load adapt num servers requested to = 8 (from kxfpAdaptDOP())&lt;br /&gt;kxfpgsg &lt;br /&gt;        getting 1 sets of 8 threads, client parallel query execution flg=0x30&lt;br /&gt;        Height=8, Affinity List Size=0, inst_total=1, coord=1&lt;br /&gt;        Insts     1&lt;br /&gt;        Threads   8&lt;br /&gt;kxfpg1sg&lt;br /&gt;        q:000007FF4656C058 req_threads:8 nthreads:8 #inst:1 normal&lt;br /&gt;kxfpg1srv&lt;br /&gt;        trying to get slave P000 on instance 1 for q:000007FF4656C058&lt;br /&gt;        slave P000 is local&lt;br /&gt;        found slave P000 dp:000007FF49682A98 flg:0 &lt;br /&gt;kxfpcre1&lt;br /&gt;        Creating slave 0 flg:30&lt;br /&gt;        free descriptor found dp:000007FF49680318&lt;br /&gt;        Allocated slave P000 dp:000007FF49680318 pnum:0 flg:4&lt;br /&gt;        Got It. 1 so far.&lt;br /&gt;kxfpg1srv&lt;br /&gt;        trying to get slave P001 on instance 1 for q:000007FF4656C058&lt;br /&gt;        slave P001 is local&lt;br /&gt;        found slave P001 dp:000007FF49682A98 flg:0 &lt;br /&gt;kxfpcre1&lt;br /&gt;        Creating slave 1 flg:30&lt;br /&gt;        free descriptor found dp:000007FF49680398&lt;br /&gt;        Allocated slave P001 dp:000007FF49680398 pnum:1 flg:4&lt;br /&gt;        Got It. 
2 so far.&lt;br /&gt;kxfpg1srv&lt;br /&gt;        trying to get slave P002 on instance 1 for q:000007FF4656C058&lt;br /&gt;        slave P002 is local&lt;br /&gt;        found slave P002 dp:000007FF49682A98 flg:0 &lt;br /&gt;kxfpcre1&lt;br /&gt;        Creating slave 2 flg:30&lt;br /&gt;        free descriptor found dp:000007FF49680418&lt;br /&gt;        Allocated slave P002 dp:000007FF49680418 pnum:2 flg:4&lt;br /&gt;        Got It. 3 so far.&lt;br /&gt;kxfpg1srv&lt;br /&gt;        trying to get slave P003 on instance 1 for q:000007FF4656C058&lt;br /&gt;        slave P003 is local&lt;br /&gt;        found slave P003 dp:000007FF49682A98 flg:0 &lt;br /&gt;kxfpcre1&lt;br /&gt;        Creating slave 3 flg:30&lt;br /&gt;        free descriptor found dp:000007FF49680498&lt;br /&gt;        Allocated slave P003 dp:000007FF49680498 pnum:3 flg:4&lt;br /&gt;        Got It. 4 so far.&lt;br /&gt;kxfpg1srv&lt;br /&gt;        trying to get slave P004 on instance 1 for q:000007FF4656C058&lt;br /&gt;        slave P004 is local&lt;br /&gt;        found slave P004 dp:000007FF49682A98 flg:0 &lt;br /&gt;kxfpcre1&lt;br /&gt;        Creating slave 4 flg:30&lt;br /&gt;        free descriptor found dp:000007FF49680518&lt;br /&gt;        Allocated slave P004 dp:000007FF49680518 pnum:4 flg:4&lt;br /&gt;        Got It. 5 so far.&lt;br /&gt;kxfpg1srv&lt;br /&gt;        trying to get slave P005 on instance 1 for q:000007FF4656C058&lt;br /&gt;        slave P005 is local&lt;br /&gt;        found slave P005 dp:000007FF49682A98 flg:0 &lt;br /&gt;kxfpcre1&lt;br /&gt;        Creating slave 5 flg:30&lt;br /&gt;        free descriptor found dp:000007FF49680598&lt;br /&gt;        Allocated slave P005 dp:000007FF49680598 pnum:5 flg:4&lt;br /&gt;        Got It. 
6 so far.&lt;br /&gt;kxfpg1srv&lt;br /&gt;        trying to get slave P006 on instance 1 for q:000007FF4656C058&lt;br /&gt;        slave P006 is local&lt;br /&gt;        found slave P006 dp:000007FF49682A98 flg:0 &lt;br /&gt;kxfpcre1&lt;br /&gt;        Creating slave 6 flg:30&lt;br /&gt;        free descriptor found dp:000007FF49680618&lt;br /&gt;        Allocated slave P006 dp:000007FF49680618 pnum:6 flg:4&lt;br /&gt;        Got It. 7 so far.&lt;br /&gt;kxfpg1srv&lt;br /&gt;        trying to get slave P007 on instance 1 for q:000007FF4656C058&lt;br /&gt;        slave P007 is local&lt;br /&gt;        found slave P007 dp:000007FF49682A98 flg:0 &lt;br /&gt;kxfpcre1&lt;br /&gt;        Creating slave 7 flg:30&lt;br /&gt;        free descriptor found dp:000007FF49680698&lt;br /&gt;        Allocated slave P007 dp:000007FF49680698 pnum:7 flg:4&lt;br /&gt;        Got It. 8 so far.&lt;br /&gt;kxfpqsrls&lt;br /&gt;        Release Slave q=0x000007FF4656C058 qr=0x000007FF465615A8 action=1 slave=&lt;br /&gt;        0 inst=1&lt;br /&gt;kxfpqsrls&lt;br /&gt;        Release Slave q=0x000007FF4656C058 qr=0x000007FF46561D68 action=1 slave=&lt;br /&gt;        2 inst=1&lt;br /&gt;kxfpqsrls&lt;br /&gt;        Release Slave q=0x000007FF4656C058 qr=0x000007FF46562148 action=1 slave=&lt;br /&gt;        3 inst=1&lt;br /&gt;kxfpqsrls&lt;br /&gt;        Release Slave q=0x000007FF4656C058 qr=0x000007FF46562CE8 action=1 slave=&lt;br /&gt;        4 inst=1&lt;br /&gt;kxfpqsrls&lt;br /&gt;        Release Slave q=0x000007FF4656C058 qr=0x000007FF4655FE68 action=1 slave=&lt;br /&gt;        5 inst=1&lt;br /&gt;kxfpqsrls&lt;br /&gt;        Release Slave q=0x000007FF4656C058 qr=0x000007FF46562ED8 action=1 slave=&lt;br /&gt;        6 inst=1&lt;br /&gt;kxfpqsrls&lt;br /&gt;        Release Slave q=0x000007FF4656C058 qr=0x000007FF46560058 action=1 slave=&lt;br /&gt;        7 inst=1&lt;br /&gt;kxfpg1sg&lt;br /&gt;        got 1 servers (sync), returning...&lt;br /&gt;kxfpgsg &lt;br /&gt;        serial - too 
few slaves alloc'd&lt;br /&gt;kxfpqsrls&lt;br /&gt;        Release Slave q=0x000007FF4656C058 qr=0x000007FF46561988 action=1 slave=&lt;br /&gt;        1 inst=1&lt;br /&gt;kxfplsig&lt;br /&gt;        signaling OER(10387) in serial 4609&lt;br /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/2855013164077339061/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=2855013164077339061' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/2855013164077339061'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/2855013164077339061'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/10/parallel-downgrade.html' title='Parallel Downgrade'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-278693880263522102</id><published>2011-08-10T10:00:00.009+02:00</published><updated>2011-08-10T22:25:23.595+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Fundamentals'/><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='Batched I/O'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.1'/><category 
scheme='http://www.blogger.com/atom/ns#' term='11.1.0.7'/><category scheme='http://www.blogger.com/atom/ns#' term='Nested Loop Join Batching'/><category scheme='http://www.blogger.com/atom/ns#' term='Logical I/O'/><title type='text'>Logical I/O Evolution - Part 3: 11g</title><content type='html'>&lt;span style="font-weight:bold;"&gt;Preface &lt;/span&gt;(with apologies to &lt;a href="http://kevinclosson.wordpress.com/2011/07/20/i-can-see-clearly-now-exadata-is-better-than-emc-storage-i-have-seen-the-slides-part-ii-supercluster-storage/"&gt;Kevin Closson&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;This blog post is too long&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Introduction&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In the &lt;a href="http://oracle-randolf.blogspot.com/2011/07/logical-io-evolution-part-2-9i-10g.html"&gt;previous part&lt;/a&gt; of this series I've already demonstrated that the logical I/O optimization of the Table Prefetching feature depends on the order of the row sources - and 11g takes this approach a big step further.&lt;br /&gt;&lt;br /&gt;Interestingly, 11g does not require any particular feature like Table Prefetching or Nested Loop Join Batching (another new feature introduced in 11g) to take advantage of the logical I/O optimization - it seems to be available even with the most basic form of a Nested Loop join.&lt;br /&gt;&lt;br /&gt;Note that this optimization has already been &lt;a href="http://afatkulin.blogspot.com/2009/01/consistent-gets-from-cache-fastpath.html"&gt;mentioned&lt;/a&gt; &lt;a href="http://jonathanlewis.wordpress.com/2009/01/16/concurrency/"&gt;several&lt;/a&gt; &lt;a href="http://jonathanlewis.wordpress.com/2011/06/20/optimisation/"&gt;times&lt;/a&gt;, but so far there has been some confusion about whether it is related to another new feature introduced with 11g: the so-called "fastpath" consistent gets.
&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Buffer Pinning Optimization&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So, let's repeat the already known test case from the previous parts in 11g. Another nice feature of 11g is that we have now full control over the Nested Loop plan shapes / features used by Oracle - we can choose from "classic" Nested Loop Join, Table Prefetching and Nested Loop Join Batching.&lt;br /&gt;&lt;br /&gt;This is controlled via the [NO_]NLJ_BATCHING and [NO_]NLJ_PREFETCH hints which you will also find in the "outline" hint list generated for Plan Stability from 11g on.&lt;br /&gt;&lt;br /&gt;Interestingly if I wanted to have the "classic" Nested Loop shape then I couldn't achieve that by combining the NO_NLJ_BATCHING and NO_NLJ_PREFETCH hint - one seemed to disable the other one - so I had to resort to the "_nlj_batching_enabled" parameter to disable Nested Loop Join Batching.&lt;br /&gt;&lt;br /&gt;So this is what the query hints need to look like if we want to have the classic Nested Loop Join shape in 11g:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;select&lt;br /&gt;       max(b_filler), max(a_filler)&lt;br /&gt;from (&lt;br /&gt;select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 0) no_nlj_prefetch(b) */&lt;br /&gt;       a.id as a_id, a.filler as a_filler, b.id as b_id, b.filler as b_filler&lt;br /&gt;from&lt;br /&gt;       t2 a&lt;br /&gt;     , t1 b&lt;br /&gt;where&lt;br /&gt;       a.id = b.id&lt;br /&gt;);&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;If you want to test with different plan shapes you can simply modify the hint section as required, for example you can get the Table Prefetching shape by changing above hint from NO_NLJ_PREFETCH to NLJ_PREFETCH etc.
&lt;br /&gt;&lt;br /&gt;Let's start with the data set where T1 and T2 are not in the same order, and stick to the classic plan shape:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;11.2.0.1 Classic Nested Loop - Random order&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Inner row source Unique Index - T1 different order than T2&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:03.67 |     310K|&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:03.67 |     310K|&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |    100K|   202K  (1)|    100K|00:00:03.47 |     310K|&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T2     |      1 |    100K|  2720   (1)|    100K|00:00:00.21 |   10010 |&lt;br /&gt;|   4 |    TABLE ACCESS BY INDEX ROWID| T1     |    100K|      1 |     2   (0)|    100K|00:00:02.54 |     300K|&lt;br /&gt;|*  5 |     INDEX UNIQUE SCAN         | T1_IDX |    100K|      1 |     1   (0)|    100K|00:00:00.76 |     200K|&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Inner row source Non-Unique Index - T1 different order than T2&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   
A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:04.40 |     311K|&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:04.40 |     311K|&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |    100K|   202K  (1)|    100K|00:00:04.20 |     311K|&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T1     |      1 |    100K|  2720   (1)|    100K|00:00:00.20 |   10010 |&lt;br /&gt;|   4 |    TABLE ACCESS BY INDEX ROWID| T2     |    100K|      1 |     2   (0)|    100K|00:00:03.28 |     301K|&lt;br /&gt;|*  5 |     INDEX RANGE SCAN          | T2_IDX |    100K|      1 |     1   (0)|    100K|00:00:01.08 |     201K|&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So far there is no difference to the previous results: the non-unique index variant is still slower than the unique one, and we do not see any buffer pinning optimization beyond the one already seen in the baseline test.
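&lt;br /&gt;&lt;br /&gt;The Buffers column of these classic plans can be reproduced with simple arithmetic. The following sketch reflects my own reading of the plans above. It assumes two consistent gets per index probe (root and leaf block, which matches 200K buffers for 100K starts of the INDEX UNIQUE SCAN) and one get per table row lookup. These are added on top of the 10,010 gets of the full table scan.

```python
# Rough model of the Buffers column for the classic Nested Loop plans,
# assuming 2 gets per index probe (root + leaf) and 1 get per table lookup.
FULL_SCAN_GETS = 10_010   # Buffers of the TABLE ACCESS FULL (outer row source)
OUTER_ROWS = 100_000      # Starts of the inner index / table access


def expected_buffers(index_gets_per_probe=2, table_gets_per_lookup=1):
    """Total consistent gets if no buffer stays pinned between lookups."""
    per_row = index_gets_per_probe + table_gets_per_lookup
    return FULL_SCAN_GETS + OUTER_ROWS * per_row


total = expected_buffers()
print(total)  # 310010 -> matches the 310K reported for the unique index plan
```

The non-unique variant reports about 1,000 more gets in its INDEX RANGE SCAN (201K instead of 200K). This roughly accounts for its 1,096 extra consistent gets in the session statistics.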
&lt;br /&gt;&lt;br /&gt;The relevant session statistics:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;Statistics Name                                         Unique  Non-Unique  Difference&lt;br /&gt;----------------------------------------------------- -------- ----------- -----------&lt;br /&gt;STAT..table scan blocks gotten                          10,000      10,000           0&lt;br /&gt;STAT..table scan rows gotten                           100,000     100,000           0&lt;br /&gt;STAT..table fetch by rowid                             100,000     100,002           2&lt;br /&gt;STAT..consistent gets                                  310,012     311,108       1,096&lt;br /&gt;STAT..consistent gets from cache                       310,012     311,108       1,096&lt;br /&gt;STAT..session logical reads                            310,012     311,108       1,096&lt;br /&gt;STAT..buffer is not pinned count                       200,002     100,012     -99,990&lt;br /&gt;STAT..buffer is pinned count                            99,999     199,993      99,994&lt;br /&gt;STAT..index fetch by key                               100,000           2     -99,998&lt;br /&gt;STAT..rows fetched via callback                        100,000           2     -99,998&lt;br /&gt;STAT..index scans kdiixs1                                    0     100,000     100,000&lt;br /&gt;STAT..consistent gets - examination                    300,001     100,007    -199,994&lt;br /&gt;STAT..consistent gets from cache (fastpath)             10,011     211,101     201,090&lt;br /&gt;STAT..no work - consistent read gets                    10,001     211,091     201,090&lt;br /&gt;LATCH.cache buffers chains                             320,024     522,216     202,192&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Nothing spectacular here either, but there are at least some interesting points to mention:&lt;br /&gt;&lt;br /&gt;- We can see that Oracle took advantage of the so called 
"fastpath" consistent gets for the "normal" consistent gets - they still took two latch acquisitions per get though. The "fastpath" seems to be about a code optimization when buffers get pinned that probably requires less CPU cycles. I don't know if the code change addresses any further contention/concurrency issues apart from being "faster" (faster is always better, isn't it :-)&lt;br /&gt;&lt;br /&gt;- The "buffer is pinned count" statistics are not consistent with what we've seen from 10g: &lt;br /&gt;&lt;br /&gt;* The "unique index" variant already misses 90,000 pins, but does not produce more consistent gets, so in total we do not arrive at the anticipated 500,000 buffer visits any more - either something seems to be missing from the instrumentation or Oracle does something fundamentally different&lt;br /&gt;* The "non-unique index" variant however records 10,000 excess pinned buffers, so we end up with 510,000 buffer visits recorded in total&lt;br /&gt;&lt;br /&gt;Let's repeat the same with the T1 and T2 data ordered in the same way - but not ordered by ID (so simply uncomment the second call to DBMS_RANDOM.SEED(0)):&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;11.2.0.1 Classic Nested Loop - Same random order&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Inner row source Unique Index - T1 same random order as T2&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:03.55 |     310K|&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 
|00:00:03.55 |     310K|&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |    100K|   202K  (1)|    100K|00:00:03.35 |     310K|&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T2     |      1 |    100K|  2720   (1)|    100K|00:00:00.22 |   10010 |&lt;br /&gt;|   4 |    TABLE ACCESS BY INDEX ROWID| T1     |    100K|      1 |     2   (0)|    100K|00:00:02.41 |     300K|&lt;br /&gt;|*  5 |     INDEX UNIQUE SCAN         | T1_IDX |    100K|      1 |     1   (0)|    100K|00:00:00.77 |     200K|&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Inner row source Non-Unique Index - T1 same random order as T2&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:04.23 |     221K|&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:04.23 |     221K|&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |    100K|   202K  (1)|    100K|00:00:04.02 |     221K|&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T1     |      1 |    100K|  2720   (1)|    100K|00:00:00.21 |   10010 |&lt;br /&gt;|   4 |    TABLE ACCESS BY INDEX ROWID| T2     |    100K|      1 |     2   (0)|    100K|00:00:03.10 |     211K|&lt;br /&gt;|*  5 |     INDEX RANGE SCAN          | T2_IDX |    100K|      1 |     1   (0)|    100K|00:00:01.09 |     201K|&lt;br 
/&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So, that's interesting: we can already see the same optimization kicking in for the non-unique index as we saw in 10g with Table Prefetching, even though the classic plan shape is used.&lt;br /&gt;&lt;br /&gt;The statistics match the result - but there is a slight difference from the 10.2 Table Prefetching case: the "buffer is pinned count" is at least "self-consistent" for the "non-unique index" variant, so no "excess" pinning is recorded as it was with Table Prefetching.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;Statistics Name                                         Unique  Non-Unique  Difference&lt;br /&gt;----------------------------------------------------- -------- ----------- -----------&lt;br /&gt;STAT..table scan blocks gotten                          10,000      10,000           0&lt;br /&gt;STAT..table scan rows gotten                           100,000     100,000           0&lt;br /&gt;STAT..table fetch by rowid                             100,000     100,002           2&lt;br /&gt;LATCH.cache buffers chains                             320,030     342,242      22,212&lt;br /&gt;STAT..consistent gets                                  310,012     221,124     -88,888&lt;br /&gt;STAT..consistent gets from cache                       310,012     221,124     -88,888&lt;br /&gt;STAT..session logical reads                            310,012     221,124     -88,888&lt;br /&gt;STAT..Cached Commit SCN referenced                     110,000      20,007     -89,993&lt;br /&gt;STAT..index fetch by key                               100,000           2     -99,998&lt;br /&gt;STAT..rows fetched via callback                        100,000           2     -99,998&lt;br /&gt;STAT..index scans kdiixs1                                    0     100,000     100,000&lt;br 
/&gt;STAT..consistent gets from cache (fastpath)             10,011     121,117     111,106&lt;br /&gt;STAT..no work - consistent read gets                    10,001     121,107     111,106&lt;br /&gt;STAT..buffer is not pinned count                       200,002      10,028    -189,974&lt;br /&gt;STAT..buffer is pinned count                            99,999     289,977     189,978&lt;br /&gt;STAT..consistent gets - examination                    300,001     100,007    -199,994&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Redundant Filter Optimization&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As just demonstrated, the inner table lookup of the "unique index" variant does not use the buffer pinning optimization. An interesting little detail is that in 11.1.0.7 and 11.2.0.1, putting a filter on the inner table lookup changes the behavior of the "unique index" variant. Running a query like this one, with a redundant filter that doesn't change the overall result:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;select&lt;br /&gt;       max(b_filler), max(a_filler)&lt;br /&gt;from (&lt;br /&gt;select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 0) no_nlj_prefetch(b) */&lt;br /&gt;       a.id as a_id, a.filler as a_filler, b.id as b_id, b.filler as b_filler&lt;br /&gt;from&lt;br /&gt;       t2 a&lt;br /&gt;     , t1 b&lt;br /&gt;where&lt;br /&gt;       a.id = b.id&lt;br /&gt;and    substr(b.filler, 1, 1) = 'x'&lt;br /&gt;);&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;produces the following output:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br 
/&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:03.25 |     220K|&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:03.25 |     220K|&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |   1000 |   202K  (1)|    100K|00:00:03.05 |     220K|&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T2     |      1 |    100K|  2720   (1)|    100K|00:00:00.21 |   10010 |&lt;br /&gt;|*  4 |    TABLE ACCESS BY INDEX ROWID| T1     |    100K|      1 |     2   (0)|    100K|00:00:02.12 |     210K|&lt;br /&gt;|*  5 |     INDEX UNIQUE SCAN         | T1_IDX |    100K|      1 |     1   (0)|    100K|00:00:00.77 |     200K|&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Session Statistics:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;Statistics Name                                         Unique  Non-Unique  Difference&lt;br /&gt;----------------------------------------------------- -------- ----------- -----------&lt;br /&gt;STAT..buffer is pinned count                           289,998     289,977         -21&lt;br /&gt;STAT..buffer is not pinned count                        10,003      10,028          25&lt;br /&gt;LATCH.JS queue state obj latch                               0          36          36&lt;br /&gt;LATCH.row cache objects                                     67         110          43&lt;br /&gt;STAT..CPU used when call started                            59         119          60&lt;br /&gt;STAT..DB time                                               59         119          60&lt;br /&gt;STAT..CPU used by this session                              56         119          63&lt;br /&gt;LATCH.enqueues 
                                              2          78          76&lt;br /&gt;LATCH.enqueue hash chains                                    3          80          77&lt;br /&gt;LATCH.simulator hash latch                               9,111       9,304         193&lt;br /&gt;STAT..consistent gets                                  220,012     221,124       1,112&lt;br /&gt;STAT..consistent gets from cache                       220,012     221,124       1,112&lt;br /&gt;STAT..session logical reads                            220,012     221,124       1,112&lt;br /&gt;STAT..consistent gets - examination                    200,001     100,007     -99,994&lt;br /&gt;STAT..index fetch by key                               100,000           2     -99,998&lt;br /&gt;STAT..index scans kdiixs1                                    0     100,000     100,000&lt;br /&gt;STAT..consistent gets from cache (fastpath)             20,011     121,117     101,106&lt;br /&gt;STAT..no work - consistent read gets                    20,001     121,107     101,106&lt;br /&gt;LATCH.cache buffers chains                             240,024     342,248     102,224&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The interesting part here is that the "unique index" variant now uses the same buffer pinning optimization as the "non-unique index" one - but resorts to "normal" consistent gets (using the "fastpath" version in this case) for the random table access.&lt;br /&gt;&lt;br /&gt;I don't know whether this is a feature or the side-effect of a bug, because it no longer works in 11.2.0.2: there the "unique index" variant cannot be convinced to make use of the "buffer pinning" optimization; it always performs the "shortcut" logical I/O on the inner table lookup, even with a filter specified.&lt;br /&gt;&lt;br /&gt;We'll see later on that this has some interesting consequences for concurrent executions.
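&lt;br /&gt;&lt;br /&gt;The buffer visit totals discussed in this post can be recomputed from the session statistics shown so far. Each total is the logical I/Os plus the visits recorded in "buffer is pinned count". Below is a small Python sketch with the figures copied from the tables above. Treating the sum of these two statistics as the number of buffer visits follows the accounting used in the text and is not an official definition.

```python
# Buffer visits as accounted for in this post:
# "consistent gets" (logical I/O) + "buffer is pinned count"
def buffer_visits(consistent_gets, pinned_count):
    return consistent_gets + pinned_count


# (consistent gets, buffer is pinned count) from the 11.2.0.1 tables above
runs = {
    "random order, unique": (310_012, 99_999),
    "random order, non-unique": (311_108, 199_993),
    "same order, unique": (310_012, 99_999),
    "same order, non-unique": (221_124, 289_977),
    "same order, unique + filter": (220_012, 289_998),
}

for name, (gets, pinned) in runs.items():
    # left-aligned label, right-aligned total with thousands separators
    print(f"{name:30} {buffer_visits(gets, pinned):>9,}")
```

The plain "unique index" runs stay at about 410,000 visits, which reflects the 90,000 missing pins mentioned above. With the redundant filter, the "unique index" variant reaches about 510,000 visits, in line with the "non-unique index" runs.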
&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Ordered Data Sets&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;OK, now finally the big one: Let's repeat the test case with data sorted by ID, so by using the ORDER BY ID instead of ORDER BY DBMS_RANDOM.VALUE when populating the tables:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;11.2.0.1 Classic Nested Loop - data ordered by ID&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Inner row source Unique Index - T1 and T2 ordered by ID&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:03.42 |     122K|&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:03.42 |     122K|&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |    100K|   202K  (1)|    100K|00:00:03.21 |     122K|&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T2     |      1 |    100K|  2720   (1)|    100K|00:00:00.21 |   10010 |&lt;br /&gt;|   4 |    TABLE ACCESS BY INDEX ROWID| T1     |    100K|      1 |     2   (0)|    100K|00:00:02.27 |     112K|&lt;br /&gt;|*  5 |     INDEX UNIQUE SCAN         | T1_IDX |    100K|      1 |     1   (0)|    100K|00:00:00.55 |   12314 |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Inner row source Non-Unique Index - T1 and T2 ordered by ID&lt;br /&gt;&lt;br /&gt;&lt;div 
class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:03.64 |   33143 |&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:03.64 |   33143 |&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |    100K|   202K  (1)|    100K|00:00:03.45 |   33143 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T1     |      1 |    100K|  2720   (1)|    100K|00:00:00.20 |   10010 |&lt;br /&gt;|   4 |    TABLE ACCESS BY INDEX ROWID| T2     |    100K|      1 |     2   (0)|    100K|00:00:02.55 |   23133 |&lt;br /&gt;|*  5 |     INDEX RANGE SCAN          | T2_IDX |    100K|      1 |     1   (0)|    100K|00:00:00.67 |   13126 |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The result is staggering: The "non-unique" index variant apparently manages to visit 500,000 buffers with just 33K logical I/Os. 
It is also almost as fast as the "unique index" variant, which evidently does not keep the buffers pinned for the inner table random lookup - let's check the session statistics:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;Statistics Name                                         Unique  Non-Unique  Difference&lt;br /&gt;----------------------------------------------------- -------- ----------- -----------&lt;br /&gt;STAT..table scan blocks gotten                          10,000      10,000           0&lt;br /&gt;STAT..table scan rows gotten                           100,000     100,000           0&lt;br /&gt;STAT..table fetch by rowid                             100,000     100,002           2&lt;br /&gt;STAT..consistent gets from cache (fastpath)             11,059      26,587      15,528&lt;br /&gt;STAT..no work - consistent read gets                    11,049      26,577      15,528&lt;br /&gt;LATCH.cache buffers chains                             133,384      60,829     -72,555&lt;br /&gt;STAT..consistent gets                                  122,324      33,149     -89,175&lt;br /&gt;STAT..consistent gets from cache                       122,324      33,149     -89,175&lt;br /&gt;STAT..session logical reads                            122,324      33,149     -89,175&lt;br /&gt;STAT..index fetch by key                               100,000           2     -99,998&lt;br /&gt;STAT..rows fetched via callback                        100,000           2     -99,998&lt;br /&gt;STAT..index scans kdiixs1                                    0     100,000     100,000&lt;br /&gt;STAT..consistent gets - examination                    111,265       5,470    -105,795&lt;br /&gt;STAT..buffer is not pinned count                       200,014      10,028    -189,986&lt;br /&gt;STAT..buffer is pinned count                             5,107     195,440     190,333&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;We can tell several things from these statistics:&lt;br 
/&gt;&lt;br /&gt;- The "non-unique index" variant requires just 60,000 latch acquisitions - which corresponds to the reduced number of logical I/Os&lt;br /&gt;&lt;br /&gt;- The session statistics only "explain" 195,000 buffer visits via already pinned and 33,000 buffer visits recorded as logical I/Os, so we are missing approx. 270,000 buffer visits from the statistics. Compared to the results of the "unordered" test case we actually see a "reduction" of buffers visited that are already pinned (199,993 vs. 195,440), so that seems to be questionable&lt;br /&gt;&lt;br /&gt;- The "unique index" variant still does the "short-cut" logical I/O on the inner table random lookup and hence requires actually more logical I/O and latch acquisitions in this case than the "non-unique index" variant&lt;br /&gt;&lt;br /&gt;As we've seen above if in 11.1.0.7 and 11.2.0.1 a filter is put on the inner table random lookup Oracle 11g switches to "normal" consistent gets for the "unique index" variant, and in fact when repeating this experiment with the ordered data set, we see these results:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:02.97 |   32315 |&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:02.97 |   32315 |&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |   1000 |   202K  (1)|    100K|00:00:02.76 |   32315 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T2     |      1 |    100K|  2720   (1)|    100K|00:00:00.22 |   10010 
|&lt;br /&gt;|*  4 |    TABLE ACCESS BY INDEX ROWID| T1     |    100K|      1 |     2   (0)|    100K|00:00:01.81 |   22305 |&lt;br /&gt;|*  5 |     INDEX UNIQUE SCAN         | T1_IDX |    100K|      1 |     1   (0)|    100K|00:00:00.56 |   12302 |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So by switching to "normal" consistent gets, the "unique index" variant also uses the buffer pinning optimization for the inner table lookup (only reproducible in 11.1.0.7 and 11.2.0.1). The session statistics:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;Statistics Name                                         Unique  Non-Unique  Difference&lt;br /&gt;----------------------------------------------------- -------- ----------- -----------&lt;br /&gt;STAT..table scan blocks gotten                          10,000      10,000           0&lt;br /&gt;STAT..table scan rows gotten                           100,000     100,000           0&lt;br /&gt;STAT..table fetch by rowid                             100,000     100,002           2&lt;br /&gt;STAT..buffer is not pinned count                        10,018      10,028          10&lt;br /&gt;STAT..buffer is pinned count                           195,103     195,440         337&lt;br /&gt;STAT..consistent gets                                   32,315      33,149         834&lt;br /&gt;STAT..consistent gets from cache                        32,315      33,149         834&lt;br /&gt;STAT..session logical reads                             32,315      33,149         834&lt;br /&gt;STAT..consistent gets from cache (fastpath)             21,050      26,587       5,537&lt;br /&gt;STAT..no work - consistent read gets                    21,040      26,577       5,537&lt;br /&gt;STAT..consistent gets - examination                     11,265       5,470      -5,795&lt;br /&gt;LATCH.cache buffers chains 
                             53,366      60,835       7,469&lt;br /&gt;STAT..index fetch by key                               100,000           2     -99,998&lt;br /&gt;STAT..index scans kdiixs1                                    0     100,000     100,000&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So we now see similar results for the "unique index" variant, and also a similar "gap" in the buffer visits that is explained by the statistics.&lt;br /&gt;&lt;br /&gt;A slightly funny point is that by adding a "useless" filter we actually seem to arrive at a faster execution time, because the optimization kicks in. This looks quite counter-intuitive and only seems to work in particular versions.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;"Fastpath" consistent gets&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To see whether this optimization depends on the new "fastpath" consistent gets, let's turn this new feature off by setting "_fastpin_enable" to 0 and restarting the instance:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;alter system set "_fastpin_enable" = 0 scope = spfile;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The results shown here are for the "inner table filter" variation, but the results for the original case without the additional filter likewise correspond to those with "fast pinning" enabled:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;11.2.0.1 Classic Nested Loop - data ordered by ID, fast pins disabled, inner table filter&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Inner row source Unique Index - T1 and T2 ordered by ID&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br 
/&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:02.90 |   32315 |&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:02.90 |   32315 |&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |   1000 |   202K  (1)|    100K|00:00:02.70 |   32315 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T2     |      1 |    100K|  2720   (1)|    100K|00:00:00.21 |   10010 |&lt;br /&gt;|*  4 |    TABLE ACCESS BY INDEX ROWID| T1     |    100K|      1 |     2   (0)|    100K|00:00:01.79 |   22305 |&lt;br /&gt;|*  5 |     INDEX UNIQUE SCAN         | T1_IDX |    100K|      1 |     1   (0)|    100K|00:00:00.53 |   12302 |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Inner row source Non-Unique Index - T1 and T2 ordered by ID&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:03.86 |   33143 |&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:03.86 |   33143 |&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |   1000 |   202K  (1)|    100K|00:00:03.67 |   33143 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T1     |      1 |    100K|  2720   (1)|    100K|00:00:00.21 |   10010 |&lt;br 
/&gt;|*  4 |    TABLE ACCESS BY INDEX ROWID| T2     |    100K|      1 |     2   (0)|    100K|00:00:02.74 |   23133 |&lt;br /&gt;|*  5 |     INDEX RANGE SCAN          | T2_IDX |    100K|      1 |     1   (0)|    100K|00:00:00.71 |   13126 |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So the same optimization kicked in, and we can tell from the session statistics that the "fastpath" consistent gets indeed have not been used:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;Statistics Name                                         Unique  Non-Unique  Difference&lt;br /&gt;----------------------------------------------------- -------- ----------- -----------&lt;br /&gt;STAT..table scan blocks gotten                          10,000      10,000           0&lt;br /&gt;STAT..table scan rows gotten                           100,000     100,000           0&lt;br /&gt;STAT..table fetch by rowid                             100,000     100,002           2&lt;br /&gt;STAT..buffer is not pinned count                        10,018      10,028          10&lt;br /&gt;STAT..buffer is pinned count                           195,103     195,440         337&lt;br /&gt;STAT..consistent gets                                   32,315      33,149         834&lt;br /&gt;STAT..consistent gets from cache                        32,315      33,149         834&lt;br /&gt;STAT..session logical reads                             32,315      33,149         834&lt;br /&gt;STAT..no work - consistent read gets                    21,040      26,577       5,537&lt;br /&gt;STAT..consistent gets - examination                     11,265       5,470      -5,795&lt;br /&gt;LATCH.cache buffers chains                              53,372      60,829       7,457&lt;br /&gt;STAT..index fetch by key                               100,000           2     -99,998&lt;br /&gt;STAT..index 
scans kdiixs1                                    0     100,000     100,000&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The only significant difference is the absence of the "consistent gets from cache (fastpath)" statistics.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Nested Loop Join Batching&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Finally, let's check whether the new "Nested Loop Join Batching" optimization has any additional effect on this test case. Changing the hints like this enables it:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 1) no_nlj_prefetch(b) */&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;11.2.0.1 Nested Loop Batching - data ordered by ID, inner table filter&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Inner row source Unique Index - T1 and T2 ordered by ID&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:02.89 |   32306 |&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:02.89 |   32306 |&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |        |            |    100K|00:00:02.70 |   32306 |&lt;br /&gt;|   3 |    NESTED LOOPS               |        |      1 |   1000 |   202K  (1)|    100K|00:00:01.43 |   22306 |&lt;br /&gt;|   4 |     TABLE ACCESS FULL         
| T2     |      1 |    100K|  2720   (1)|    100K|00:00:00.20 |   10010 |&lt;br /&gt;|*  5 |     INDEX UNIQUE SCAN         | T1_IDX |    100K|      1 |     1   (0)|    100K|00:00:00.53 |   12296 |&lt;br /&gt;|*  6 |    TABLE ACCESS BY INDEX ROWID| T1     |    100K|      1 |     2   (0)|    100K|00:00:00.57 |   10000 |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Inner row source Non-Unique Index - T1 and T2 ordered by ID&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT              |        |      1 |        |   202K(100)|      1 |00:00:03.05 |   33128 |&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:03.05 |   33128 |&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |        |            |    100K|00:00:02.85 |   33128 |&lt;br /&gt;|   3 |    NESTED LOOPS               |        |      1 |   1000 |   202K  (1)|    100K|00:00:01.57 |   23128 |&lt;br /&gt;|   4 |     TABLE ACCESS FULL         | T1     |      1 |    100K|  2720   (1)|    100K|00:00:00.20 |   10010 |&lt;br /&gt;|*  5 |     INDEX RANGE SCAN          | T2_IDX |    100K|      1 |     1   (0)|    100K|00:00:00.67 |   13118 |&lt;br /&gt;|*  6 |    TABLE ACCESS BY INDEX ROWID| T2     |    100K|      1 |     2   (0)|    100K|00:00:00.57 |   10000 |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br 
/&gt;Apart from some minor differences in the number of logical I/Os, it doesn't change the outcome. The same applies to the session statistics:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;Statistics Name                                         Unique  Non-Unique  Difference&lt;br /&gt;----------------------------------------------------- -------- ----------- -----------&lt;br /&gt;STAT..table scan blocks gotten                          10,000      10,000           0&lt;br /&gt;STAT..table scan rows gotten                           100,000     100,000           0&lt;br /&gt;STAT..table fetch by rowid                             100,000     100,002           2&lt;br /&gt;STAT..buffer is not pinned count                        10,006      10,010           4&lt;br /&gt;STAT..buffer is pinned count                           195,115     195,458         343&lt;br /&gt;STAT..consistent gets                                   32,306      33,134         828&lt;br /&gt;STAT..consistent gets from cache                        32,306      33,134         828&lt;br /&gt;STAT..session logical reads                             32,306      33,134         828&lt;br /&gt;STAT..consistent gets from cache (fastpath)             21,041      26,572       5,531&lt;br /&gt;STAT..no work - consistent read gets                    21,031      26,562       5,531&lt;br /&gt;STAT..consistent gets - examination                     11,265       5,470      -5,795&lt;br /&gt;LATCH.cache buffers chains                              53,348      60,816       7,468&lt;br /&gt;STAT..index fetch by key                               100,000           2     -99,998&lt;br /&gt;STAT..index scans kdiixs1                                    0     100,000     100,000&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;What is interesting to see, however, is that it seems to perform faster. In particular, the "non-unique index" variant is now really close to the "unique index" variant. So although the Nested Loop Join Batching doesn't show any significant changes in the statistics and latch acquisitions, it seems to save CPU cycles and performs better, even without any physical I/O involved.&lt;br /&gt;&lt;br /&gt;As a side note, if you want to check the effects of the "Nested Loop Join Batching" on physical I/O, you need to be aware of an odd behaviour I've experienced during my tests: if any kind of row source statistics sampling was enabled, either by using STATISTICS_LEVEL = ALL, the GATHER_PLAN_STATISTICS hint or even by enabling (extended) SQL trace, the optimized, batched form of physical I/O could not be reproduced. You could tell this from the session statistics whose names start with "Batched IO": these all stayed at 0. Only when all of these were disabled did the effects become visible and the corresponding statistics become non-zero. I don't know why this is the case, but it is an important detail when testing this feature. I'll probably publish a separate post on the physical I/O optimizations of the Vector/Batched I/O at some time in the future.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Scalability&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When running the "data ordered by ID" version concurrently, the "non-unique index" variant now scales almost as well as the "unique index" variant, so the two variants are now quite close not only in single-user mode: they both scale very well, too.&lt;br /&gt;&lt;br /&gt;There is another interesting effect that can only be observed when running the test case with the unordered data set concurrently: in recent code releases (10.2.0.5, 11.2.0.1 and 11.2.0.2) the "shortcut" consistent gets used for the inner table lookup in the "unique index" variant get "downgraded" to "normal" consistent gets if there is concurrent access to the block. 
This can be observed in the session statistics and latch acquisitions:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;Statistics Name                                                          Value&lt;br /&gt;-----------------------------------------------------              -----------&lt;br /&gt;STAT..shared hash latch upgrades - no wait                              99,995&lt;br /&gt;STAT..RowCR attempts                                                   100,000&lt;br /&gt;STAT..RowCR hits                                                       100,000&lt;br /&gt;STAT..consistent gets from cache (fastpath)                             10,011&lt;br /&gt;STAT..no work - consistent read gets                                    10,000&lt;br /&gt;STAT..consistent gets - examination                                    200,013&lt;br /&gt;STAT..consistent gets                                                  310,024&lt;br /&gt;STAT..consistent gets from cache                                       310,024&lt;br /&gt;LATCH.cache buffers chains                                           1,680,173&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Note in particular how the "consistent gets - examination" statistic has decreased from 300,000 to 200,000. So with four concurrent executions, this "unique index" variant suddenly requires approx. 420,000 latch acquisitions per execution, in contrast to the usual 320,000. Since 11.2.0.2 does not support the "filter" trick for making use of the buffer pinning optimization for the inner table lookup with the ordered data set and the "unique index" variant, it suffers twice: not only does it require a latch acquisition for every inner table lookup, but due to the "downgrade" it performs two latch acquisitions per iteration, requiring a whopping 200,000 excess latch acquisitions per concurrent execution with the data set ordered by ID.
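&lt;br /&gt;&lt;br /&gt;The latch figures can be roughly reconciled with the consistent gets. The following Python sketch rests on a simplifying assumption, not on any documented Oracle rule: a "consistent gets - examination" acquires the "cache buffers chains" latch once, while every other consistent get acquires it twice (once for pinning and once for unpinning the buffer). The function name is purely illustrative, and the latch statistic is assumed to cover all four concurrent sessions:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;pre&gt;
```python
# Rough model, an assumption rather than a documented Oracle rule:
# a "consistent gets - examination" takes the "cache buffers chains"
# latch once, every other consistent get takes it twice (pin + unpin).
def estimated_cbc_latch_gets(consistent_gets, examination_gets):
    normal_gets = consistent_gets - examination_gets
    return examination_gets + 2 * normal_gets


# Per-session figures of the concurrent "unique index" run shown above
downgraded = estimated_cbc_latch_gets(310_024, 200_013)

# Without the downgrade: ~300,000 of the ~310,000 gets are examinations
usual = estimated_cbc_latch_gets(310_000, 300_000)

# Latch statistic above, presumably covering all four sessions
measured_per_session = 1_680_173 // 4

print(f"estimated, downgraded: {downgraded:,}")            # 420,035
print(f"estimated, usual:      {usual:,}")                 # 320,000
print(f"measured per session:  {measured_per_session:,}")  # 420,043
```
&lt;/pre&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Applied to the "unique index" figures of the "inner table filter" run with fastpath enabled (32,315 consistent gets, 11,265 examinations), the same approximation gives 53,365 against 53,366 measured "cache buffers chains" latch acquisitions, so it seems to be a reasonable rule of thumb for these test cases.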
&lt;br /&gt;&lt;br /&gt;It's also interesting to note that the "RowCR" optimization is recorded in the session statistics. I couldn't find much information about this - it seems to be in the code since 10.2 (partial support already in 9.2 RAC), but until 10.2.0.5 it is only enabled in RAC mode and not in single-instance mode (see MOS note "Bug 4951888 - Row CR is not enabled for non RAC systems"). I could reproduce this only in 10.2.0.5, 11.2.0.1 and 11.2.0.2. According to the description it has been specifically introduced for using row-level consistent gets instead of rolling back complete block versions for read-consistency in RAC environments where generating the previous version of a block might require undo blocks from remote instances. Why this optimization shows up in the above single-instance, read-only scenario where no rollback to the block version is required is not clear to me. It is however measurable that the "fallback" seems to slow down execution. &lt;br /&gt;&lt;br /&gt;Whether this is a side-effect or a deliberate design choice that performs better in RAC environments or certain consistent read scenarios I can't tell yet, however when switching off this optimization via "alter system set "_row_cr" = false" this "downgrade" with concurrent execution doesn't happen any longer, and 11.2.0.2 performs better in my test cases, although it doesn't bring back the "filter" trick, so 11.2.0.2 is the only release where the "non-unique index" variant scales better with the ordered data set than the "unique index" variant.&lt;br /&gt;&lt;br /&gt;A final word on scalability in general: I think it is important to point out that the test harness provided so far only checks for concurrent read access. 
Since it is interesting to see whether the observed "buffer pinning" optimization has any negative side effects on mixed read/write access to the buffers, I've published an &lt;a href="http://sqltools-plusplus.org:7676/media/concurrent_unique_non_unique_execution_updated.zip"&gt;updated script set&lt;/a&gt; that includes new versions of the concurrent execution master and slave scripts. These allow running a SELECT FOR UPDATE on both tables involved in the first session, with all other sessions in read-only mode, in order to test the effects of a mixed read/write concurrency scenario.&lt;br /&gt;&lt;br /&gt;The result of this quite simple test shows that the buffer pinning optimization scales very well not only for read-only concurrency but also for the tested mixed read-write scenario. The provided test case might be a specific and simplistic one (there are some peculiarities with SELECT FOR UPDATE), and there might be other concurrency scenarios where the buffer pinning does not scale that well (for example, potential "free buffer waits" due to many blocks being pinned), but at least with this test case the result is quite impressive.&lt;br /&gt;&lt;br /&gt;As a side note, the mixed read-write test is very interesting on its own in several ways, for example:&lt;br /&gt;&lt;br /&gt;- It adds pressure on the buffer cache due to the clone copies created. A query similar to the one provided by Jonathan Lewis &lt;a href="http://jonathanlewis.wordpress.com/2011/03/14/buffer-states/"&gt;here&lt;/a&gt; can be quite revealing. You'll find out that you need a much larger cache to still have a fully cached test case (with an 8KB block size, at least 512MB for keeping two 80MB segments fully cached!)&lt;br /&gt;&lt;br /&gt;- It requires additional buffer cache for the undo blocks&lt;br /&gt;&lt;br /&gt;- It generates much higher contention on the "cache buffers chains" latches due to the additional buffer cache activity (creating clone copies, rollbacks for consistent reads, current mode gets etc.)&lt;br /&gt;&lt;br /&gt;- It requires applying undo to the blocks to arrive at a read-consistent version&lt;br /&gt;&lt;br /&gt;- The buffers have to be accessed in exclusive mode for write access&lt;br /&gt;&lt;br /&gt;The updated script set also contains an Excel sheet with results from my test runs on different hardware and Oracle versions, as well as a sample query to analyse the buffer cache.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Oracle 11g extends the logical I/O optimizations that could already be seen in Oracle 10g when using the Table Prefetching Nested Loop shape, and this is available without any further optimizations like Table Prefetching or Nested Loop Join Batching. It also does not depend on the new "fastpath" consistent gets introduced with 11g.&lt;br /&gt;&lt;br /&gt;The efficiency of the optimization largely depends on the order of the data, so predicting it is not that easy, a bit like predicting the efficiency of the Subquery / Filter caching feature, which also depends on data patterns.&lt;br /&gt;&lt;br /&gt;However, this knowledge might offer additional options for taking advantage of this optimization. Of course, introducing additional sort operations might easily outweigh the benefits achieved, but there might be cases where a sort is not that costly and helps to improve scalability/concurrency in extreme cases.
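&lt;br /&gt;&lt;br /&gt;Why the data order matters so much can be illustrated with a deliberately simplified Python model. It assumes that an inner table lookup can reuse the still pinned buffer whenever it hits the same block as the previous lookup, and needs a new buffer get otherwise; the 10 rows per block correspond to the 100,000 rows in 10,000 blocks of the test tables. This is a sketch of the principle only, not Oracle's actual implementation, and the function name is hypothetical:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;pre&gt;
```python
import random

ROWS = 100_000
ROWS_PER_BLOCK = 10  # 100,000 rows in 10,000 blocks, as in the test tables


def buffer_gets_for_lookups(ids):
    """Count lookups needing a new buffer get under a simplified model.

    Assumption (not Oracle's actual algorithm): a lookup hitting the same
    block as the previous lookup reuses the still pinned buffer, any other
    lookup needs a fresh buffer get.
    """
    gets = 0
    pinned_block = None
    for row_id in ids:
        block = (row_id - 1) // ROWS_PER_BLOCK
        if block != pinned_block:
            gets += 1
            pinned_block = block
    return gets


ordered_ids = list(range(1, ROWS + 1))
shuffled_ids = ordered_ids[:]
random.Random(42).shuffle(shuffled_ids)  # fixed seed for repeatability

print("ordered: ", buffer_gets_for_lookups(ordered_ids))   # 10000
print("shuffled:", buffer_gets_for_lookups(shuffled_ids))  # nearly 100000
```
&lt;/pre&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;For the ordered data the model arrives at 10,000 buffer gets for 100,000 lookups, which happens to match the 10,000 buffers reported for the inner TABLE ACCESS BY INDEX ROWID step in the Nested Loop Batching plans above, whereas with a random order almost every lookup requires a new buffer get.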
&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Closing remarks&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This blog post got way too long&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-278693880263522102?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/278693880263522102/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=278693880263522102' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/278693880263522102'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/278693880263522102'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/08/logical-io-evolution-part-3-11g.html' title='Logical I/O Evolution - Part 3: 11g'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-3725656238422541281</id><published>2011-08-08T10:00:00.022+02:00</published><updated>2011-08-10T20:51:23.616+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='extended statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='Expressions'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='CBO'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.1'/><category 
scheme='http://www.blogger.com/atom/ns#' term='11.1.0.7'/><category scheme='http://www.blogger.com/atom/ns#' term='Multi-Column Join'/><category scheme='http://www.blogger.com/atom/ns#' term='upgrade'/><category scheme='http://www.blogger.com/atom/ns#' term='function-based index'/><category scheme='http://www.blogger.com/atom/ns#' term='virtual columns'/><category scheme='http://www.blogger.com/atom/ns#' term='cardinality'/><title type='text'>Multi-Column Joins, Expressions and 11g</title><content type='html'>&lt;span style="font-weight:bold;"&gt;Introduction&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I've already outlined in one of my &lt;a href="http://oracle-randolf.blogspot.com/2009/10/multi-column-joins.html"&gt;previous posts&lt;/a&gt; that getting a reasonable cardinality estimate for multi-column joins can be tricky, in particular when dealing with correlated column values in the join columns.&lt;br /&gt;&lt;br /&gt;Oracle 10g introduced several "Multi-Column Join Cardinality" sanity checks that prevent a multi-column join from producing join cardinalities that are too low. This is controlled via the internal parameter "_optimizer_join_sel_sanity_check", which defaults to true from 10g on.&lt;br /&gt;&lt;br /&gt;It looks like upgrading to 11g adds yet another twist to this issue. If you happen to have expressions as part of your join predicates, then in 10g these are still covered by the multi-column join cardinality sanity checks as long as at least one side of the join refers to simple columns, but this no longer seems to be the case from 11g on.&lt;br /&gt;&lt;br /&gt;Note that if those expressions are already covered by corresponding function-based indexes in pre-11g, then this problem will not show up as described here. In fact, adding corresponding indexes is one of the possible fixes, as I'll outline below.
&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;A working example&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's have a look at a simple example to demonstrate the potential upgrade issue. This code snippet creates a table with 1000 rows - ID_50 and ID_CHAR_50 both hold 50 distinct values and the two columns are correlated.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;create table t&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;       rownum as id&lt;br /&gt;     , mod(rownum, 50) + 1 as id_50&lt;br /&gt;     , 'ABC' || to_char(mod(rownum, 50) + 1) as id_char_50&lt;br /&gt;     , case when mod(rownum, 2) = 0 then null else mod(rownum, 100) + 1 end as id_50_null&lt;br /&gt;     , case when mod(rownum, 2) = 0 then null else 'ABC' || to_char(mod(rownum, 100) + 1) end as id_char_50_null&lt;br /&gt;from&lt;br /&gt;     dual&lt;br /&gt;connect by&lt;br /&gt;     level &amp;lt;= 1000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't', method_opt =&amp;gt; 'for all columns size 1')&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;In 10.2.0.4 if you check the cardinality estimates of the following query you'll see the "Multi-Column Join Cardinality" check kicking in:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;explain plan for&lt;br /&gt;select  /*+ optimizer_features_enable('10.2.0.4') */&lt;br /&gt;       /* opt_param('_optimizer_join_sel_sanity_check', 'false') */&lt;br /&gt;       count(*)&lt;br /&gt;from&lt;br /&gt;       t t1&lt;br /&gt;     , t t2&lt;br /&gt;where&lt;br /&gt;       t1.id_50 = t2.id_50&lt;br /&gt;and     t1.id_char_50 = t2.id_char_50&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));&lt;br /&gt;&lt;br /&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br /&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  
SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      |  1000 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;If you activate the commented hint to disable the sanity check, you'll end up with a different estimate that corresponds simply to the selectivity of each single join predicate multiplied: 1/50 * 1/50 * 1000 * 1000 = 400.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br /&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      |   400 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;If now one or more expressions get introduced on one side of the join, in 10.2.0.4 the result will still correspond to the one with the sanity check enabled:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;explain plan for&lt;br /&gt;select  /*+ optimizer_features_enable('10.2.0.4') */&lt;br /&gt;       /* opt_param('_optimizer_join_sel_sanity_check', 'false') */&lt;br /&gt;       count(*)&lt;br /&gt;from&lt;br /&gt;       t t1&lt;br /&gt;     , t t2&lt;br /&gt;where&lt;br /&gt;       t1.id_50 = case when t2.id_50 is null then -1 else t2.id_50 end&lt;br /&gt;and     t1.id_char_50 = t2.id_char_50&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));&lt;br /&gt;&lt;br /&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br 
/&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      |  1000 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;But if the same is repeated in 11.1 or 11.2, you'll end up with this result:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br /&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      |   400 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;As you can see, the sanity checks have not been used, and we get the same result as in 10.2.0.4 with the sanity checks disabled. By the way, depending on the expressions (and on which sides of the join they are used), you might even end up with a different cardinality estimate based on default selectivities like 1/100. This is controlled via the "_use_column_stats_for_function" parameter, which defaults to true in recent releases, so some simpler join expressions still use the underlying column statistics.&lt;br /&gt;&lt;br /&gt;This change in behaviour can lead to dramatic changes in the cardinality estimates and hence to different execution plans, potentially performing much worse than before. The change in this particular example is not that significant, but it can easily lead to very low cardinality estimates if the join columns have a high number of distinct values.
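&lt;br /&gt;&lt;br /&gt;The arithmetic behind these estimates can be checked against the actual data. The following Python sketch (function and variable names are illustrative only) rebuilds the ID_50 / ID_CHAR_50 values generated by the CREATE TABLE statement above, computes the estimate obtained by simply multiplying the single-predicate selectivities, and counts the rows the self-join on both columns really returns:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;pre&gt;
```python
from collections import Counter

NUM_ROWS = 1000


def product_of_selectivities(rows_left, rows_right, *ndvs):
    """Join estimate that multiplies the 1/NDV selectivity of each predicate."""
    estimate = rows_left * rows_right
    for ndv in ndvs:
        estimate /= ndv
    return estimate


# ID_50 and ID_CHAR_50 as generated by the CREATE TABLE statement above
pairs = [
    (rownum % 50 + 1, "ABC" + str(rownum % 50 + 1))
    for rownum in range(1, NUM_ROWS + 1)
]

# Exact self-join result: a combination occurring n times yields n * n rows
actual_rows = sum(n * n for n in Counter(pairs).values())

print(product_of_selectivities(NUM_ROWS, NUM_ROWS, 50, 50))  # 400.0
print(len(set(pairs)))                                       # 50
print(actual_rows)                                           # 20000
```
&lt;/pre&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Since each ID_50 value implies exactly one ID_CHAR_50 value, there are only 50 distinct combinations, each occurring 20 times, so the join actually returns 50 * 20 * 20 = 20,000 rows. For these correlated columns both the 1,000 rows estimated with the sanity check and the 400 rows estimated without it are therefore far too low.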
&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;New 11g Features&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I believe the issue has been introduced by the new Oracle 11g feature of virtual columns and extended statistics on expressions and column groups. In fact these new features provide a possible workaround for the issue: By creating a corresponding virtual column or extended statistics on the expressions used as part of the join the sanity check can be re-enabled in 11g.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;exec dbms_stats.gather_table_stats(null, 't', method_opt =&amp;gt;'for columns (case when id_50 is null then -1 else id_50 end) size 1')&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;or alternatively:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;alter table t add (virtual_col1 as (case when t2.id_50 is null then -1 else t2.id_50 end));&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't', method_opt =&amp;gt;'for columns virtual_col1 size 1')&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;As already outlined above, another possible workaround is adding a corresponding function-based index:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;create index t_idx_func1 on t (case when id_50 is null then -1 else id_50 end);&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Since adding a function-based index adds a similar hidden virtual column to the table as the extended statistics does the net effect will be the same but of course with the additional overhead of maintaining the index.
&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Column Groups with Expressions - Correlated Column Values&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Of course if we really would like to take advantage of the new features with correlated column values what we should try to do is creating a column group on the combined expressions to allow the optimizer to detect the correlation, but unfortunately mixing expressions/virtual columns with column groups is explicitly mentioned in the documentation as not supported (yet), which can be confirmed:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;SQL&amp;gt; exec dbms_stats.gather_table_stats(null, 't', method_opt =&amp;gt;'for columns (case when id_50 is null then -1 else id_50 end, id_char_50) size 1')&lt;br /&gt;BEGIN dbms_stats.gather_table_stats(null, 't', method_opt =&amp;gt;'for columns (case when id_50 is null then -1 else id_50 end, id_char_50) size 1'); END;&lt;br /&gt;&lt;br /&gt;*&lt;br /&gt;ERROR at line 1:&lt;br /&gt;ORA-20001: Error when processing extension -  missing right parenthesis&lt;br /&gt;&lt;br /&gt;SQL&amp;gt; exec dbms_stats.gather_table_stats(null, 't', method_opt =&amp;gt;'for columns (virtual_col1, id_char_50) size 1')&lt;br /&gt;BEGIN dbms_stats.gather_table_stats(null, 't', method_opt =&amp;gt;'for columns (virtual_col1, id_char_50) size 1'); END;&lt;br /&gt;&lt;br /&gt;*&lt;br /&gt;ERROR at line 1:&lt;br /&gt;ORA-20001: Error when processing extension -  virtual column is referenced in a&lt;br /&gt;column expression&lt;br /&gt;ORA-06512: at "SYS.DBMS_STATS", line 20337&lt;br /&gt;ORA-06512: at "SYS.DBMS_STATS", line 20360&lt;br /&gt;ORA-06512: at line 1&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Yet the strange thing is that the desired effect can easily be achieved by adding a corresponding multi-column function-based index like this:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;create index t_idx3 on t (case when id_50 is null then -1 else id_50 end, 
id_char_50);&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So this is one area where virtual columns / extended statistics are not yet equivalent to function-based indexes. However, there is one significant difference between the index and the extended statistics column group approach. The former can be used to derive the number of distinct values if the index is an exact match for the column group. The latter creates a virtual column that combines the columns into a single expression using the undocumented SYS_OP_COMBINED_HASH function. Histograms can be generated on that virtual column, which can be helpful in the case of correlated &lt;span style="font-weight:bold;"&gt;and &lt;/span&gt;skewed column values. Note that in my tests the join cardinality calculation based on column groups did not take existing histograms on the virtual column into account, whereas single-table access predicates could make use of the histogram. Histogram use for joins might become possible in future releases, but such information cannot be derived from an index on the column group.
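&lt;br /&gt;&lt;br /&gt;The principle of the column group virtual column can be sketched in Python. SYS_OP_COMBINED_HASH is undocumented, so the hypothetical combined_hash function below uses SHA-1 over the values purely as a stand-in. Only the principle is meant to carry over: there is one hash value per combination of column values, so the number of distinct hash values equals the number of distinct combinations, and a frequency histogram on the hash captures skew across the combinations. The sample data is also hypothetical (perfectly correlated and skewed):&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;pre&gt;
```python
import hashlib
from collections import Counter

def combined_hash(*values):
    # Hypothetical stand-in for the undocumented SYS_OP_COMBINED_HASH:
    # SHA-1 over a text rendering of the values, truncated to 60 bits.
    text = "|".join("NULL" if v is None else repr(v) for v in values)
    return int(hashlib.sha1(text.encode()).hexdigest()[:15], 16)

# Hypothetical correlated and skewed data: ID_CHAR_50 is ID_50 as text,
# value 0 occurs 510 times, the values 1..49 occur 10 times each.
rows = [(0, "0")] * 510 + [(i % 49 + 1, str(i % 49 + 1)) for i in range(490)]

# "Virtual column" holding one hash value per combination of column values
group_values = [combined_hash(id_50, id_char_50) for id_50, id_char_50 in rows]

# A frequency histogram on that virtual column captures the skew
histogram = Counter(group_values)

print(len(rows), len(histogram))         # 1000 50
print(histogram[combined_hash(0, "0")])  # 510
```
&lt;/pre&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;An index on the two columns could at best provide the 50 distinct keys. The 510 versus 10 rows per combination is only visible through a histogram on the combined value.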
&lt;br /&gt;&lt;br /&gt;Repeat above EXPLAIN PLAN now, first with 10.2.0.4 optimizer settings:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;explain plan for&lt;br /&gt;select  /*+ optimizer_features_enable('10.2.0.4') */&lt;br /&gt;       count(*)&lt;br /&gt;from&lt;br /&gt;       t t1&lt;br /&gt;     , t t2&lt;br /&gt;where&lt;br /&gt;       t1.id_50 = case when t2.id_50 is null then -1 else t2.id_50 end&lt;br /&gt;and     t1.id_char_50 = t2.id_char_50&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));&lt;br /&gt;&lt;br /&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br /&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      |  1000 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;No change, however, if you repeat the same now with 11.1 or 11.2 optimizer settings:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br /&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      | 20000 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;In this particular case the 11g cardinality estimate with the multi-column function-based index is spot on.
&lt;br /&gt;&lt;br /&gt;As already explained in the previous post Oracle 11g does now take advantage of indexes that 10g didn't - in 10g this required unique indexes.&lt;br /&gt;&lt;br /&gt;Although this is good news, and the cardinality estimates in general should change for the better, it still means that even with suitable indexes in place you might end up with significant cardinality estimate changes after the upgrade to 11g that require testing.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;The Single-Column Workarounds&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;With a single, non-combined statistics expression (using one of the methods shown above) in 11g we are at least back to the 10.2.0.4 cardinality estimate with the sanity checks enabled:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br /&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      |  1000 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Interestingly if the column group is covered by an index then in 11g the sanity check is also still enabled - and the order and position in the index apparently doesn't matter in this particular case, it just has to be an index covering the columns/expressions used - possibly among other columns/expressions.
&lt;br /&gt;&lt;br /&gt;Of course this workaround can have other side effects: First of all you introduce more work because DBMS_STATS needs to gather statistics for the underlying virtual columns added - and if want to use extended statistics on expressions rather than virtual columns you can only have a limited number of statistics extensions per table (I'm not sure why this restriction exists and it can be worked around by using virtual columns instead). Also the additional virtual columns count towards the hard limit of 1,000 columns per table.&lt;br /&gt;&lt;br /&gt;Furthermore if you happen to use the same expressions as filter predicates the cardinality estimates very likely will again change with the workaround in place - mind you, it will probably lead to improved cardinality estimates, but nevertheless it means a change that needs to be tested.&lt;br /&gt;&lt;br /&gt;Here is a cardinality estimate for the sample join expression used as filter without the workaround:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;explain plan for&lt;br /&gt;select count(*) from t where case when id_50 is null then -1 else id_50 end = :b1;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));&lt;br /&gt;&lt;br /&gt;-------------------------------------------&lt;br /&gt;| Id  | Operation          | Name | Rows  |&lt;br /&gt;-------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT   |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE    |      |     1 |&lt;br /&gt;|   2 |   TABLE ACCESS FULL| T    |    10 |&lt;br /&gt;-------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;But with the workaround in place:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;explain plan for&lt;br /&gt;select count(*) from t where case when id_50 is null then -1 else id_50 end = :b1;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));&lt;br 
/&gt;&lt;br /&gt;-------------------------------------------&lt;br /&gt;| Id  | Operation          | Name | Rows  |&lt;br /&gt;-------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT   |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE    |      |     1 |&lt;br /&gt;|   2 |   TABLE ACCESS FULL| T    |    20 |&lt;br /&gt;-------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;More Complex Expressions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The new features allow 11g to use the sanity checks (or cardinality estimates derived from index statistics) even in cases where 10g cannot use them. For example, if you have expressions on both sides of the join, 10g disables the sanity checks, whereas 11g can keep them enabled once corresponding statistics on the expressions are in place.&lt;br /&gt;&lt;br /&gt;For more complex expressions (which probably indicate a design issue), the 11g extended statistics/virtual columns also improve cardinality estimates in general. 10g would resort to a hard-coded selectivity like 1/100 for equi-joins, whereas 11g can cover such cases as well:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;explain plan for&lt;br /&gt;select  /* optimizer_features_enable('10.2.0.4') */&lt;br /&gt;       /* opt_param('_optimizer_join_sel_sanity_check', 'false') */&lt;br /&gt;       count(*)&lt;br /&gt;from&lt;br /&gt;       t t1&lt;br /&gt;     , t t2&lt;br /&gt;where&lt;br /&gt;       nvl(t1.id_50 + t1.id_50_null, -1) = nvl(t2.id_50 + t2.id_50_null, -1)&lt;br /&gt;and     nvl(t1.id_char_50 || t1.id_char_50_null, 'bla') = nvl(t2.id_char_50 || t2.id_char_50_null, 'bla')&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'BASIC +ROWS'));&lt;br /&gt;&lt;br /&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br 
/&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      |   100 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Expressions like these disable the sanity check and don't use any underlying column statistics either, so they fall back to built-in, hard-coded defaults.&lt;br /&gt;&lt;br /&gt;But by creating corresponding extended statistics / virtual columns / multi-column function-based indexes in 11g, we can take advantage of (at least) the sanity checks and get improved cardinality estimates in general:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;exec dbms_stats.gather_table_stats(null, 't', method_opt =&amp;gt;'for columns (nvl(id_50 + id_50_null, -1)) size 1')&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't', method_opt =&amp;gt;'for columns (nvl(id_char_50 || id_char_50_null, ''bla'')) size 1')&lt;br /&gt;&lt;br /&gt;--------------------------------------------&lt;br /&gt;| Id  | Operation           | Name | Rows  |&lt;br /&gt;--------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT    |      |     1 |&lt;br /&gt;|   1 |  SORT AGGREGATE     |      |     1 |&lt;br /&gt;|   2 |   HASH JOIN         |      |  1000 |&lt;br /&gt;|   3 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;|   4 |    TABLE ACCESS FULL| T    |  1000 |&lt;br /&gt;--------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Expressions used as part of multi-column join predicates can cause a lot of trouble when upgrading from 10g to 11g. Fortunately, there are viable workarounds available if 
you want to keep the 11g optimizer features enabled. Restricting the optimizer to 10g features is of course also a workaround, but usually not a desirable long-term solution.&lt;br /&gt;&lt;br /&gt;Note that there are cases where multi-column function-based indexes offer better cardinality estimates in 11g than virtual columns or extended statistics. This comes at the price of maintaining an additional (potentially wide) index, which requires additional storage. It also carries the risk of other plans changing, either because they use the index or because they are indirectly influenced by the additional index statistics.&lt;br /&gt;&lt;br /&gt;Some of the side effects of the additional index could be addressed by leaving such an index in an unusable state, but this again might have other undesirable side effects, like statistics gathering jobs failing with error messages about unusable indexes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-3725656238422541281?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/3725656238422541281/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=3725656238422541281' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/3725656238422541281'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/3725656238422541281'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/08/multi-column-joins-expressions-and-11g.html' title='Multi-Column Joins, Expressions and 11g'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-7565606627672031079</id><published>2011-08-01T10:33:00.001+02:00</published><updated>2011-08-01T10:33:00.513+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='extended statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.1'/><category scheme='http://www.blogger.com/atom/ns#' term='HCC'/><category scheme='http://www.blogger.com/atom/ns#' term='virtual columns'/><title type='text'>HCC And Virtual Columns</title><content type='html'>This is just a short heads-up note for those dealing with HCC-enabled tables (so at present this applies only to Exadata customers).&lt;br /&gt;&lt;br /&gt;As already outlined in a &lt;a href="http://oracle-randolf.blogspot.com/2010/07/compression-restrictions.html"&gt;previous post&lt;/a&gt; about compression restrictions, tables with HCC enabled do not support dropping columns. DROP COLUMN is silently converted into SET UNUSED, and DROP UNUSED COLUMNS throws an error saying that it is unsupported.&lt;br /&gt;&lt;br /&gt;I've recently come across an interesting variation of this restriction. Apparently Oracle treats virtual columns the same way in this case: if you drop a virtual column of an HCC-enabled table, it isn't actually dropped but is also silently turned into an unused column. This doesn't really make sense to me, since dropping a virtual column doesn't require any physical modification of the underlying data structures.&lt;br /&gt;&lt;br /&gt;Now you might wonder why this is relevant. It can be important for several reasons:&lt;br /&gt;&lt;br /&gt;1. 
All unused columns, whether virtual or not, count towards the 1,000-column limit of a table. So frequently adding and dropping virtual columns is a non-issue with non-HCC tables, but can become relevant with HCC enabled.&lt;br /&gt;&lt;br /&gt;2. Extended statistics also use virtual columns under the covers. So if you create and drop extended statistics, the same happens: the dropped virtual columns stay there. Even more annoying, there is an upper limit on the number of extensions per table. The limit itself is defined in a quite interesting way (greatest(20, 10% of the non-virtual columns)), but the problem is that the dropped extensions count towards this limit. So you can easily end up in a situation where you cannot add any further extended statistics, yet none of them are visible in the DBA/ALL/USER_STAT_EXTENSIONS dictionary views. What you can see, however, in DBA/ALL/USER_TAB_COLS are the remaining dropped, hidden virtual columns.&lt;br /&gt;&lt;br /&gt;Since you can't drop unused columns on HCC-enabled tables, there is no easy way around this apart from uncompressing the table/all table partitions, dropping the unused columns and re-compressing. That is nothing you usually want to (or can) do with HCC-compressed segments...&lt;br /&gt;&lt;br /&gt;Note, by the way, that this nuisance doesn't affect exchange partition operations. Virtual columns are handled correctly in that case, which means that only the physical column definitions need to be in sync between the two segments exchanged, not the virtual columns. 
You can happily exchange partitions between tables with different numbers and types of virtual columns.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-7565606627672031079?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/7565606627672031079/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=7565606627672031079' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/7565606627672031079'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/7565606627672031079'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/08/hcc-and-virtual-columns.html' title='HCC And Virtual Columns'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-3530263354865713210</id><published>2011-07-27T10:07:00.004+02:00</published><updated>2011-09-14T22:28:55.348+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='System Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Auto-DOP'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='I/O Resource Calibration'/><category scheme='http://www.blogger.com/atom/ns#' term='CBO'/><category scheme='http://www.blogger.com/atom/ns#' term='CPU Costing'/><title type='text'>Cost Is Time: Next Generation</title><content type='html'>It looks 
like Oracle has introduced a new "cost is time" model for the time estimate of the Cost-Based Optimizer (CBO) with the Oracle 11.2.0.2 patch set. &lt;br /&gt;&lt;br /&gt;In order to understand the implications, let me summarize the evolution of the CBO in terms of cost / time estimate so far:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;1. Oracle 7 and 8&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The cost estimate generated by the Cost-Based Optimizer (CBO) &lt;a href="http://jonathanlewis.wordpress.com/2006/12/11/cost-is-time/"&gt;has always been&lt;/a&gt; &lt;a href="http://jonathanlewis.wordpress.com/2011/01/10/cost-again/"&gt;a time estimate&lt;/a&gt;, although expressed in a slightly obscure unit, namely the number of single-block reads.&lt;br /&gt;&lt;br /&gt;The traditional I/O-based costing introduced with Oracle 7 in principle counted the number of required single- and multi-block reads to arrive at the final cost. A potential drawback of this approach was that it did not differentiate between multi- and single-block reads: one multi-block read produced the same cost as one single-block read. The model used an "adjusted" multi-block read count to make full table scans more costly than indicated by larger "db_file_multiblock_read_count" settings. This accounted for smaller extents and for blocks already cached in the buffer cache, both of which make multi-block reads smaller than requested. Even so, the model still potentially favoured full table scans over index access paths.&lt;br /&gt;&lt;br /&gt;From Oracle 8 on, one could play with the OPTIMIZER_INDEX_COST_ADJ / OPTIMIZER_INDEX_CACHING parameters to compensate for this shortcoming of the costing model, in particular for OLTP-biased applications.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;2. 
Oracle 9i&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;With Oracle 9i, Oracle introduced System Statistics along with a more sophisticated cost calculation model.&lt;br /&gt;&lt;br /&gt;In short, System Statistics offer the following features:&lt;br /&gt;&lt;br /&gt;- Different treatment of single-block and multi-block operations&lt;br /&gt;- Time-based optimization using average timings for single- and multi-block reads&lt;br /&gt;- A CPU cost component as part of the cost calculation&lt;br /&gt;- Gathering of the actual hardware capabilities, so that the calculations are based on the actual system capabilities and workload pattern&lt;br /&gt;&lt;br /&gt;More details can be found, for example, in my &lt;a href="http://oracle-randolf.blogspot.com/2009/04/understanding-different-modes-of-system.html"&gt;"Understanding System Statistics"&lt;/a&gt; blog series.&lt;br /&gt;&lt;br /&gt;So with System Statistics the CBO actually calculates an estimated execution time. You can see this in the EXPLAIN PLAN output: with System Statistics enabled, it includes a TIME column.&lt;br /&gt;&lt;br /&gt;Simply put, the time estimate is the average time for a single-block read (SREADTIM) times the number of single-block reads, plus the average time for a multi-block read (MREADTIM) times the number of multi-block reads, plus the estimated number of CPU operations divided by the CPU operations per second (CPUSPEED / CPUSPEEDNW). So the cost with System Statistics is actually based on a time estimate. 
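&lt;br /&gt;&lt;br /&gt;As a rough illustration, the time and cost calculation can be written down in a few lines of Python. This is a simplified sketch, not Oracle's actual implementation, and the function names are made up. The NOWORKLOAD part assumes the commonly cited formulas SREADTIM = IOSEEKTIM + block size / IOTFRSPEED and MREADTIM = IOSEEKTIM + MBRC * block size / IOTFRSPEED, with the default IOSEEKTIM of 10ms and IOTFRSPEED of 4096 bytes per ms. It reproduces the 12ms and 26ms values and the approx. 2,710 cost units (excluding CPU) used in the example later in this post. The conversion to cost units by dividing by SREADTIM is described in the next paragraph:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;pre&gt;
```python
# Simplified System Statistics (CPU costing) model, not Oracle's implementation.
# Assumed NOWORKLOAD defaults: IOSEEKTIM = 10 ms, IOTFRSPEED = 4096 bytes/ms.

def noworkload_read_times(ioseektim_ms, iotfrspeed, block_size, mbrc):
    # Derive SREADTIM and MREADTIM (ms) from seek time and transfer speed
    sreadtim = ioseektim_ms + block_size / iotfrspeed
    mreadtim = ioseektim_ms + mbrc * block_size / iotfrspeed
    return sreadtim, mreadtim

def estimated_time_ms(srds, mrds, cpu_cycles, sreadtim, mreadtim, cpuspeed_mhz):
    # CPUSPEED is in millions of cycles per second, i.e. thousands of cycles per ms
    cpu_ms = cpu_cycles / (cpuspeed_mhz * 1000)
    return srds * sreadtim + mrds * mreadtim + cpu_ms

def cost_from_time(time_ms, sreadtim):
    # Express the time in units of single-block reads
    return time_ms / sreadtim

sreadtim, mreadtim = noworkload_read_times(10, 4096, 8192, 8)
print(sreadtim, mreadtim)  # 12.0 26.0

# FTS of 10,000 blocks with MBRC = 8: 1,250 multi-block reads.
# The CPU part is ignored here (0 cycles, arbitrary CPU speed of 1000 MHz).
io_time = estimated_time_ms(0, 10000 // 8, 0, sreadtim, mreadtim, 1000)
print(io_time)                                   # 32500.0
print(round(cost_from_time(io_time, sreadtim)))  # 2708
```
&lt;/pre&gt;&lt;/div&gt;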
&lt;br /&gt;&lt;br /&gt;For consistency reasons it was decided to use the same unit as before. The estimated time is therefore simply divided by SREADTIM to arrive at the same cost unit as with traditional I/O-based costing, which is the number of single-block reads. Plans involving full segment scan operations usually arrive at different costs than with the traditional costing anyway, so consistency is hardly given.&lt;br /&gt;&lt;br /&gt;Right from the beginning in Oracle 9i, System Statistics could be gathered in WORKLOAD mode. In this mode Oracle takes two snapshots of certain performance statistics and calculates the System Statistics parameters like SREADTIM, MREADTIM, MBRC etc. from the delta values.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;3. Oracle 10g&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Starting with Oracle 10g, System Statistics are enabled by default with so-called default NOWORKLOAD settings. 10g even allows generating an artificial load on the system: gathering NOWORKLOAD System Statistics simply uses a predefined I/O pattern to measure the disk transfer speed (IOTFRSPEED) and the disk seek time (IOSEEKTIM). These values are then used to derive the SREADTIM and MREADTIM values, the two most important ingredients of the enhanced cost/time calculation.&lt;br /&gt;&lt;br /&gt;So since Oracle 9i there has been built-in functionality to measure the capabilities of the underlying hardware. From 10g on, this can be based either on a particular workload pattern or on an artificial predefined load.&lt;br /&gt;&lt;br /&gt;Furthermore, Oracle provides a well-defined API as part of the DBMS_STATS package for dealing with System Statistics: they can be gathered, deleted, exported, imported, manually defined and even gathered directly into a separate statistics table to build a history of the System Statistics gathered.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;4. 
Oracle 11g&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In 11g Oracle introduced the I/O calibration routine as part of the Resource Manager. Note that so far this feature didn't have a direct relationship to the Cost-Based Optimizer. It could, however, be used to control the maximum parallel degree via the PARALLEL_IO_CAP_ENABLED parameter.&lt;br /&gt;&lt;br /&gt;The first thing that puzzled me when dealing with that new functionality was the lack of a well-defined API to maintain the gathered information. There is a single call in the Resource Manager package (DBMS_RESOURCE_MANAGER.CALIBRATE_IO) to run the I/O calibration, but apart from that there is no functionality for maintenance. There is no way to delete the calibration results, to export or import them, or even to override them manually.&lt;br /&gt;&lt;br /&gt;If you want to understand what this means, have a look at the MOS document "Automatic Degree of Parallelism in 11.2.0.2 [ID 1269321.1]". Besides stating that there can be problems with the actual I/O calibration, like gathering unreasonable values or not running to completion, it shows you how to manipulate an internal SYS table to override the gathered values. This also requires bouncing the instance to take effect.&lt;br /&gt;&lt;br /&gt;I find it hard to understand why Oracle hasn't addressed these handling shortcomings in the meantime. This is particularly surprising given that with Oracle 11.2.0.2 the I/O resource calibration becomes mandatory if you want to make use of the new Auto-DOP feature introduced with 11.2.0.1. Fiddling with a SYS-owned table doesn't sound like a well-designed feature to me, and the calibration functionality is not exactly "brand-new".&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;5. Oracle 11.2.0.2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So starting with 11.2.0.2 the new "cost is time" calculation comes into the picture. 
If you have values in the corresponding SYS.RESOURCE_IO_CALIBRATE$ table (which is simply externalized by the DBA_RSRC_IO_CALIBRATE view), then something really odd happens:&lt;br /&gt;&lt;br /&gt;The cost calculated according to the System Statistics model is already a time estimate based on three components: time for single-block reads, time for multi-block reads and the estimated CPU time. This cost is now converted into a data volume by simply multiplying it by the default block size. Dividing this data volume by the throughput indicated by the I/O calibration results (it looks like MAX_PMBPS is the relevant value) yields a new estimated execution time.&lt;br /&gt;&lt;br /&gt;Let's have a look at a worked example. Assume default NOWORKLOAD System Statistics, an 8KB default block size and an unset db_file_multiblock_read_count, which results in a MultiBlockReadCount (MBRC) of 8 being used internally for costing a full table scan (FTS). The time estimate for an FTS of 10,000 blocks (80MB) will then be based on 1,250 multi-block reads, each estimated to take 26ms. This gives us a time estimate of 32.5 seconds. The CPU time associated with the full table scan operation is added on top, so the final result will be somewhere between 32.5 and 33 seconds. Let's stick to the 32.5 seconds: this time estimate corresponds to approx. 
2,710 single-block reads, obtained by simply dividing the time by 12ms. 12ms happens to be the SREADTIM value for default NOWORKLOAD System Statistics with the above configuration. This value will be close to the cost shown (minor variations depend on the CPU speed determined).&lt;br /&gt;&lt;br /&gt;Cost / time estimate for an FTS of a 10,000-block segment with 8KB block size, default NOWORKLOAD System Statistics and the default MBRC of 8 used for cost calculation (_db_file_optimizer_read_count = 8):&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-------------------------------------------------------------------&lt;br /&gt;| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |&lt;br /&gt;-------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT   |      |     1 |  2716   (1)| &lt;span style="font-weight:bold;"&gt;00:00:33&lt;/span&gt; |&lt;br /&gt;|   1 |  SORT AGGREGATE    |      |     1 |            |          |&lt;br /&gt;|   2 |   TABLE ACCESS FULL| T    | 10000 |  2716   (1)| &lt;span style="font-weight:bold;"&gt;00:00:33&lt;/span&gt; |&lt;br /&gt;-------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Now suppose you have a MAX_PMBPS value of 4MB/sec as the I/O resource calibration result. I deliberately chose this very conservative value because it happens to be the same transfer rate that the default NOWORKLOAD System Statistics assume (4096 bytes per millisecond). In that case the following new time calculation happens instead: &lt;br /&gt;&lt;br /&gt;2,710 is multiplied by the 8KB default block size to arrive at a data volume, in this case approx. 21 MB.&lt;br /&gt;&lt;br /&gt;These approx. 21 MB are then divided by 4MB/sec to arrive at a new time estimate of approx. 5.3 seconds, rounded up to 6 seconds. 
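&lt;br /&gt;&lt;br /&gt;The two calculations can be compared in a short Python sketch. It assumes that "MB" in the 10053 trace means 1,048,576 bytes. Under that assumption, the cost of 2716 exactly reproduces the tot_io_size=21(MB) and time=5304(ms) values. The function names are hypothetical, and this only models the observed behaviour; it is not Oracle's code:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;pre&gt;
```python
# Model of the two time estimates for the same cost; names are hypothetical.
# Assumption: "MB" in the 10053 trace means 2**20 bytes.
MB = 1024 * 1024

def system_statistics_time_ms(cost, sreadtim_ms):
    # Pre-11.2.0.2: the cost is a time expressed in single-block reads
    return cost * sreadtim_ms

def io_calibrate_time_ms(cost, block_size, max_pmbps):
    # 11.2.0.2 with calibration results: cost -> "data volume" -> time
    tot_io_size_mb = cost * block_size / MB
    return tot_io_size_mb / max_pmbps * 1000

cost = 2716
print(system_statistics_time_ms(cost, 12))       # 32592, shown as 00:00:33
print(int(io_calibrate_time_ms(cost, 8192, 4)))  # 5304, as in the trace
print(cost * 8192 / MB, 10000 * 8192 / MB)       # 21.21875 78.125
```
&lt;/pre&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The implied volume of approx. 21MB has little to do with the approx. 78MB (10,000 blocks of 8KB) actually scanned.&lt;br /&gt;&lt;br /&gt;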
Note that the original time estimate was 32.5 seconds.&lt;br /&gt;&lt;br /&gt;Cost / time estimate for an FTS of a 10,000-block segment with 8KB block size, default NOWORKLOAD System Statistics and the default MBRC of 8 used for cost calculation (_db_file_optimizer_read_count = 8), but with MAX_PMBPS set to 4MB/sec:&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-------------------------------------------------------------------&lt;br /&gt;| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |&lt;br /&gt;-------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT   |      |     1 |  2716   (1)| &lt;span style="font-weight:bold;"&gt;00:00:06&lt;/span&gt; |&lt;br /&gt;|   1 |  SORT AGGREGATE    |      |     1 |            |          |&lt;br /&gt;|   2 |   TABLE ACCESS FULL| T    | 10000 |  2716   (1)| &lt;span style="font-weight:bold;"&gt;00:00:06&lt;/span&gt; |&lt;br /&gt;-------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;You can also see this happening in the 10053 CBO trace file:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;kkeCostToTime: using io calibrate stats &lt;br /&gt; maxmbps=0(MB/s) maxpmbps=4(MB/s) &lt;br /&gt; block_size=8192 mb_io_count=1 mb_io_size=8192 (bytes) &lt;br /&gt; tot_io_size=21(MB) time=5304(ms)&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Now this approach strikes me as odd for several reasons:&lt;br /&gt;&lt;br /&gt;- A sophisticated time estimate (remember that it even includes a CPU time component that has nothing to do with an I/O volume) is turned into a data volume to arrive at a new time estimate using a rather simplistic approach.&lt;br /&gt;&lt;br /&gt;- As you can see from the above example, the "data volume" calculated does not correspond to the actual I/O volume that we know from the System Statistics cost/time calculation. Remember that the actual segment size in this case was 80MB, not approx. 21MB. 
This is of course caused by the original time estimate being based on multi-block reads: the 10,000 blocks are read via 1,250 multi-block reads of 8 blocks each, and with the default NOWORKLOAD System Statistics each of them is costed at approx. 26ms / 12ms = 2.17 single-block reads, hence the cost of approx. 2,710. So why the cost/time would be turned into some data volume that has nothing to do with the actual data volume used for the original cost/time calculation is beyond me.&lt;br /&gt;&lt;br /&gt;- There is already an I/O calibration routine available as part of the System Statistics functionality that can be used to arrive at more realistic time estimates based on the gathered System Statistics information - so why has a second one been introduced? Furthermore, this raises the question: If I'm required to run the I/O calibration to enable Auto-DOP, shouldn't I then also "calibrate" my System Statistics to arrive at a "calibrated" cost estimate? After all, the new "Cost Is Time" approach uses the cost estimate for the new time estimate.&lt;br /&gt;&lt;br /&gt;- As already outlined, there is no officially documented way to properly deal with the I/O calibration results - manually poking into SYS-owned tables doesn't really count.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Implications&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So you probably think, why bother? The cost estimate is left untouched; only the TIME column is affected. 
So execution plans shouldn't change, since they are still chosen based on the lowest cost estimate - and the lower the cost, the lower the new time estimate.&lt;br /&gt;&lt;br /&gt;You'll appreciate, however, that the previous simple correlation between the cost and the time estimate no longer holds with 11.2.0.2 when resource calibration results are available: So far you could simply divide the time estimate by the SREADTIM value to arrive at the cost, or, the other way around, multiply the cost by the SREADTIM value to arrive at the time estimate - or divide the time by the cost to arrive at the approximate SREADTIM value.&lt;br /&gt;&lt;br /&gt;The point with 11.2.0.2 and the I/O resource calibration is that the new time estimate is obviously used by the Auto-DOP feature to drive two crucial decisions:&lt;br /&gt;&lt;br /&gt;- Is the statement a candidate for parallel execution? This is controlled via the parameter PARALLEL_MIN_TIME_THRESHOLD, which defaults to 10 seconds in 11.2.0.2&lt;br /&gt;&lt;br /&gt;- If it is a candidate for parallel execution, what is the optimal DOP? This of course depends on a lot of different inputs, but also seems to be based on the new time estimate - which, as just explained, is derived from a (wrong) data volume estimate in a questionable way&lt;br /&gt;&lt;br /&gt;As a side note, Oracle at present recommends setting the value of MAX_PMBPS to 200 for Exadata environments rather than relying on the results of the actual I/O calibration - another indication that the I/O calibration results are questionable as of now.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;With the 11.2.0.2 patch set Oracle introduced a new model for the estimated "Execution Time" if I/O resource calibration results are available. 
As outlined above, the new approach seems questionable (at least), but it will be used for crucial decisions regarding the new Auto-DOP feature. It will be interesting to see how this area develops further, for example whether the new time algorithm will be changed in upcoming releases or whether the influence of the I/O calibration on the CBO calculations will be extended.&lt;br /&gt;&lt;br /&gt;If you want to make use of the new Auto-DOP feature in 11.2.0.2, you should be aware of this relationship - the MAX_PMBPS parameter drives the new time estimation and the Auto-DOP calculations.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-3530263354865713210?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/3530263354865713210/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=3530263354865713210' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/3530263354865713210'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/3530263354865713210'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/07/cost-is-time-next-generation.html' title='Cost Is Time: Next Generation'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-8374209297685890859</id><published>2011-07-25T10:01:00.002+02:00</published><updated>2011-07-25T22:08:18.891+02:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='10.2.0.5'/><category scheme='http://www.blogger.com/atom/ns#' term='Fundamentals'/><category scheme='http://www.blogger.com/atom/ns#' term='Prefetching'/><category scheme='http://www.blogger.com/atom/ns#' term='10.2.0.4'/><category scheme='http://www.blogger.com/atom/ns#' term='Nested Loop Join'/><category scheme='http://www.blogger.com/atom/ns#' term='10gR2'/><category scheme='http://www.blogger.com/atom/ns#' term='Logical I/O'/><title type='text'>Logical I/O - Evolution: Part 2 - 9i, 10g Prefetching</title><content type='html'>In the &lt;a href="http://oracle-randolf.blogspot.com/2011/07/logical-io-evolution-part-1-baseline.html"&gt;initial part&lt;/a&gt; of this series I've explained some details regarding logical I/O using a Nested Loop Join as an example.&lt;br /&gt;&lt;br /&gt;To recap, I've shown in particular:&lt;br /&gt;&lt;br /&gt;- Oracle can re-visit pinned buffers without performing logical I/O&lt;br /&gt;&lt;br /&gt;- There are different variants of consistent gets - a "normal" one involving buffer pin/unpin cycles that require two latch acquisitions, and a short-cut variant that visits the buffer while holding the corresponding "cache buffers chains" child latch ("examination") and therefore requires only a single latch acquisition&lt;br /&gt;&lt;br /&gt;- Although two statements use a similar execution plan and produce the same number of logical I/Os, one is significantly faster and scales better than the other one&lt;br /&gt;&lt;br /&gt;The initial part used the "classic" shape of the Nested Loop Join, but in recent releases Oracle introduced various enhancements in that area - in particular "Table Prefetching" in 9i and Nested Loop Join Batching using "Vector/Batched I/O" in 11g.&lt;br /&gt;&lt;br /&gt;Although these enhancements have been introduced primarily to optimize the physical I/O patterns, they could also have an influence on logical I/O. 
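&lt;br /&gt;&lt;br /&gt;The logical I/O figures discussed in this series are ordinary session statistics. The test case uses the RUNSTATS package to report them. As a minimal alternative sketch, assuming SELECT privileges on V$MYSTAT and V$STATNAME, you can query the current values for your own session directly. Taking such a snapshot before and after a statement gives the delta, keeping in mind that the snapshot query itself may add a little noise:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Sketch: current session values of the statistics used in this series&lt;br /&gt;select&lt;br /&gt;       sn.name&lt;br /&gt;     , ms.value&lt;br /&gt;from&lt;br /&gt;       v$mystat   ms&lt;br /&gt;     , v$statname sn&lt;br /&gt;where&lt;br /&gt;       ms.statistic# = sn.statistic#&lt;br /&gt;and    sn.name in (&lt;br /&gt;         'session logical reads'&lt;br /&gt;       , 'consistent gets'&lt;br /&gt;       , 'consistent gets - examination'&lt;br /&gt;       , 'buffer is pinned count'&lt;br /&gt;       , 'buffer is not pinned count'&lt;br /&gt;       )&lt;br /&gt;order by&lt;br /&gt;       sn.name;&lt;br /&gt;&lt;/div&gt;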
&lt;br /&gt;&lt;br /&gt;The intention of Prefetching and Batching seems to be the same - both target the usually most expensive part of the Nested Loop Join: the random table lookup as part of the inner row source. By trying to "prefetch" or "batch" the physical I/O operations caused by this random block access, Oracle attempts to minimize the I/O waits.&lt;br /&gt;&lt;br /&gt;I might cover the effect of both "Prefetching" and "Batching" on physical I/O in separate posts; here I'll only mention that with those optimizations you might see "db file scattered read" or "db file parallel read" multi-block I/O operations instead of single block "db file sequential read" operations for the random table access (index prefetching is also possible, by the way). Note also that seeing the Prefetching or Batching plan shape does not necessarily mean that it is actually going to happen at execution time - Oracle monitors the effectiveness of the Prefetching and can dynamically decide whether it will be used or not.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;10.2.0.4 Table Prefetching - Random order&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's enable table prefetching in 10.2 and re-run the original test case. 
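&lt;br /&gt;&lt;br /&gt;This assumes table prefetching was disabled as described in the first part. Re-enabling it then means resetting the static parameter in the spfile and restarting the instance. A sketch using the reset command from the Part 1 script (SQL*Plus, connected AS SYSDBA):&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Remove the "_table_lookup_prefetch_size" = 0 setting from the spfile&lt;br /&gt;alter system reset "_table_lookup_prefetch_size" scope = spfile sid = '*';&lt;br /&gt;&lt;br /&gt;-- Static parameter: the instance needs to be restarted to pick up the change&lt;br /&gt;shutdown immediate&lt;br /&gt;startup&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;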
The first run will use the different order variant of T1 and T2:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Inner row source Unique Index - T1 different order than T2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:04.12 |     310K|&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |    100K|   202K  (1)|    100K|00:00:03.90 |     310K|&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T2     |      1 |    100K|  2716   (1)|    100K|00:00:00.30 |   10010 |&lt;br /&gt;|   4 |    TABLE ACCESS BY INDEX ROWID| T1     |    100K|      1 |     2   (0)|    100K|00:00:02.71 |     300K|&lt;br /&gt;|*  5 |     INDEX UNIQUE SCAN         | T1_IDX |    100K|      1 |     1   (0)|    100K|00:00:01.20 |     200K|&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Inner row source Non-Unique Index - T1 different order than T2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;--------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                    | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;--------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   1 |  SORT AGGREGATE              |        
|      1 |      1 |            |      1 |00:00:05.03 |     311K|&lt;br /&gt;|   2 |   TABLE ACCESS BY INDEX ROWID| T2     |      1 |      1 |     2   (0)|    100K|00:00:04.40 |     311K|&lt;br /&gt;|   3 |    NESTED LOOPS              |        |      1 |    100K|   202K  (1)|    200K|00:00:03.02 |     211K|&lt;br /&gt;|   4 |     TABLE ACCESS FULL        | T1     |      1 |    100K|  2716   (1)|    100K|00:00:00.30 |   10010 |&lt;br /&gt;|*  5 |     INDEX RANGE SCAN         | T2_IDX |    100K|      1 |     1   (0)|    100K|00:00:01.49 |     201K|&lt;br /&gt;--------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;As you can see, in 10g the unique index variant looks the same and performs similarly to the original post, even with table prefetching enabled.&lt;br /&gt;&lt;br /&gt;This changes in 11g, by the way, where the unique index variant also supports the table prefetching plan shape.&lt;br /&gt;&lt;br /&gt;For the non-unique variant you'll see a different execution plan shape where the inner row source random table lookup is actually a parent operation of the Nested Loop Join (and hence will only be started once, consuming the information generated by the child Nested Loop operation).&lt;br /&gt;&lt;br /&gt;Note that in case of nested Nested Loop Joins only the inner-most row source will make use of the Table Prefetching shape. The same applies to the 11g Nested Loop Join Batching. 
If you happen to have several Nested Loop Joins that are not directly nested, then each of the inner-most row sources might use the Table Prefetching/Batching shape - which means that it can be used more than once within a single execution plan.&lt;br /&gt;&lt;br /&gt;If you compare the Runtime profile of the non-unique index variant with the original Runtime profile without Table Prefetching, you won't see any difference in terms of logical I/O; however, it becomes obvious that the overall execution is actually slightly faster (more significantly so with the row source sampling overhead enabled). In particular the random table access requires significantly less time than in the original Runtime profile, so it seems to be more efficient, although it is still slower than the unique index variant.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Begin Update&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Having focused on the logical I/O, I completely forgot to mention the inconsistency in the A-Rows column (thanks to Flado who pointed this out in his comment below), which shows 200K rows for the Nested Loop operation although only 100K rows have been identified in the inner index lookup. I believe this inconsistency also shows up when performing an SQL trace, so it seems to be a problem with the row source statistics. In principle, with this plan shape the Nested Loop Join operation seems to account for the sum of the rows identified in the driving row source &lt;span style="font-weight:bold;"&gt;and&lt;/span&gt; in the inner index lookup, rather than only the expected number of rows identified in the inner index lookup. 
&lt;br /&gt;&lt;br /&gt;However, as mentioned below in the "Statistics" section there is another anomaly - a consistent get &lt;span style="font-weight:bold;"&gt;and&lt;/span&gt; "buffer is pinned count" for every row looked up in the inner table, so this might not be just coincidence but another indicator that there is really some excess work happening with Table Prefetching. &lt;br /&gt;&lt;br /&gt;By the way - both anomalies are still present in 11.1 / 11.2 when using Table Prefetching there.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;End Update&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's have a look at the session statistics.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;Statistics Name                                         Unique  Non-Unique  Difference&lt;br /&gt;----------------------------------------------------- -------- ----------- -----------&lt;br /&gt;STAT..table scan blocks gotten                          10,000      10,000           0&lt;br /&gt;STAT..table scan rows gotten                           100,000     100,000           0&lt;br /&gt;STAT..table fetch by rowid                             100,000     100,002           2&lt;br /&gt;STAT..consistent gets                                  310,012     311,101       1,089&lt;br /&gt;STAT..consistent gets from cache                       310,012     311,101       1,089&lt;br /&gt;STAT..session logical reads                            310,012     311,101       1,089&lt;br /&gt;STAT..index fetch by key                               100,000           2     -99,998&lt;br /&gt;STAT..rows fetched via callback                        100,000           2     -99,998&lt;br /&gt;STAT..index scans kdiixs1                                    0     100,000     100,000&lt;br /&gt;STAT..buffer is not pinned count                       200,001      99,997    -100,004&lt;br /&gt;STAT..buffer is pinned count                           189,998     290,006     100,008&lt;br 
/&gt;STAT..consistent gets - examination                    300,001     100,007    -199,994&lt;br /&gt;STAT..no work - consistent read gets                    10,001     211,084     201,083&lt;br /&gt;LATCH.cache buffers chains                             320,031     522,195     202,164&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So the only significant difference in this case is the increased "buffer is pinned count" / decreased "buffer is not pinned count" statistics, although the number of logical I/Os stays the same. I don't know whether this really means excess work with Table Prefetching enabled or whether it is an instrumentation problem. Nevertheless, with Table Prefetching enabled you'll end up in this case with both a "buffer is pinned count" &lt;span style="font-weight:bold;"&gt;and&lt;/span&gt; a "consistent get" for each row looked up in the inner row source table operation. The number of logical I/Os and latch acquisitions stays the same, so it's not obvious from the statistics why this performs better than the non-Table Prefetching case. According to the statistics it even performs more work, but maybe the random table access as parent operation of the Nested Loop allows for more efficient processing that requires fewer CPU cycles.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;10.2.0.4 Table Prefetching - Same (Random) order&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's change the data order and use either the same "Pseudo-Random" order (by uncommenting the second "dbms_random.seed(0)" call) or order by ID - it doesn't matter with Table Prefetching in 10g.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Inner row source Unique Index - T1 and T2 same order&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| 
A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:03.91 |     310K|&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |    100K|   202K  (1)|    100K|00:00:03.70 |     310K|&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T2     |      1 |    100K|  2716   (1)|    100K|00:00:00.30 |   10010 |&lt;br /&gt;|   4 |    TABLE ACCESS BY INDEX ROWID| T1     |    100K|      1 |     2   (0)|    100K|00:00:02.54 |     300K|&lt;br /&gt;|*  5 |     INDEX UNIQUE SCAN         | T1_IDX |    100K|      1 |     1   (0)|    100K|00:00:01.14 |     200K|&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Inner row source Non-Unique Index - T1 and T2 same order&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;--------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                    | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;--------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   1 |  SORT AGGREGATE              |        |      1 |      1 |            |      1 |00:00:04.54 |     221K|&lt;br /&gt;|   2 |   TABLE ACCESS BY INDEX ROWID| T2     |      1 |      1 |     2   (0)|    100K|00:00:03.90 |     221K|&lt;br /&gt;|   3 |    NESTED LOOPS              |        |      1 |    100K|   202K  (1)|    200K|00:00:02.82 |     211K|&lt;br /&gt;|   4 |     TABLE ACCESS FULL        | T1     |      1 |    100K|  2716   (1)|    100K|00:00:00.30 |   10010 |&lt;br /&gt;|*  5 |     INDEX RANGE SCAN    
     | T2_IDX |    100K|      1 |     1   (0)|    100K|00:00:01.40 |     201K|&lt;br /&gt;--------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Now we really see a difference: The unique index variant still shows the same results, but the non-unique variant saves logical I/O on the random table access - and is faster than with random order - coming closer to the unique index variant performance.&lt;br /&gt;&lt;br /&gt;Whereas the index range scan still requires approx. 200,000 logical I/Os the random table access only requires 10,000 logical I/Os instead of 100,000.&lt;br /&gt;&lt;br /&gt;The session statistics:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;Statistics Name                                         Unique  Non-Unique  Difference&lt;br /&gt;----------------------------------------------------- -------- ----------- -----------&lt;br /&gt;STAT..table scan blocks gotten                          10,000      10,000           0&lt;br /&gt;STAT..table scan rows gotten                           100,000     100,000           0&lt;br /&gt;STAT..table fetch by rowid                             100,000     100,002           2&lt;br /&gt;LATCH.cache buffers chains                             320,023     342,213      22,190&lt;br /&gt;STAT..consistent gets                                  310,012     221,110     -88,902&lt;br /&gt;STAT..consistent gets from cache                       310,012     221,110     -88,902&lt;br /&gt;STAT..session logical reads                            310,012     221,110     -88,902&lt;br /&gt;STAT..index fetch by key                               100,000           2     -99,998&lt;br /&gt;STAT..rows fetched via callback                        100,000           2     -99,998&lt;br /&gt;STAT..index scans kdiixs1                                    0     100,000     100,000&lt;br /&gt;STAT..no work - consistent read 
gets                    10,001     121,093     111,092&lt;br /&gt;STAT..buffer is not pinned count                       200,001      10,006    -189,995&lt;br /&gt;STAT..buffer is pinned count                           189,998     379,997     189,999&lt;br /&gt;STAT..consistent gets - examination                    300,001     100,007    -199,994&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The session statistics confirm this: The "buffer is pinned count" of the non-unique index variant increases by another 90,000 (379,997 vs. 290,006 in the random order case), which corresponds to the 90,000 fewer logical I/Os performed as part of the random table access operation.&lt;br /&gt;&lt;br /&gt;The number of latch acquisitions decreases accordingly (522,195 vs. 342,213, i.e. by approx. 180,000, or two latch acquisitions for each of the 90,000 avoided buffer pin/unpin cycles), so that we end up with a number comparable to the unique index variant.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Scalability&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you run the non-unique index Table Prefetching variant with the concurrent execution test harness, you'll see a correspondingly slightly increased scalability, although it still doesn't scale as well as the unique index variant.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Table Prefetching was introduced in Oracle 9i in order to optimize the random physical access in Nested Loop Joins; however, it also seems to have a positive effect on logical I/O. The effectiveness of this optimization depends on the data order - if the data from the driving row source is in the same order as the inner row source table, the table buffers can be kept pinned. 
Note that the same doesn't apply to the index lookup - even if the data is ordered by ID and consequently the same index branch and leaf blocks will be accessed repeatedly with each iteration, a buffer pinning optimization could not be observed.&lt;br /&gt;&lt;br /&gt;In the next part we'll see what happens with this example in Oracle 11g and its new features.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-8374209297685890859?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/8374209297685890859/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=8374209297685890859' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/8374209297685890859'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/8374209297685890859'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/07/logical-io-evolution-part-2-9i-10g.html' title='Logical I/O - Evolution: Part 2 - 9i, 10g Prefetching'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-5036748389197734552</id><published>2011-07-07T22:55:00.005+02:00</published><updated>2011-07-25T11:04:36.523+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='10.2.0.5'/><category scheme='http://www.blogger.com/atom/ns#' term='Fundamentals'/><category scheme='http://www.blogger.com/atom/ns#' 
term='10.2.0.4'/><category scheme='http://www.blogger.com/atom/ns#' term='Nested Loop Join'/><category scheme='http://www.blogger.com/atom/ns#' term='10gR2'/><category scheme='http://www.blogger.com/atom/ns#' term='Logical I/O'/><title type='text'>Logical I/O - Evolution: Part 1 - Baseline</title><content type='html'>&lt;a href="http://oracle-randolf.blogspot.com/2011/07/logical-io-evolution-part-2-9i-10g.html"&gt;Forward to Part 2&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This is the first part of a series of blog posts that shed some light on the logical I/O optimizations Oracle has introduced in recent releases.&lt;br /&gt;&lt;br /&gt;Before we can appreciate the enhancements, though, we need to understand the baseline. This is what this blog post is about.&lt;br /&gt;&lt;br /&gt;The example used throughout this post is based on a simple Nested Loop Join, which is one area where Oracle has introduced significant enhancements.&lt;br /&gt;&lt;br /&gt;It started its life as a comparison of unique vs. non-unique indexes used as part of a Nested Loop Join and their influence on performance and scalability.&lt;br /&gt;&lt;br /&gt;This comparison on its own is very instructive and also allows demonstrating and explaining some of the little details regarding logical I/O.&lt;br /&gt;&lt;br /&gt;Here is the basic script that is used. It creates two tables with a primary key defined, one table using a unique index, the other one a non-unique index.&lt;br /&gt;&lt;br /&gt;The tables are specifically crafted to have exactly 100,000 rows with 10 rows per block, resulting in 10,000 blocks (using the MINIMIZE RECORDS_PER_BLOCK option). These "obvious" numbers hopefully allow for nice pattern recognition in the resulting figures. 
Using the default 8K block size, the resulting indexes will have slightly more than 1,000 blocks.&lt;br /&gt;&lt;br /&gt;The script then runs a Nested Loop Join from one table to the other and then the other way around, taking a snapshot of the session statistics using Adrian Billington's RUNSTATS package, which is based on Tom Kyte's well-known package of the same name. You can get it from &lt;a href="http://www.oracle-developer.net/content/utilities/runstats.zip"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If you run this against 9i to 10.2, you'll need to disable table prefetching to get the results explained here. This can only be done by setting the static parameter "_table_lookup_prefetch_size" to 0, which requires restarting the instance.&lt;br /&gt;&lt;br /&gt;11g allows controlling the behaviour via various hints and parameters; see the script for more details.&lt;br /&gt;&lt;br /&gt;In order to be in line with the baseline explanations presented here, this should be executed against pre-11g versions, since 11g introduces some significant changes that will be covered in upcoming posts.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;--------------------------------------------------------------------------------&lt;br /&gt;--&lt;br /&gt;-- File name:   unique_non_unique_index_difference.sql&lt;br /&gt;--&lt;br /&gt;-- Purpose:     Compare the efficiency of NESTED LOOP joins via index lookup&lt;br /&gt;--              between unique and non-unique indexes&lt;br /&gt;--&lt;br /&gt;-- Author:      Randolf Geist http://oracle-randolf.blogspot.com&lt;br /&gt;--&lt;br /&gt;-- Prereqs:     RUNSTATS_PKG by Adrian Billington / Tom Kyte&lt;br /&gt;--&lt;br /&gt;-- Last tested: June 2011&lt;br /&gt;--&lt;br /&gt;-- Versions:    10.2.0.4&lt;br /&gt;--              10.2.0.5&lt;br /&gt;--              11.1.0.7&lt;br /&gt;--              11.2.0.1&lt;br /&gt;--              11.2.0.2&lt;br 
/&gt;--------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;set echo on timing on linesize 130 pagesize 999 trimspool on tab off serveroutput off doc on&lt;br /&gt;&lt;br /&gt;doc&lt;br /&gt;From 9i to 10.2 you need to disable table prefetching &lt;br /&gt;to get the "original" version of NL joins&lt;br /&gt;&lt;br /&gt;-- Disable table prefetching&lt;br /&gt;alter system set "_table_lookup_prefetch_size" = 0 scope = spfile;&lt;br /&gt;&lt;br /&gt;-- Back to defaults&lt;br /&gt;alter system reset "_table_lookup_prefetch_size" scope = spfile sid = '*';&lt;br /&gt;&lt;br /&gt;From 11g on this can be handled via the nlj_prefetch and nlj_batching hints&lt;br /&gt;&lt;br /&gt;But they work a bit counterintuitively when combined, therefore &lt;br /&gt;&lt;br /&gt;opt_param('_nlj_batching_enabled', 0)&lt;br /&gt;&lt;br /&gt;is also required to get exactly the NL join optimization requested&lt;br /&gt;&lt;br /&gt;Since this is about logical I/O, not physical I/O, you need sufficient cache &lt;br /&gt;defined (256M should be fine), otherwise the results will differ &lt;br /&gt;when physical I/O happens&lt;br /&gt;#&lt;br /&gt;&lt;br /&gt;spool unique_non_unique_index_difference.log&lt;br /&gt;&lt;br /&gt;drop table t1;&lt;br /&gt;&lt;br /&gt;purge table t1;&lt;br /&gt;&lt;br /&gt;exec dbms_random.seed(0)&lt;br /&gt;&lt;br /&gt;-- Random order&lt;br /&gt;-- Create 10 rows in a single block&lt;br /&gt;create table t1&lt;br /&gt;--pctfree 0&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;         rownum as id&lt;br /&gt;       , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;         dual&lt;br /&gt;connect by&lt;br /&gt;         level &amp;lt;= 10&lt;br /&gt;order by&lt;br /&gt;--         id&lt;br /&gt;         dbms_random.value&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;-- Tell Oracle to store at most 10 rows per block&lt;br /&gt;alter table t1 minimize records_per_block;&lt;br /&gt;&lt;br /&gt;truncate table t1;&lt;br /&gt;&lt;br /&gt;-- 
Populate the table, resulting in exactly 10,000 blocks with MSSM&lt;br /&gt;insert /*+ append */ into t1&lt;br /&gt;select&lt;br /&gt;         rownum as id&lt;br /&gt;       , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;         dual&lt;br /&gt;connect by&lt;br /&gt;         level &amp;lt;= 100000&lt;br /&gt;order by&lt;br /&gt;--         id&lt;br /&gt;         dbms_random.value&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;-- Force BLEVEL 2 for UNIQUE index (with 8K blocks, root-&amp;gt;branch-&amp;gt;leaf)&lt;br /&gt;create unique index t1_idx on t1 (id) pctfree 80;&lt;br /&gt;&lt;br /&gt;-- Avoid any side effects of dynamic sampling&lt;br /&gt;-- (and perform delayed block cleanout when not using direct-path load)&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't1', estimate_percent =&amp;gt; null)&lt;br /&gt;&lt;br /&gt;-- Add PK constraint&lt;br /&gt;alter table t1 add constraint t1_pk primary key (id);&lt;br /&gt;&lt;br /&gt;drop table t2;&lt;br /&gt;&lt;br /&gt;purge table t2;&lt;br /&gt;&lt;br /&gt;-- exec dbms_random.seed(0)&lt;br /&gt;&lt;br /&gt;-- Random order (but different from T1 order)&lt;br /&gt;-- Create 10 rows in a single block&lt;br /&gt;create table t2&lt;br /&gt;--pctfree 0&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;         rownum as id&lt;br /&gt;       , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;         dual&lt;br /&gt;connect by&lt;br /&gt;         level &amp;lt;= 10&lt;br /&gt;order by&lt;br /&gt;--         id&lt;br /&gt;         dbms_random.value&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;-- Tell Oracle to store at most 10 rows per block&lt;br /&gt;alter table t2 minimize records_per_block;&lt;br /&gt;&lt;br /&gt;truncate table t2;&lt;br /&gt;&lt;br /&gt;-- Populate the table, resulting in exactly 10,000 blocks with MSSM&lt;br /&gt;insert /*+ append */ into t2&lt;br /&gt;select&lt;br /&gt;         rownum as id&lt;br /&gt;       , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;         dual&lt;br /&gt;connect by&lt;br /&gt;   
      level &amp;lt;= 100000&lt;br /&gt;order by&lt;br /&gt;--         id&lt;br /&gt;         dbms_random.value&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;-- Force BLEVEL 2 for NON-UNIQUE index (with 8K blocks, root-&amp;gt;branch-&amp;gt;leaf)&lt;br /&gt;create index t2_idx on t2 (id) pctfree 80;&lt;br /&gt;&lt;br /&gt;-- Avoid any side effects of dynamic sampling&lt;br /&gt;-- (and perform delayed block cleanout when not using direct-path load)&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't2', estimate_percent =&amp;gt; null)&lt;br /&gt;&lt;br /&gt;-- Add PK constraint based on non-unique index&lt;br /&gt;alter table t2 add constraint t2_pk primary key (id);&lt;br /&gt;&lt;br /&gt;alter session set statistics_level = all;&lt;br /&gt;&lt;br /&gt;-- Run the commands once to cache the blocks and get a runtime profile&lt;br /&gt;select&lt;br /&gt;       max(b_filler), max(a_filler)&lt;br /&gt;from (&lt;br /&gt;select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 0) no_nlj_prefetch(b) */&lt;br /&gt;       a.id as a_id, a.filler as a_filler, b.id as b_id, b.filler as b_filler&lt;br /&gt;from&lt;br /&gt;       t2 a&lt;br /&gt;     , t1 b&lt;br /&gt;where&lt;br /&gt;       a.id = b.id&lt;br /&gt;);&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null, '+COST +OUTLINE ALLSTATS LAST'));&lt;br /&gt;&lt;br /&gt;select&lt;br /&gt;       max(b_filler), max(a_filler)&lt;br /&gt;from (&lt;br /&gt;select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 0) no_nlj_prefetch(b) */&lt;br /&gt;       a.id as a_id, a.filler as a_filler, b.id as b_id, b.filler as b_filler&lt;br /&gt;from&lt;br /&gt;       t1 a&lt;br /&gt;     , t2 b&lt;br /&gt;where&lt;br /&gt;       a.id = b.id&lt;br /&gt;);&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display_cursor(null, null, '+COST +OUTLINE ALLSTATS LAST'));&lt;br /&gt;&lt;br /&gt;-- Eliminate row source statistics overhead&lt;br /&gt;-- for the "real" test&lt;br /&gt;alter session 
set statistics_level = typical;&lt;br /&gt;&lt;br /&gt;exec runstats_pkg.rs_start&lt;br /&gt;&lt;br /&gt;select&lt;br /&gt;       max(b_filler), max(a_filler)&lt;br /&gt;from (&lt;br /&gt;select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 0) no_nlj_prefetch(b) */&lt;br /&gt;       a.id as a_id, a.filler as a_filler, b.id as b_id, b.filler as b_filler&lt;br /&gt;from&lt;br /&gt;       t2 a&lt;br /&gt;     , t1 b&lt;br /&gt;where&lt;br /&gt;       a.id = b.id&lt;br /&gt;);&lt;br /&gt;&lt;br /&gt;exec runstats_pkg.rs_middle&lt;br /&gt;&lt;br /&gt;select&lt;br /&gt;       max(b_filler), max(a_filler)&lt;br /&gt;from (&lt;br /&gt;select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 0) no_nlj_prefetch(b) */&lt;br /&gt;       a.id as a_id, a.filler as a_filler, b.id as b_id, b.filler as b_filler&lt;br /&gt;from&lt;br /&gt;       t1 a&lt;br /&gt;     , t2 b&lt;br /&gt;where&lt;br /&gt;       a.id = b.id&lt;br /&gt;);&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;set serveroutput on&lt;br /&gt;&lt;br /&gt;exec runstats_pkg.rs_stop(-1)&lt;br /&gt;&lt;br /&gt;spool off&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Expectations&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What are the expected results (explained here based on 10.2)? This is a nested loop join of 100,000 rows to a 100,000-row table, with each table allocating 10,000 blocks. The inner row source will use the available index and perform a table lookup by ROWID 100,000 times.&lt;br /&gt;&lt;br /&gt;The script specifically crafts the indexes to have a height of 3 (BLEVEL = 2) when using the default block size of 8K, which means that they have a root block, a number of branch blocks on the second level and finally the leaf blocks on the third level. 
Note that different block sizes can lead to different index heights and therefore different results.&lt;br /&gt;&lt;br /&gt;In terms of "buffer visits" required to complete the statement we could think of the following:&lt;br /&gt;&lt;br /&gt;- 100,000 block visits for the outer row source running a simple full table scan. For every iteration of the loop we need to visit the buffer and read the next row that will be used for the lookup into the inner row source&lt;br /&gt;&lt;br /&gt;- 300,000 block visits for the inner row source index lookup, since for every index lookup we need to traverse the index from root to branch to leaf&lt;br /&gt;&lt;br /&gt;- 100,000 block visits for the inner row source table lookup by ROWID&lt;br /&gt;&lt;br /&gt;So according to this model in total we need to "explain" 500,000 block visits for this example.&lt;br /&gt;&lt;br /&gt;Let's have a look at the various results from the script.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;1. The runtime profile&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;a) Running the Nested Loop Join using the "Unique Index" inner row source&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;Plan hash value: 3952364803&lt;br /&gt;&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:03.95 |     310K|&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |    100K|   202K  (1)|    100K|00:00:03.70 |     310K|&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T2     |      1 |    100K|  2716   (1)|    
100K|00:00:00.30 |   10010 |&lt;br /&gt;|   4 |    TABLE ACCESS BY INDEX ROWID| T1     |    100K|      1 |     2   (0)|    100K|00:00:02.59 |     300K|&lt;br /&gt;|*  5 |     INDEX UNIQUE SCAN         | T1_IDX |    100K|      1 |     1   (0)|    100K|00:00:01.14 |     200K|&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;If everything went as expected, you'll see here the "classic" shape of a Nested Loop Join using an index lookup for the inner row source. The loop is driven by the full table scan of the T2 table, and for every row produced by that row source the inner row source will be examined, starting with an index unique scan in this case, followed by a table access by ROWID for those rows found in the index.&lt;br /&gt;&lt;br /&gt;Comparing the runtime profile to the model described above, one significant difference becomes immediately obvious: The profile only shows 310,000 logical I/Os, not 500,000. So either the above model is incorrect or Oracle has introduced some "short-cuts" that allow it to avoid approx. 190,000 out of 500,000 logical I/Os. The difference of 190,000 seems to come from two operations: the index unique scan, which reports only 200,000 logical I/Os instead of the expected 300,000, and the full table scan of T2 driving the nested loop, which reports only 10,000 logical I/Os instead of the 100,000 pictured above. 
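&lt;br /&gt;&lt;br /&gt;The comparison can be made explicit with a small Python sketch. It derives each operation's own buffer gets from the cumulative Buffers column of the unique index plan above (a parent's figure includes its child's, so the child is subtracted) and sets them against the simple model. The values are simply copied from the plan, and "300K"/"200K" are rounded by DBMS_XPLAN, so the result is only approximate.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;
```python
# Cumulative "Buffers" values copied from the unique index plan above
# (DBMS_XPLAN prints 300K/200K rounded, so the result is approximate)
cumulative = {
    "TABLE ACCESS FULL T2": 10_010,
    "TABLE ACCESS BY INDEX ROWID T1": 300_000,
    "INDEX UNIQUE SCAN T1_IDX": 200_000,
}

# Self values: a parent's figure includes its child, so subtract the child
self_gets = {
    "TABLE ACCESS FULL T2": cumulative["TABLE ACCESS FULL T2"],
    "INDEX UNIQUE SCAN T1_IDX": cumulative["INDEX UNIQUE SCAN T1_IDX"],
    "TABLE ACCESS BY INDEX ROWID T1": (
        cumulative["TABLE ACCESS BY INDEX ROWID T1"]
        - cumulative["INDEX UNIQUE SCAN T1_IDX"]
    ),
}

# The simple model: one visit per outer row, root/branch/leaf per lookup,
# and one table block visit per lookup
ROWS = 100_000
model = {
    "TABLE ACCESS FULL T2": ROWS,
    "INDEX UNIQUE SCAN T1_IDX": 3 * ROWS,
    "TABLE ACCESS BY INDEX ROWID T1": ROWS,
}

for op, expected in model.items():
    observed = self_gets[op]
    print(f"{op:32} model {expected:9,} observed {observed:9,} missing {expected - observed:9,}")

missing = sum(model.values()) - sum(self_gets.values())
print("total missing:", missing)  # total missing: 189990
```
&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The table lookup by ROWID matches the model exactly; all of the "missing" visits are attributed to the driving full table scan (approx. 90,000) and the index unique scan (100,000).&lt;br /&gt;&lt;br /&gt;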
More on those differences in a moment.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;b) Running the Nested Loop Join using the "Non-Unique Index" inner row source&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;Plan hash value: 537985513&lt;br /&gt;&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;|   1 |  SORT AGGREGATE               |        |      1 |      1 |            |      1 |00:00:06.31 |     311K|&lt;br /&gt;|   2 |   NESTED LOOPS                |        |      1 |    100K|   202K  (1)|    100K|00:00:06.10 |     311K|&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | T1     |      1 |    100K|  2716   (1)|    100K|00:00:00.30 |   10010 |&lt;br /&gt;|   4 |    TABLE ACCESS BY INDEX ROWID| T2     |    100K|      1 |     2   (0)|    100K|00:00:04.91 |     301K|&lt;br /&gt;|*  5 |     INDEX RANGE SCAN          | T2_IDX |    100K|      1 |     1   (0)|    100K|00:00:01.77 |     201K|&lt;br /&gt;---------------------------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This is very interesting: First of all, the cost calculation is the same, so in terms of the optimizer's cost estimates there is no difference between the unique and non-unique case.&lt;br /&gt;&lt;br /&gt;However, the runtime is significantly different: The non-unique variant is consistently slower than the unique variant (6.31 vs. 3.95 seconds in the runs shown here).&lt;br /&gt;&lt;br /&gt;Furthermore, there is a minor difference in the number of logical I/Os: a slight increase that seems to be caused by the INDEX RANGE SCAN operation (201K vs. 200K).&lt;br /&gt;&lt;br /&gt;Why is that? 
Although we have defined a non-deferrable primary key constraint that guarantees uniqueness, Oracle still searches, in the case of an index range scan, for the next index entry that no longer satisfies the access predicate: for every iteration of the loop Oracle looks at the next index entry to check whether it still satisfies the predicate. For the last index entry in each leaf block this means that Oracle actually has to check the first entry of the next leaf block, hence we end up with approximately one additional logical I/O per index leaf block in this case. This is also the first part of the explanation why Oracle has to perform more work for the non-unique variant. From the runtime profile, however, we can tell that although we lose time at the index range scan compared to the index unique scan (1.77 vs. 1.14 seconds), we lose even more time at the table access by ROWID operation (approx. 3.14 vs. 1.45 seconds of self time).&lt;br /&gt;&lt;br /&gt;For a better understanding, remember that the A-Time and Buffers columns are cumulative: every parent operation includes the runtime/logical I/Os of its child operations, so in order to understand the runtime/logical I/Os of an operation itself you need to subtract the values of its direct descendant operation(s).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;2. 
Session Statistics&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's have a look at the relevant session statistics:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;Statistics Name                                         Unique  Non-Unique  Difference&lt;br /&gt;----------------------------------------------------- -------- ----------- -----------&lt;br /&gt;STAT..buffer is pinned count                           189,998     189,998           0&lt;br /&gt;STAT..table scan blocks gotten                          10,000      10,000           0&lt;br /&gt;STAT..table scan rows gotten                           100,000     100,000           0&lt;br /&gt;STAT..table fetch by rowid                             100,000     100,002           2&lt;br /&gt;STAT..buffer is not pinned count                       200,001     200,005           4&lt;br /&gt;STAT..consistent gets                                  310,012     311,110       1,098&lt;br /&gt;STAT..consistent gets from cache                       310,012     311,110       1,098&lt;br /&gt;STAT..session logical reads                            310,012     311,110       1,098&lt;br /&gt;STAT..index fetch by key                               100,000           2     -99,998&lt;br /&gt;STAT..rows fetched via callback                        100,000           2     -99,998&lt;br /&gt;STAT..index scans kdiixs1                                    0     100,000     100,000&lt;br /&gt;STAT..consistent gets - examination                    300,001     100,007    -199,994&lt;br /&gt;STAT..no work - consistent read gets                    10,001     211,093     201,092&lt;br /&gt;LATCH.cache buffers chains                             320,023     522,213     202,190&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;a) Pinned Buffers&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now isn't that an interesting coincidence? Maybe not. 
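&lt;br /&gt;&lt;br /&gt;Adding "buffer is pinned count" to "session logical reads" makes the coincidence explicit. The following minimal Python sketch just copies the two statistics from the runstats output above and compares their sum to the 500,000 buffer visits of the simple model.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;
```python
# Figures copied from the runstats output above
stats = {
    "Unique": {"session logical reads": 310_012, "buffer is pinned count": 189_998},
    "Non-Unique": {"session logical reads": 311_110, "buffer is pinned count": 189_998},
}

# Simple model: 100,000 (driving FTS) + 300,000 (root/branch/leaf) + 100,000 (table)
MODEL_VISITS = 100_000 + 300_000 + 100_000

# Buffer visits = logical I/Os plus re-visits of buffers that were kept pinned
visits = {
    variant: s["session logical reads"] + s["buffer is pinned count"]
    for variant, s in stats.items()
}

for variant, total in visits.items():
    print(f"{variant:10} buffer visits: {total:,} (model: {MODEL_VISITS:,})")
```
&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The unique variant ends up within 10 visits of the model (500,010), while the non-unique variant's surplus of roughly 1,100 (501,108) corresponds to the additional leaf block visits of the index range scan discussed in the runtime profile section.&lt;br /&gt;&lt;br /&gt;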
The "buffer is pinned count" statistic matches the missing 190,000 buffer visits quite nicely. So Oracle managed to keep a &lt;a href="http://www.jlcomp.demon.co.uk/buffer_handles.html"&gt;buffer pinned&lt;/a&gt; 190,000 times instead of re-locating it in the buffer cache by hashing the database block address to find the corresponding hash bucket, grabbing the corresponding "cache buffers chains" child latch and so on.&lt;br /&gt;&lt;br /&gt;Which buffers does Oracle keep pinned? Further modifications of the test case and an investigation of the logical I/O details using events 10200/10202 lead to the conclusion that Oracle keeps the buffers of the driving table T2 and the root block of the index pinned. Pinning the root block of the index is a particularly good idea, since it saves one logical I/O per loop iteration and the index root block is also quite unlikely to change frequently.&lt;br /&gt;&lt;br /&gt;Why does Oracle not simply keep all of the buffers pinned rather than going through the hash/latch/pin exercise again and again? Very likely for various scalability/concurrency reasons, for example:&lt;br /&gt;&lt;br /&gt;- A pinned buffer cannot be removed/replaced even if it is eligible according to the LRU logic, hence it potentially prevents other buffers from being cached&lt;br /&gt;&lt;br /&gt;- A pinned buffer cannot be accessed by other sessions that want to pin it in an incompatible mode (exclusive vs. shared), although multiple sessions can pin it concurrently in a compatible mode (shared). Those sessions either have to queue behind the current holder (that is what a "buffer busy wait" is about) or they may be able to create a clone copy of the block and continue their work on the clone copy. Although the "clone copy" trick is a nice one, it is undesirable for several reasons:&lt;br /&gt;&lt;br /&gt;  * The "clone" copies each require a buffer from the cache, effectively reducing the number of different blocks that can be held in the buffer cache. 
They are also the reason why an object might require much more cache than its original segment size in order to stay &lt;a href="http://jonathanlewis.wordpress.com/2011/03/14/buffer-states/"&gt;completely in the cache&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;  * They increase the "length" of the "cache buffers chains", leading to longer search times when locating a buffer in the cache while holding the "cache buffers chains" latch, hence increasing the potential for latch contention&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;So here is an important point&lt;/span&gt;: If you want to understand the work Oracle has performed in terms of buffer visits, you need to consider both the number of logical I/Os and the number of buffers visited without involving logical I/O; the latter is represented by the "buffer is pinned count" statistic.&lt;br /&gt;&lt;br /&gt;This fact is quite often overlooked: people focus only on the logical I/Os, which is not unreasonable, but miss the pinned buffers that are re-visited without doing logical I/O.&lt;br /&gt;&lt;br /&gt;Note that buffer pinning is not possible across fetch calls: once control is returned to the client, the buffers are no longer kept pinned. This explains why the "fetchsize" or "arraysize" used for bulk fetches can influence the number of logical I/Os required to process a result set.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;b) "consistent gets - examination"&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There is another significant difference between the two runs that explains most of the remaining runtime difference between the unique and non-unique index variant: The unique index variant performs approx. 310,000 logical I/Os, quite similar to the non-unique index variant; however, it grabs the corresponding "cache buffers chains" child latch only 320,000 times vs. 
520,000 times for the non-unique index.&lt;br /&gt;&lt;br /&gt;How is that possible? The explanation can be found in the statistics: When dealing with a unique index, the Nested Loop Join performs all logical I/Os of the inner row source as "short-cut" consistent gets, which are called "consistent gets - examination". Oracle uses this shortcut whenever it knows that the block visit will be of very short duration. Oracle knows that in this particular case because the unique index guarantees that there will be at most one matching entry in the index structure, and therefore at most one row to visit in the subsequent table lookup. So there is no need to perform a "range" scan on the index, and it is guaranteed that at most one row per iteration can be returned from the index unique scan for the table lookup by ROWID.&lt;br /&gt;&lt;br /&gt;Hence Oracle makes use of this knowledge and works on the buffer contents while holding the latch; this is what the "consistent gets - examination" statistic is about. A "normal" consistent get grabs the latch initially and releases it after having pinned the buffer. It then works on the buffer and afterwards "unpins" it, which requires another latch acquisition. Therefore a "non-shortcut" consistent get requires two latch acquisitions per logical I/O. 
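&lt;br /&gt;&lt;br /&gt;The latch accounting can be sketched in a few lines of Python, assuming exactly two "cache buffers chains" latch gets per normal consistent get and one per "consistent gets - examination". The split of the gets per variant follows the breakdown described in this section; latch gets from any other source are ignored, which is why the observed values in the statistics above (320,023 and 522,213) are slightly higher. The function name is purely illustrative.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;
```python
# Assumed latch gets per consistent get, following the description above
LATCHES_NORMAL = 2        # latch to pin the buffer, latch again to unpin it
LATCHES_EXAMINATION = 1   # buffer is examined while the latch is held


def cbc_latch_gets(normal_gets, examination_gets):
    """Expected "cache buffers chains" latch gets for a mix of consistent gets."""
    return normal_gets * LATCHES_NORMAL + examination_gets * LATCHES_EXAMINATION


# Unique index: 10,000 normal gets (driving FTS), 300,000 examinations
# (branch + leaf + table block per iteration; the root block stays pinned)
unique = cbc_latch_gets(10_000, 300_000)

# Non-unique index: 10,000 (FTS) + 200,000 (leaf + table) normal gets,
# 100,000 examinations (branch blocks)
non_unique = cbc_latch_gets(10_000 + 200_000, 100_000)

print(unique, non_unique)  # 320000 520000
```
&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;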
This explains the 320,000 latch acquisitions for the unique index variant: the 10,000 non-shortcut consistent gets for the driving full table scan (accompanied by 90,000 buffer visits that avoid logical I/O by keeping the buffer pinned) result in 20,000 latch acquisitions, and the remaining 300,000 "short-cut" consistent gets add another 300,000 latch acquisitions.&lt;br /&gt;&lt;br /&gt;The non-unique index variant performs 200,000 "non-shortcut" logical I/Os on the inner index and the table lookup, responsible for 400,000 latch acquisitions, plus another 10,000 for the driving full table scan (this part is no different from the unique index variant), good for another 20,000 latch acquisitions. But it also performs 100,000 "short-cut" consistent gets, resulting in the remaining 100,000 latch acquisitions. Modifying the test case by creating the index with a height of 2 (BLEVEL = 1) shows that Oracle uses the "short-cut" consistent gets on the branch blocks of the index, so this is another area where Oracle makes use of the "optimized" version of logical I/O even with the non-unique index variant.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Scalability&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What do these subtle differences mean in terms of scalability? Well, you can download a &lt;a href="http://www.sqltools-plusplus.org:7676/media/concurrent_unique_non_unique_execution.zip"&gt;bunch of scripts&lt;/a&gt; that allow you to run the same test case with as many concurrent sessions as desired. They show that there is a significant difference between the two cases: The unique index variant is not only faster in "single-user" mode but also scales much better than the non-unique index variant when performing the test concurrently (completely cached, i.e. purely logical I/O based). 
The provided test suite could be modified to use a more realistic scenario that runs the statement multiple times in each concurrent session with a random sleep in between, but that is left as an exercise for the interested reader.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The baseline results show that Oracle uses many built-in features to optimize logical I/O, either by avoiding logical I/O altogether or by using "short-cut" versions of logical I/O where applicable.&lt;br /&gt;&lt;br /&gt;These optimizations allow the "unique index" variant to perform significantly better than the "non-unique index" variant of this particular test case. Note that the difference is only this pronounced in the purely logical I/O case: introducing physical I/O makes the difference far less impressive, since the majority of the time is then spent on physical I/O rather than logical I/O.&lt;br /&gt;&lt;br /&gt;In the upcoming parts of this series I'll focus on further enhancements introduced in recent releases like table prefetching, Nested Loop Join batching aka. 
I/O batching and an apparently significantly enhanced buffer pinning mechanism introduced in Oracle 11g.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Footnote&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you study the script carefully, you'll notice that it allows for different orderings of the data: randomly ordered, randomly ordered but with the same (pseudo-random) order for T1 and T2, or ordered by ID.&lt;br /&gt;&lt;br /&gt;If you run the test with these different orderings of the data, you'll notice no difference in the results with 10g (with table prefetching disabled), but it might give a clue where this series is heading in the upcoming posts.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://oracle-randolf.blogspot.com/2011/07/logical-io-evolution-part-2-9i-10g.html"&gt;Forward to Part 2&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-5036748389197734552?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/5036748389197734552/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=5036748389197734552' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/5036748389197734552'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/5036748389197734552'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/07/logical-io-evolution-part-1-baseline.html' title='Logical I/O - Evolution: Part 1 - Baseline'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-7091118291707250176</id><published>2011-06-29T16:25:00.004+02:00</published><updated>2011-06-29T18:05:41.472+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='bug'/><category scheme='http://www.blogger.com/atom/ns#' term='dynamic sampling'/><title type='text'>Dynamic Sampling - Public Synonyms and 11.2.0.2</title><content type='html'>This is just a short heads-up note regarding a bug that apparently has been introduced with 11.2.0.2: If you happen to have a public synonym for a table that is named differently than the original object, then dynamic sampling will not work in 11.2.0.2.&lt;br /&gt;&lt;br /&gt;The reason is that the generated query used for the dynamic sampling does not resolve the synonym name properly: it resolves the object owner but uses the synonym name instead of the actual table name. By the way, the same issue occurs when using a private synonym; however, the query is then still valid and works even with the synonym name.&lt;br /&gt;&lt;br /&gt;The bug can only be reproduced in 11.2.0.2; in all previous versions, including 11.2.0.1, the synonym resolution for the dynamic sampling query seems to work as expected, so it seems to be a problem introduced in that patch set.&lt;br /&gt;&lt;br /&gt;Although the bug is quite obvious and can be nasty, a quick search on MOS didn't reveal anything suitable. 
Nor could I see that a corresponding bug fix was already included in one of the available PSUs on top of 11.2.0.2.&lt;br /&gt;&lt;br /&gt;Here is a simple test case to reproduce the issue:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;--------------------------------------------------------------------------------&lt;br /&gt;--&lt;br /&gt;-- File name:   dynamic_sampling_public_synonym_testcase.sql&lt;br /&gt;--&lt;br /&gt;-- Purpose:     11.2.0.2 fails to run a dynamic sampling query&lt;br /&gt;--              if the original query uses a public synonym&lt;br /&gt;--              that is called differently than the original object&lt;br /&gt;--&lt;br /&gt;--              The problem can be seen in the 10053 trace file:&lt;br /&gt;--              The synonym is not properly resolved, hence the&lt;br /&gt;--              recursive query fails silently with an ORA-00942 error&lt;br /&gt;--&lt;br /&gt;-- Author:      Randolf Geist http://oracle-randolf.blogspot.com&lt;br /&gt;--&lt;br /&gt;-- Last tested: June 2011&lt;br /&gt;--&lt;br /&gt;-- Versions:    10.2.0.4&lt;br /&gt;--              10.2.0.5&lt;br /&gt;--              11.1.0.7&lt;br /&gt;--              11.2.0.1&lt;br /&gt;--              11.2.0.2&lt;br /&gt;--------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;set echo on timing on linesize 200 trimspool on tab off pagesize 99&lt;br /&gt;&lt;br /&gt;drop table t;&lt;br /&gt;&lt;br /&gt;purge table t;&lt;br /&gt;&lt;br /&gt;drop public synonym t_synonym;&lt;br /&gt;&lt;br /&gt;create table t&lt;br /&gt;pctfree 99&lt;br /&gt;pctused 1&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        rownum as id&lt;br /&gt;      , rownum as id2&lt;br /&gt;      , rpad('x', 500) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 10000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't')&lt;br /&gt;&lt;br /&gt;create public 
synonym t_synonym for t;&lt;br /&gt;&lt;br /&gt;explain plan for&lt;br /&gt;select /*+ dynamic_sampling(4) */ * from t where id = id2;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display);&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'dynamic_sampling_public_synonym';&lt;br /&gt;&lt;br /&gt;alter session set events '10053 trace name context forever, level 1';&lt;br /&gt;&lt;br /&gt;explain plan for&lt;br /&gt;select /*+ dynamic_sampling(4) */ * from t_synonym where id = id2;&lt;br /&gt;&lt;br /&gt;alter session set events '10053 trace name context off';&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display);&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The last EXPLAIN PLAN does not use dynamic sampling in 11.2.0.2 and hence comes up with an incorrect cardinality estimate. In previous versions this works as expected. The 10053 trace file shows the incorrect recursive query.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-7091118291707250176?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/7091118291707250176/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=7091118291707250176' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/7091118291707250176'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/7091118291707250176'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/06/dynamic-sampling-public-synonyms-and.html' title='Dynamic Sampling - Public Synonyms and 11.2.0.2'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-1841605359948618507</id><published>2011-06-08T21:20:00.005+02:00</published><updated>2011-06-18T21:27:08.794+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='10.2.0.5'/><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='10.2.0.4'/><category scheme='http://www.blogger.com/atom/ns#' term='CBO'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.1'/><category scheme='http://www.blogger.com/atom/ns#' term='11.1.0.7'/><category scheme='http://www.blogger.com/atom/ns#' term='SCN'/><category scheme='http://www.blogger.com/atom/ns#' term='Flashback Query'/><category scheme='http://www.blogger.com/atom/ns#' term='10gR2'/><category scheme='http://www.blogger.com/atom/ns#' term='AS OF'/><title type='text'>Flashback Query "AS OF" - Tablescan costs</title><content type='html'>This is just a short note prompted by a recent &lt;a href="http://forums.oracle.com/forums/thread.jspa?threadID=2232164"&gt;thread&lt;/a&gt; on the OTN forums. 
In recent versions Oracle changes the costs of a full table scan (FTS, or index fast full scan / IFFS) quite dramatically if the "flashback query" clause is used.&lt;br /&gt;&lt;br /&gt;It looks like it simply uses the number of blocks of the segment as I/O cost for the FTS operation, quite similar to setting the "db_file_multiblock_read_count" ("dbfmbrc"), or from 10g on more precisely the "_db_file_optimizer_read_count", to 1 (but be aware of the MBRC setting of WORKLOAD System Statistics, see comments below) for the cost estimate of the segment in question.&lt;br /&gt;&lt;br /&gt;This can lead to some silly plans, depending on the other available access paths, as can be seen from the thread mentioned.&lt;br /&gt;&lt;br /&gt;Actually it seems to be quite "hard-coded" in the sense that even with System Statistics aka. CPU Costing switched off ("traditional I/O based costing") the cost corresponds to the number of blocks, which is different from the result when setting "dbfmbrc" to 1 and using traditional I/O based costing.&lt;br /&gt;&lt;br /&gt;This can be seen from the simple test case provided below.&lt;br /&gt;&lt;br /&gt;Prior versions seem to treat the case differently: the current behaviour seems to have been introduced in 10.2.0.1; setting the optimizer features to 10.1.0.5, for example, leaves the cost unchanged when using the "Flashback Query" clause.&lt;br /&gt;&lt;br /&gt;By the way: At runtime the multi-block I/O of the FTS operation seems to use the normal settings, so it attempts to read multiple blocks at a time and not only a single one. Of course, the consistent gets of a flashback query can potentially cause a lot of additional work, since undo has to be applied to reconstruct the past versions of the blocks, so an increased cost estimate is not unreasonable in principle. 
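&lt;br /&gt;&lt;br /&gt;As a rough cross-check, the following Python sketch reproduces the I/O part of the two costs reported in the 11.2.0.2 test output below (2715 and 10006), using the commonly described full table scan cost formula for noworkload System Statistics. It assumes the noworkload defaults (ioseektim = 10 ms, iotfrspeed = 4096 bytes/ms), an 8K block size, an effective "_db_file_optimizer_read_count" of 8 for the normal case and the "+1" of "_tablescan_cost_plus_one". None of these settings is stated in this post, and the CPU component of the cost is ignored.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;
```python
import math

# Assumed noworkload System Statistics defaults and block size (not from the post)
IOSEEKTIM_MS = 10     # assumed default ioseektim in ms
IOTFRSPEED = 4096     # assumed default iotfrspeed in bytes per ms
BLOCK_SIZE = 8192     # assumed 8K block size


def fts_io_cost(blocks, mbrc):
    """Sketch of the I/O part of a full table scan cost under noworkload stats."""
    sreadtim = IOSEEKTIM_MS + BLOCK_SIZE / IOTFRSPEED
    mreadtim = IOSEEKTIM_MS + mbrc * BLOCK_SIZE / IOTFRSPEED
    # multi-block reads expressed in units of single-block reads,
    # plus one for "_tablescan_cost_plus_one"
    return math.ceil(blocks / mbrc * mreadtim / sreadtim) + 1


normal = fts_io_cost(10_000, 8)      # assumed "_db_file_optimizer_read_count" = 8
flashback = fts_io_cost(10_000, 1)   # behaves as if the read count were 1
print(normal, flashback)             # 2710 10001
```
&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Under these assumptions the sketch gives 2710 and 10001; the remaining gap of 5 in both cases would be consistent with an identical CPU cost component, since both plans scan the same blocks and rows.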
&lt;br /&gt;&lt;br /&gt;It also looks like using different points in time / past SCNs does not change the cost estimate, so there does not seem to be any dynamic "proration" depending on the point in time specified.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;set echo on linesize 200 feedback off trimspool on tab off&lt;br /&gt;&lt;br /&gt;drop table t;&lt;br /&gt;&lt;br /&gt;purge table t;&lt;br /&gt;&lt;br /&gt;-- Create a table with 10,000 blocks&lt;br /&gt;-- Use a MSSM tablespace to get exactly 10,000&lt;br /&gt;create table t&lt;br /&gt;pctfree 99&lt;br /&gt;pctused 1&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        rownum as id&lt;br /&gt;      , rpad('x', 1000) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 10000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't', estimate_percent =&amp;gt; null)&lt;br /&gt;&lt;br /&gt;select&lt;br /&gt;        blocks&lt;br /&gt;from&lt;br /&gt;        user_tables&lt;br /&gt;where&lt;br /&gt;        table_name = 'T'&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;set pagesize 0&lt;br /&gt;&lt;br /&gt;-- Default costs&lt;br /&gt;explain plan for&lt;br /&gt;select * from t&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;&lt;br /&gt;-- Flashback Query&lt;br /&gt;explain plan for&lt;br /&gt;select * from t as of timestamp systimestamp&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;&lt;br /&gt;-- Flashback Query&lt;br /&gt;-- with disabled System Statistics / CPU Costing&lt;br /&gt;-- gives you exactly "blocks" + 1 (probably due to "_tablescan_cost_plus_one")&lt;br /&gt;explain plan for&lt;br /&gt;select /*+ no_cpu_costing */ * from t as of timestamp systimestamp&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;&lt;br /&gt;-- 
Flashback Query&lt;br /&gt;-- with 10.1.0.5 Optimizer features&lt;br /&gt;explain plan for&lt;br /&gt;select /*+ optimizer_features_enable('10.1.0.5') */ * from t as of timestamp systimestamp&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;&lt;br /&gt;-- The cost calculation with Flashback Query&lt;br /&gt;-- seems to correspond to a dbfmbrc set to 1 for the segment&lt;br /&gt;-- Note: This does not give the expected results if a MBRC has been defined&lt;br /&gt;-- in the WORKLOAD System Statistics because the MBRC overrides the&lt;br /&gt;-- "_db_file_optimizer_read_count" parameter if CPU Costing is enabled&lt;br /&gt;explain plan for&lt;br /&gt;select /*+ opt_param('_db_file_optimizer_read_count', 1) */ * from t&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;&lt;br /&gt;-- But not exactly:&lt;br /&gt;-- Traditional I/O based costing comes to a different result&lt;br /&gt;explain plan for&lt;br /&gt;select /*+ no_cpu_costing opt_param('_db_file_optimizer_read_count', 1) */ * from t&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This is what I get from 11.2.0.2:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;Connected to:&lt;br /&gt;Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production&lt;br /&gt;With the Partitioning, OLAP, Data Mining and Real Application Testing options&lt;br /&gt;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; drop table t;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; purge table t;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- Create a table with 10,000 blocks&lt;br /&gt;SQL&amp;gt; -- Use a MSSM tablespace to get exactly 10,000&lt;br /&gt;SQL&amp;gt; create table t&lt;br /&gt;  2  pctfree 99&lt;br /&gt;  3  pctused 1&lt;br /&gt;  4  as&lt;br /&gt;  5  
select&lt;br /&gt;  6          rownum as id&lt;br /&gt;  7        , rpad('x', 1000) as filler&lt;br /&gt;  8  from&lt;br /&gt;  9          dual&lt;br /&gt; 10  connect by&lt;br /&gt; 11          level &amp;lt;= 10000&lt;br /&gt; 12  ;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; exec dbms_stats.gather_table_stats(null, 't', estimate_percent =&amp;gt; null)&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select&lt;br /&gt;  2          blocks&lt;br /&gt;  3  from&lt;br /&gt;  4          user_tables&lt;br /&gt;  5  where&lt;br /&gt;  6          table_name = 'T'&lt;br /&gt;  7  ;&lt;br /&gt;&lt;br /&gt;    BLOCKS&lt;br /&gt;----------&lt;br /&gt;     10000&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; set pagesize 0&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- Default costs&lt;br /&gt;SQL&amp;gt; explain plan for&lt;br /&gt;  2  select * from t&lt;br /&gt;  3  ;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;Plan hash value: 1601196873&lt;br /&gt;&lt;br /&gt;-----------------------------------------------&lt;br /&gt;| Id  | Operation         | Name | Cost (%CPU)|&lt;br /&gt;-----------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT  |      |  2715   (1)|&lt;br /&gt;|   1 |  TABLE ACCESS FULL| T    |  2715   (1)|&lt;br /&gt;-----------------------------------------------&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- Flashback Query&lt;br /&gt;SQL&amp;gt; explain plan for&lt;br /&gt;  2  select * from t as of timestamp systimestamp&lt;br /&gt;  3  ;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;Plan hash value: 1601196873&lt;br /&gt;&lt;br /&gt;-----------------------------------------------&lt;br /&gt;| Id  | Operation         | Name | Cost (%CPU)|&lt;br /&gt;-----------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT  |      | 10006   (1)|&lt;br /&gt;|   1 |  
TABLE ACCESS FULL| T    | 10006   (1)|&lt;br /&gt;-----------------------------------------------&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- Flashback Query&lt;br /&gt;SQL&amp;gt; -- with disabled System Statistics / CPU Costing&lt;br /&gt;SQL&amp;gt; -- gives you exactly "blocks" + 1 (probably due to "_tablescan_cost_plus_one")&lt;br /&gt;SQL&amp;gt; explain plan for&lt;br /&gt;  2  select /*+ no_cpu_costing */ * from t as of timestamp systimestamp&lt;br /&gt;  3  ;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;Plan hash value: 1601196873&lt;br /&gt;&lt;br /&gt;------------------------------------------&lt;br /&gt;| Id  | Operation         | Name | Cost  |&lt;br /&gt;------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT  |      | 10001 |&lt;br /&gt;|   1 |  TABLE ACCESS FULL| T    | 10001 |&lt;br /&gt;------------------------------------------&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- Flashback Query&lt;br /&gt;SQL&amp;gt; -- with 10.1.0.5 Optimizer features&lt;br /&gt;SQL&amp;gt; explain plan for&lt;br /&gt;  2  select /*+ optimizer_features_enable('10.1.0.5') */ * from t as of timestamp systimestamp&lt;br /&gt;  3  ;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;Plan hash value: 1601196873&lt;br /&gt;&lt;br /&gt;-----------------------------------------------&lt;br /&gt;| Id  | Operation         | Name | Cost (%CPU)|&lt;br /&gt;-----------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT  |      |  2715   (1)|&lt;br /&gt;|   1 |  TABLE ACCESS FULL| T    |  2715   (1)|&lt;br /&gt;-----------------------------------------------&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- The cost calculation with Flashback Query&lt;br /&gt;SQL&amp;gt; -- seems to correspond to a dbfmbrc set to 1 for the segment&lt;br /&gt;SQL&amp;gt; -- Note: This does not give 
the expected results if a MBRC has been defined&lt;br /&gt;SQL&amp;gt; -- in the WORKLOAD System Statistics because the MBRC overrides the&lt;br /&gt;SQL&amp;gt; -- "_db_file_optimizer_read_count" parameter if CPU Costing is enabled&lt;br /&gt;SQL&amp;gt; explain plan for&lt;br /&gt;  2  select /*+ opt_param('_db_file_optimizer_read_count', 1) */ * from t&lt;br /&gt;  3  ;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;Plan hash value: 1601196873&lt;br /&gt;&lt;br /&gt;-----------------------------------------------&lt;br /&gt;| Id  | Operation         | Name | Cost (%CPU)|&lt;br /&gt;-----------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT  |      | 10006   (1)|&lt;br /&gt;|   1 |  TABLE ACCESS FULL| T    | 10006   (1)|&lt;br /&gt;-----------------------------------------------&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; -- But not exactly:&lt;br /&gt;SQL&amp;gt; -- Traditional I/O based costing comes to a different result&lt;br /&gt;SQL&amp;gt; explain plan for&lt;br /&gt;  2  select /*+ no_cpu_costing opt_param('_db_file_optimizer_read_count', 1) */ * from t&lt;br /&gt;  3  ;&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;SQL&amp;gt; select * from table(dbms_xplan.display(null, null, 'basic +cost'));&lt;br /&gt;Plan hash value: 1601196873&lt;br /&gt;&lt;br /&gt;------------------------------------------&lt;br /&gt;| Id  | Operation         | Name | Cost  |&lt;br /&gt;------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT  |      |  5966 |&lt;br /&gt;|   1 |  TABLE ACCESS FULL| T    |  5966 |&lt;br /&gt;------------------------------------------&lt;br /&gt;SQL&amp;gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-1841605359948618507?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/1841605359948618507/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=1841605359948618507' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/1841605359948618507'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/1841605359948618507'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/06/flashback-query-as-of-tablescan-costs.html' title='Flashback Query &quot;AS OF&quot; - Tablescan costs'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-3344404122951443171</id><published>2011-06-06T22:38:00.011+02:00</published><updated>2011-06-07T12:53:05.487+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Optimizer'/><category scheme='http://www.blogger.com/atom/ns#' term='10.2.0.5'/><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='10.2.0.4'/><category scheme='http://www.blogger.com/atom/ns#' term='CBO'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.1'/><category scheme='http://www.blogger.com/atom/ns#' term='11.1.0.7'/><category scheme='http://www.blogger.com/atom/ns#' term='transitive closure'/><category scheme='http://www.blogger.com/atom/ns#' term='10gR2'/><title type='text'>Transitive Closure - Outer Joins</title><content type='html'>Since at least Oracle 9i, the Cost Based Optimizer (CBO) has supported
the automatic generation of additional predicates based on transitive closure.&lt;br /&gt;&lt;br /&gt;In principle this means:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;If a = b and b = c, then the CBO can infer a = c&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As so often with such optimizations, the purpose of these automatically generated predicates is to allow the optimizer to find potentially more efficient access paths, such as index access, or to filter earlier and thereby reduce the amount of data to process.&lt;br /&gt;&lt;br /&gt;So far I was only aware of such additional predicates when literals were involved: if "a = 10 and a = b", then Oracle automatically adds "b = 10" (and in Oracle 9i actually removes the "a = b" predicate, so you end up with "a = 10 and b = 10" but no longer "a = b"; see &lt;a href="http://jonathanlewis.wordpress.com/"&gt;Jonathan Lewis&lt;/a&gt; on &lt;a href="http://jonathanlewis.wordpress.com/2007/01/01/transitive-closure/"&gt;transitive closure&lt;/a&gt; and &lt;a href="http://jonathanlewis.wordpress.com/2006/12/13/cartesian-merge-join/"&gt;cartesian merge join&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;According to MOS document "Transitivity and Transitive Closure [ID 68979.1]", Oracle 11g is supposed to support three kinds of transitive closure. 
In addition to the join / literal predicate case already mentioned, these are the transitivity of join predicates without any literals involved ("a = b and b = c" then "a = c") and the literal predicate case applied to outer joins ("a = 10 and a = b(+)" then "b(+) = 10", which means that the filter is logically applied to b before the join, not after).&lt;br /&gt;&lt;br /&gt;However, I haven't yet seen a working example of the pure join predicate transitive closure mentioned in the MOS article.&lt;br /&gt;&lt;br /&gt;On the OTN forums there was recently a &lt;a href="http://forums.oracle.com/forums/thread.jspa?threadID=2231685"&gt;question&lt;/a&gt; about partition pruning not taking place, but the case there really turns out to be about transitive closure and outer joins (although the partitioning and parallel execution involved in the thread probably help to confuse the issue).&lt;br /&gt;&lt;br /&gt;It looks like Oracle does not apply transitive closure across outer joins. Given:&lt;br /&gt;&lt;br /&gt;a = 10 and a = b(+) and b = c(+)&lt;br /&gt;&lt;br /&gt;Oracle automatically adds the "b(+) = 10" predicate as outlined above, but it does not add "c(+) = 10" across the second outer join (it would add a "c = 10" predicate if these were inner joins).&lt;br /&gt;&lt;br /&gt;There might be a valid reason to prevent this from happening in general (the most plausible one to me is that "a = b(+) and b = c(+)" is not equivalent to "a = b(+) and a = c(+)"), but at least in this particular case I don't see why the "c(+) = 10" predicate should not be added automatically.&lt;br /&gt;&lt;br /&gt;If anyone sees a valid explanation for this behaviour (i.e. a general rule that would be violated by doing so), I'm open to suggestions.&lt;br /&gt;&lt;br /&gt;If the expression is changed to&lt;br /&gt;&lt;br /&gt;a = 10 and a = b(+) and a = c(+)&lt;br /&gt;&lt;br /&gt;then Oracle happily adds both the b(+) = 10 and c(+) = 10 predicates. 
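&lt;br /&gt;&lt;br /&gt;To illustrate that the two forms are not interchangeable, here is a minimal sketch with three hypothetical one-column row sources TA, TB and TC (defined via subquery factoring, not taken from the OTN thread), where TB has no row matching a = 10:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Hypothetical data: TB has no row for 10, TC has one&lt;br /&gt;with&lt;br /&gt;        ta as (select 10 as a from dual)&lt;br /&gt;      , tb as (select 30 as b from dual)&lt;br /&gt;      , tc as (select 10 as c from dual union all select 30 from dual)&lt;br /&gt;-- Chained form: TC is outer joined to TB&lt;br /&gt;select 'b = c(+)' as variant, ta.a, tb.b, tc.c&lt;br /&gt;from ta, tb, tc&lt;br /&gt;where ta.a = 10&lt;br /&gt;and ta.a = tb.b(+)&lt;br /&gt;and tb.b = tc.c(+)&lt;br /&gt;union all&lt;br /&gt;-- Changed form: TC is outer joined to TA&lt;br /&gt;select 'a = c(+)', ta.a, tb.b, tc.c&lt;br /&gt;from ta, tb, tc&lt;br /&gt;where ta.a = 10&lt;br /&gt;and ta.a = tb.b(+)&lt;br /&gt;and ta.a = tc.c(+)&lt;br /&gt;;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;In the chained form TC.C stays NULL, because TC can only be matched against the NULL-extended TB row, whereas the changed form returns TC.C = 10. Pre-filtering TC with c = 10 would not change the chained result, though, since any TC row that does match must satisfy c = b = a = 10.&lt;br /&gt;&lt;br /&gt;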
Of course you will appreciate that the changed expression is not semantically equivalent to the previous one, so the results might differ.&lt;br /&gt;&lt;br /&gt;The case in the OTN thread is actually a bit more interesting, since it includes an outer join of one table to two other tables, which Oracle does not support directly with its native outer join syntax, but only with the help of LATERAL views; there is a good explanation &lt;a href="http://blogs.oracle.com/optimizer/entry/outerjoins_in_oracle"&gt;here&lt;/a&gt; by the Oracle Optimizer Group.&lt;br /&gt;&lt;br /&gt;The interesting part, stripped to a bare minimum, looks like this:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;set echo on&lt;br /&gt;&lt;br /&gt;drop table t;&lt;br /&gt;&lt;br /&gt;purge table t;&lt;br /&gt;&lt;br /&gt;create table t&lt;br /&gt;/*&lt;br /&gt;partition by list (pkey)&lt;br /&gt;(&lt;br /&gt;  partition p_1 values (1)&lt;br /&gt;, partition p_2 values (2)&lt;br /&gt;, partition p_3 values (3)&lt;br /&gt;)&lt;br /&gt;*/&lt;br /&gt;as&lt;br /&gt;select&lt;br /&gt;        rownum as id&lt;br /&gt;      , mod(rownum, 3) + 1 as pkey&lt;br /&gt;      , rownum as id2&lt;br /&gt;      , rpad('x', 100) as filler&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;connect by&lt;br /&gt;        level &amp;lt;= 30000&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;exec dbms_stats.gather_table_stats(null, 't')&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'trans_closure_outer_join';&lt;br /&gt;&lt;br /&gt;alter session set events '10053 trace name context forever, level 1';&lt;br /&gt;&lt;br /&gt;explain plan for&lt;br /&gt;select&lt;br /&gt;        *&lt;br /&gt;from&lt;br /&gt;        t t1&lt;br /&gt;left outer join&lt;br /&gt;        (select * from t where id &amp;gt;= 4000) t2&lt;br /&gt;on&lt;br /&gt;        t1.id = t2.id&lt;br /&gt;and     t1.pkey = t2.pkey&lt;br /&gt;left outer join&lt;br /&gt;        (select * from t where id &amp;gt;= 2000) t3&lt;br 
/&gt;on&lt;br /&gt;        t2.id2 = t3.id2&lt;br /&gt;and     t1.pkey = t3.pkey&lt;br /&gt;where&lt;br /&gt;        t1.pkey = 2&lt;br /&gt;order by&lt;br /&gt;        t1.id&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;alter session set events '10053 trace name context off';&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So we outer join T3 to both T1 and T2. Although the PKEY predicate actually comes from T1 and therefore on its own would definitely qualify for transitive closure, even with the documented restriction in place (which would apply if t2.pkey = t3.pkey were used instead), the additional predicate is not generated, obviously due to the additional outer join between T2 and T3 (which is also why we end up with a lateral view, as can be seen in the execution plan and the generated CBO trace file).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-3344404122951443171?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/3344404122951443171/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=3344404122951443171' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/3344404122951443171'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/3344404122951443171'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/06/transitive-closure-outer-joins.html' title='Transitive Closure - Outer Joins'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-1296026667663133506</id><published>2011-05-29T17:00:00.002+02:00</published><updated>2011-05-30T11:16:35.354+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='truncate'/><category scheme='http://www.blogger.com/atom/ns#' term='Unique indexes'/><category scheme='http://www.blogger.com/atom/ns#' term='Data Loading'/><category scheme='http://www.blogger.com/atom/ns#' term='unusable indexes'/><title type='text'>Things worth to mention and remember (IV) - Data Loading</title><content type='html'>In this part &lt;a href="http://oracle-randolf.blogspot.com/2011/04/things-worth-to-mention-and-remember.html"&gt;of the series&lt;/a&gt; I'll cover some basics about data loading:&lt;br /&gt;&lt;br /&gt;1. If you want to load a large amount of data quickly into a table or partition that gets truncated before the load, and indexes also need to be maintained, then it is probably faster to set the indexes to unusable before the load and rebuild them afterwards instead of letting the insert maintain the indexes. Note that even with a direct-path insert the maintenance of usable indexes will generate undo and redo, whereas a separate index rebuild doesn't generate undo and can also be run as a nologging operation if desired. 
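&lt;br /&gt;&lt;br /&gt;A minimal SQL*Plus sketch of this approach could look like the following. The table T, its non-unique index T_IDX and the source table T_SRC are hypothetical names, and the order of TRUNCATE and ALTER INDEX matters for the reason explained below:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Hypothetical objects: table T, non-unique index T_IDX, source table T_SRC&lt;br /&gt;truncate table t;&lt;br /&gt;&lt;br /&gt;-- Set the index unusable only after the TRUNCATE&lt;br /&gt;alter index t_idx unusable;&lt;br /&gt;&lt;br /&gt;-- Relies on SKIP_UNUSABLE_INDEXES = TRUE (the default from 10g on)&lt;br /&gt;insert /*+ append */ into t&lt;br /&gt;select * from t_src;&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;br /&gt;-- The rebuild generates no undo and can be run as NOLOGGING&lt;br /&gt;alter index t_idx rebuild nologging;&lt;br /&gt;&lt;br /&gt;-- Optionally reset the NOLOGGING attribute set by the rebuild&lt;br /&gt;alter index t_idx logging;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This only works for non-unique indexes, as explained in point 2 below.&lt;br /&gt;&lt;br /&gt;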
However, as always, test for your particular situation/configuration, since index maintenance as part of a direct-path insert is quite efficient and therefore might not be much slower than separate index rebuild steps.&lt;br /&gt;&lt;br /&gt;However, there is a simple yet important point to consider when attempting to load with unusable indexes:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;A truncate makes indexes automatically usable&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So the order in which the DDL statements are executed before the load matters. If you use this order:&lt;br /&gt;&lt;br /&gt;- ALTER INDEX ... UNUSABLE&lt;br /&gt;- TRUNCATE TABLE&lt;br /&gt;&lt;br /&gt;then the data load &lt;span style="font-weight:bold;"&gt;will maintain the indexes&lt;/span&gt;, since the TRUNCATE after the ALTER INDEX has set the indexes to USABLE again!&lt;br /&gt;&lt;br /&gt;The order that is probably intended is:&lt;br /&gt;&lt;br /&gt;- TRUNCATE TABLE&lt;br /&gt;- ALTER INDEX ... UNUSABLE&lt;br /&gt;&lt;br /&gt;The same applies to a data load into a partition of a table; only the syntax of the two commands is slightly different.&lt;br /&gt;&lt;br /&gt;2. Note that you cannot load into a segment that has a UNIQUE index in UNUSABLE state: you'll get an error "ORA-26026: unique index ... initially in unusable state" (when using a direct-path insert) or "ORA-01502: index ... or partition of such index is in unusable state" (when using a conventional insert), even with SKIP_UNUSABLE_INDEXES set to TRUE (the default from 10g on).&lt;br /&gt;&lt;br /&gt;If you want a UNIQUE index that is not maintained during the data load, you need to drop it and re-create it after the load.&lt;br /&gt;&lt;br /&gt;There is, however, a way to circumvent this: you can support a UNIQUE or PRIMARY KEY constraint by means of a non-unique index. 
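&lt;br /&gt;&lt;br /&gt;As a sketch, assuming a hypothetical table T with a key column ID, such a constraint can be created on top of a separately created non-unique index with the USING INDEX clause:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;-- Hypothetical names: table T, column ID, index T_PK_IDX, constraint T_PK&lt;br /&gt;create index t_pk_idx on t (id);&lt;br /&gt;&lt;br /&gt;-- The primary key is enforced by the existing non-unique index&lt;br /&gt;alter table t add constraint t_pk primary key (id) using index t_pk_idx;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;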
This way you can either make the constraint deferrable, or disable the constraint before the load, keep the index in unusable state during the load, and afterwards rebuild the index and re-enable the constraint.&lt;br /&gt;&lt;br /&gt;But you need to be aware of the following implications and side effects of doing so. Depending on how you use the index and what your data manipulation patterns look like, they might not make any difference in your particular situation, but they can also have a very significant effect:&lt;br /&gt;&lt;br /&gt;A non-unique index behaves differently from a unique index in several ways. &lt;a href="http://richardfoote.wordpress.com/"&gt;Richard Foote&lt;/a&gt; has already covered these differences in great detail, and since I probably couldn't say it any better, I'll only give short recaps here along with links to the corresponding posts.&lt;br /&gt;&lt;br /&gt;- It requires slightly more space for the same amount of data (one additional length byte, see &lt;a href="http://richardfoote.wordpress.com/2007/12/18/differences-between-unique-and-non-unique-indexes-part-i/"&gt;Richard Foote&lt;/a&gt;)&lt;br /&gt;- Depending on the data manipulation patterns it might require significantly more space during DML operations, because index entries cannot be re-used within a transaction even when the same data gets re-inserted, see &lt;a href="http://richardfoote.wordpress.com/2009/03/25/differences-between-unique-and-non-unique-indexes-part-iv-take-it-back/"&gt;Richard Foote&lt;/a&gt; and again &lt;a href="http://richardfoote.wordpress.com/2009/03/30/differences-between-unique-and-non-unique-indexes-part-45-fix-you/"&gt;here&lt;/a&gt; (the ROWID is part of the key to make the index entry unique, and the ROWID cannot be re-used within a transaction because the row entries in the row directory of a block cannot be re-used within a transaction, see &lt;a 
href="http://jonathanlewis.wordpress.com/2009/05/21/row-directory/"&gt;Jonathan Lewis&lt;/a&gt;)&lt;br /&gt;- For non-unique indexes only: if you insert multiple rows with the same index key, then depending on the ROWID there might also be differences in the efficiency of a potential index block split (50-50 vs. 90-10, actually 99-1)&lt;br /&gt;- At execution time an INDEX UNIQUE SCAN is handled differently from an INDEX RANGE SCAN in terms of latching and consistent get optimizations, so there might be measurable run-time differences (see &lt;a href="http://richardfoote.wordpress.com/2007/12/21/differences-between-unique-and-non-unique-indexes-part-ii/"&gt;Richard Foote&lt;/a&gt;). Note that this changes under certain circumstances from version 11g on, which I will cover in a separate blog series because it is a very interesting topic in its own right.&lt;br /&gt;&lt;br /&gt;More on the topic by &lt;a href="http://richardfoote.wordpress.com/2007/12/30/differences-between-unique-and-non-unique-indexes-part-iii/"&gt;Richard Foote&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;There are also a few oddities with this setup to be aware of:&lt;br /&gt;&lt;br /&gt;- In 10.2 a unique/PK constraint supported by a non-unique index disables direct-path inserts, see &lt;a href="http://oracle-randolf.blogspot.com/2008/07/deferrable-constraints-and-direct-path.html"&gt;here&lt;/a&gt; for more details&lt;br /&gt;&lt;br /&gt;- In 11.1 and 11.2 a unique/PK constraint supported by a non-unique index allows direct-path inserts, but unfortunately also allows you to accidentally insert duplicates, see &lt;a href="http://oracle-randolf.blogspot.com/2008/11/primary-key-unique-constraints-enforced.html"&gt;here&lt;/a&gt; for more details&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5124641802818980374-1296026667663133506?l=oracle-randolf.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/1296026667663133506/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=1296026667663133506' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/1296026667663133506'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/1296026667663133506'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/05/things-worth-to-mention-and-remember-iv.html' title='Things worth to mention and remember (IV) - Data Loading'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-617534663328233306</id><published>2011-05-22T22:38:00.004+02:00</published><updated>2011-05-23T09:37:55.024+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='troubleshooting'/><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.2'/><category scheme='http://www.blogger.com/atom/ns#' term='11.2.0.1'/><category scheme='http://www.blogger.com/atom/ns#' term='11.1.0.7'/><category scheme='http://www.blogger.com/atom/ns#' term='bug'/><category scheme='http://www.blogger.com/atom/ns#' term='ASSM'/><title type='text'>ASSM bug reprise - part 2</title><content type='html'>&lt;span style="font-weight:bold;"&gt;Introduction&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In the &lt;a href="http://oracle-randolf.blogspot.com/2011/05/assm-bug-reprise-part-1.html"&gt;first part&lt;/a&gt; of this post I've explained some of the details and underlying 
reasons for bug 6918210. The most important aspect of the bug is that it can only be hit if many row migrations happen during a single transaction. However, excessive row migration is usually a sign of poor design, so this point probably can't be stressed enough:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;span style="font-style:italic;"&gt;If you don't have excessive row migrations, the bug cannot become significant&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Of course, there might be cases where you think you actually have a sound design, but due to a lack of information about the internal workings it might not be obvious that certain activities can cause excessive row migrations.&lt;br /&gt;&lt;br /&gt;One popular feature that might cause such trouble is compression. The most important thing you need to know about compression is this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;span style="font-style:italic;"&gt;Compression and subsequent significant updates do not work very well together&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The main reason for this is that Oracle stores compressed data that gets updated in uncompressed form (although the row/block format is still the "compressed" one, the data itself is stored uncompressed). Note that this also holds true for the "advanced" OLTP compression option, as we will see shortly.&lt;br /&gt;&lt;br /&gt;Given this fact, it should become obvious that updates to compressed data can cause excessive row migrations, because:&lt;br /&gt;&lt;br /&gt;- The row data stored in uncompressed format will usually require a lot more space than the original compressed format, hence the row will no longer fit into the place in the block where it originally resided. 
It needs to be moved somewhere else within the same block (and the block might have to be "re-organized" to provide sufficient contiguous space, which adds CPU overhead to the operation), and if there isn't sufficient space available in the block, it will have to be migrated to a different block with sufficient space&lt;br /&gt;&lt;br /&gt;- By default the "basic" compression option of Oracle implicitly sets PCTFREE to 0; however, you can change this by explicitly defining a PCTFREE. The "OLTP" compression leaves PCTFREE at the default of 10&lt;br /&gt;&lt;br /&gt;These PCTFREE settings, in particular the default of 0 used by "basic" compression, do not leave a lot of free space in the block for further row growth, so without non-default PCTFREE settings anything that updates more than just a couple of rows per block of compressed data will lead to row migrations.&lt;br /&gt;&lt;br /&gt;Of course you'll appreciate that, at least with "basic" compression, which does not attempt to re-compress the blocks, any application updating more than a couple of compressed rows must be called bad design, since it uses the feature in a way it was not intended for.&lt;br /&gt;&lt;br /&gt;Things look different, from a general point of view at least, with "OLTP" compression, since it promises to re-compress the data during conventional DML as well and therefore should allow subsequent updates without suffering from too many or even excessive row migrations. Unfortunately, I couldn't confirm this in the tests that I've performed.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Basic Compression&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So, let's start with a simple variation of the original test case by introducing "basic" compression, and have a look at the results. 
&lt;br /&gt;&lt;br /&gt;By the way, all tests were performed using an 8K ASSM tablespace.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;set echo on timing on&lt;br /&gt;&lt;br /&gt;drop table t1;&lt;br /&gt;&lt;br /&gt;purge table t1;&lt;br /&gt;&lt;br /&gt;CREATE TABLE t1&lt;br /&gt;(pkey varchar2(1),&lt;br /&gt; v1 varchar2(255),&lt;br /&gt; v2 varchar2(255)&lt;br /&gt;)&lt;br /&gt; compress&lt;br /&gt; pctfree 0&lt;br /&gt; TABLESPACE &amp;&amp;tblspace;&lt;br /&gt;&lt;br /&gt;INSERT /*+ append */ INTO t1&lt;br /&gt;SELECT '1' as pkey,&lt;br /&gt;       to_char((mod(rownum, 1) + 1), 'TM') || 'BL' AS v1,&lt;br /&gt;       'BLUBB' /*null*/ AS v2&lt;br /&gt;    FROM dual CONNECT BY LEVEL &amp;lt;= 50000&lt;br /&gt;/&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;br /&gt;BEGIN dbms_stats.gather_table_stats(&lt;br /&gt;        ownname =&amp;gt; null,&lt;br /&gt;        tabname =&amp;gt; 'T1');&lt;br /&gt;END;&lt;br /&gt;/&lt;br /&gt;&lt;br /&gt;create index t1_idx on t1 (substr(v1, 1, 1)) TABLESPACE &amp;tblspace;&lt;br /&gt;&lt;br /&gt;SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';&lt;br /&gt;&lt;br /&gt;truncate table chained_rows;&lt;br /&gt;&lt;br /&gt;analyze table t1 list chained rows;&lt;br /&gt;&lt;br /&gt;select count(*) from chained_rows;&lt;br /&gt;&lt;br /&gt;column file_no new_value file_no&lt;br /&gt;column block_no new_value block_no&lt;br /&gt;&lt;br /&gt;select&lt;br /&gt;        dbms_rowid.rowid_relative_fno(rowid) as file_no&lt;br /&gt;      , dbms_rowid.rowid_block_number(rowid) as block_no&lt;br /&gt;from&lt;br /&gt;        t1&lt;br /&gt;where&lt;br /&gt;        rownum &amp;lt;= 1;&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'before_update';&lt;br /&gt;&lt;br /&gt;alter system checkpoint;&lt;br /&gt;&lt;br /&gt;alter system dump datafile &amp;file_no block &amp;block_no;&lt;br /&gt;&lt;br /&gt;pause Press Enter to continue...&lt;br /&gt;&lt;br /&gt;exec mystats_pkg.ms_start&lt;br 
/&gt;&lt;br /&gt;-- Counters not updated in 11g&lt;br /&gt;-- execute snap_kcbsw.start_snap&lt;br /&gt;&lt;br /&gt;/*&lt;br /&gt;alter session set tracefile_identifier = 'space_layer';&lt;br /&gt;&lt;br /&gt;alter session set events '10320 trace name context forever, level 3';&lt;br /&gt;alter session set events '10612 trace name context forever, level 1';&lt;br /&gt;*/&lt;br /&gt;&lt;br /&gt;UPDATE /*+ full(t1) */ t1 SET v2 = v1&lt;br /&gt;where substr(v1, 1, 1) = '1';&lt;br /&gt;&lt;br /&gt;/*&lt;br /&gt;alter session set events '10320 trace name context off';&lt;br /&gt;alter session set events '10612 trace name context off';&lt;br /&gt;*/&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;br /&gt;set serveroutput on size 1000000 format wrapped&lt;br /&gt;set linesize 120&lt;br /&gt;set trimspool on&lt;br /&gt;&lt;br /&gt;-- Counters not updated in 11g&lt;br /&gt;-- execute snap_kcbsw.end_snap&lt;br /&gt;&lt;br /&gt;exec mystats_pkg.ms_stop(1)&lt;br /&gt;&lt;br /&gt;BEGIN dbms_stats.gather_table_stats(&lt;br /&gt;        ownname =&amp;gt; null,&lt;br /&gt;        tabname =&amp;gt; 'T1');&lt;br /&gt;END;&lt;br /&gt;/&lt;br /&gt;&lt;br /&gt;SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';&lt;br /&gt;&lt;br /&gt;truncate table chained_rows;&lt;br /&gt;&lt;br /&gt;analyze table t1 list chained rows;&lt;br /&gt;&lt;br /&gt;select count(*) from chained_rows;&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'after_update';&lt;br /&gt;&lt;br /&gt;alter system checkpoint;&lt;br /&gt;&lt;br /&gt;alter system dump datafile &amp;file_no block &amp;block_no;&lt;br /&gt;&lt;br /&gt;accept rdba prompt 'Enter forwarding ROWID found in block dump: '&lt;br /&gt;&lt;br /&gt;column rdba new_value rdba&lt;br /&gt;&lt;br /&gt;-- Remove any potential leading and trailing unnecessary stuff&lt;br /&gt;select&lt;br /&gt;        substr('&amp;rdba',&lt;br /&gt;               case&lt;br /&gt;               when instr('&amp;rdba', '0x') = 0&lt;br /&gt;               
then 1&lt;br /&gt;               else instr('&amp;rdba', '0x') + 2&lt;br /&gt;               end,&lt;br /&gt;               case&lt;br /&gt;               when instr('&amp;rdba', '.') = 0&lt;br /&gt;               then 32767&lt;br /&gt;               else instr('&amp;rdba', '.') -&lt;br /&gt;                 case&lt;br /&gt;                 when instr('&amp;rdba', '0x') = 0&lt;br /&gt;                 then 1&lt;br /&gt;                 else instr('&amp;rdba', '0x') + 2&lt;br /&gt;                 end&lt;br /&gt;               end&lt;br /&gt;              ) as rdba&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select&lt;br /&gt;        dbms_utility.data_block_address_file(to_number('&amp;rdba', rpad('X', length('&amp;rdba'), 'X'))) as file_no&lt;br /&gt;      , dbms_utility.data_block_address_block(to_number('&amp;rdba', rpad('X', length('&amp;rdba'), 'X'))) as block_no&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'migrated_rows';&lt;br /&gt;&lt;br /&gt;alter system dump datafile &amp;file_no block &amp;block_no;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;If you look closely at the example, you'll notice that from an "innocent" application point of view the update actually shortens the row, but of course only if you ignore the effects on compressed data described above.&lt;br /&gt;&lt;br /&gt;This is what a sample block dump looks like right after the insert:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;data_block_dump,data header at 0x551007c&lt;br /&gt;===============&lt;br /&gt;tsiz: 0x1f80&lt;br /&gt;hsiz: 0x5d6&lt;br /&gt;pbl: 0x0551007c&lt;br /&gt;     76543210&lt;br /&gt;flag=-0------&lt;br /&gt;ntab=2&lt;br /&gt;nrow=728&lt;br /&gt;frre=-1&lt;br /&gt;fsbo=0x5d6&lt;br /&gt;fseo=0x113e&lt;br /&gt;avsp=0xc&lt;br /&gt;tosp=0xc&lt;br /&gt; r0_9ir2=0x0&lt;br /&gt; 
 mec_kdbh9ir2=0x1&lt;br /&gt;               76543210&lt;br /&gt; shcf_kdbh9ir2=----------&lt;br /&gt;           76543210&lt;br /&gt; flag_9ir2=--R----C&lt;br /&gt;  fcls_9ir2[4]={ 0 32768 32768 32768 }&lt;br /&gt;0x1e:pti[0] nrow=1 offs=0&lt;br /&gt;0x22:pti[1] nrow=727 offs=1&lt;br /&gt;0x26:pri[0] offs=0x1f71&lt;br /&gt;0x28:pri[1] offs=0x1f6c&lt;br /&gt;0x2a:pri[2] offs=0x1f67&lt;br /&gt;0x2c:pri[3] offs=0x1f62&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;block_row_dump:&lt;br /&gt;tab 0, row 0, @0x1f71&lt;br /&gt;tl: 15 fb: --H-FL-- lb: 0x0  cc: 3&lt;br /&gt;col  0: [ 1]  31&lt;br /&gt;col  1: [ 3]  31 42 4c&lt;br /&gt;col  2: [ 5]  42 4c 55 42 42&lt;br /&gt;bindmp: 02 d7 03 c9 31 cb 31 42 4c cd 42 4c 55 42 42&lt;br /&gt;tab 1, row 0, @0x1f6c&lt;br /&gt;tl: 5 fb: --H-FL-- lb: 0x0  cc: 3&lt;br /&gt;col  0: [ 1]  31&lt;br /&gt;col  1: [ 3]  31 42 4c&lt;br /&gt;col  2: [ 5]  42 4c 55 42 42&lt;br /&gt;bindmp: 2c 00 01 03 00&lt;br /&gt;tab 1, row 1, @0x1f67&lt;br /&gt;tl: 5 fb: --H-FL-- lb: 0x0  cc: 3&lt;br /&gt;col  0: [ 1]  31&lt;br /&gt;col  1: [ 3]  31 42 4c&lt;br /&gt;col  2: [ 5]  42 4c 55 42 42&lt;br /&gt;bindmp: 2c 00 01 03 00&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;You'll notice that the rows have been compressed very efficiently (see the "symbol table" in "tab 0") and are reported to occupy just 5 bytes each.&lt;br /&gt;&lt;br /&gt;It is quite interesting to note that the rows are actually stored with no gap in between. Without compression, Oracle usually allocates at least 9 bytes per row (the minimum row size required for a migrated row), even if the actual row is shorter than 9 bytes. I'm not sure why Oracle packs the compressed rows this tightly, since it means even more work in case of row migrations. Since Oracle doesn't store more rows in the block than dictated by the minimum row size of 9 bytes, even with compression enabled, this effect is a bit puzzling. 
You can tell that from the block header. Although there is still physical free space in the block (free space begin offset fsbo=0x5d6, free space end offset fseo=0x113e, i.e. 0x113e - 0x5d6 = 2920 bytes), the available space is reported as almost nothing (avsp=0xc, i.e. 12 bytes). &lt;br /&gt;&lt;br /&gt;In contrast, this is what a block holding very small uncompressed rows looks like:&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;data_block_dump,data header at 0x610a07c&lt;br /&gt;===============&lt;br /&gt;tsiz: 0x1f80&lt;br /&gt;hsiz: 0x5c6&lt;br /&gt;pbl: 0x0610a07c&lt;br /&gt;     76543210&lt;br /&gt;flag=--------&lt;br /&gt;ntab=1&lt;br /&gt;nrow=730&lt;br /&gt;frre=-1&lt;br /&gt;fsbo=0x5c6&lt;br /&gt;fseo=0x5d6&lt;br /&gt;avsp=0x10&lt;br /&gt;tosp=0x10&lt;br /&gt;0xe:pti[0] nrow=730 offs=0&lt;br /&gt;0x12:pri[0] offs=0x1f7b&lt;br /&gt;0x14:pri[1] offs=0x1f72&lt;br /&gt;0x16:pri[2] offs=0x1f69&lt;br /&gt;0x18:pri[3] offs=0x1f60&lt;br /&gt;0x1a:pri[4] offs=0x1f57&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;block_row_dump:&lt;br /&gt;tab 0, row 0, @0x1f7b&lt;br /&gt;tl: 5 fb: --H-FL-- lb: 0x0  cc: 1&lt;br /&gt;col  0: [ 1]  31&lt;br /&gt;tab 0, row 1, @0x1f72&lt;br /&gt;tl: 5 fb: --H-FL-- lb: 0x0  cc: 1&lt;br /&gt;col  0: [ 1]  31&lt;br /&gt;tab 0, row 2, @0x1f69&lt;br /&gt;tl: 5 fb: --H-FL-- lb: 0x0  cc: 1&lt;br /&gt;col  0: [ 1]  31&lt;br /&gt;tab 0, row 3, @0x1f60&lt;br /&gt;tl: 5 fb: --H-FL-- lb: 0x0  cc: 1&lt;br /&gt;col  0: [ 1]  31&lt;br /&gt;tab 0, row 4, @0x1f57&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Although the rows effectively require fewer than 9 bytes, the gap between consecutive rows is exactly 9 bytes. The available space also matches the free space offsets (fseo - fsbo = 0x5d6 - 0x5c6 = 0x10 = avsp).&lt;br /&gt;&lt;br /&gt;Back to our "basic" compression example: we know that the update decompresses the affected rows, which means that they will allocate 
more than these 5 bytes and hence will eventually cause row migrations. Since the decompressed row size is still very small and below the limits outlined in part 1, we'll run into the basic bug again: the blocks holding the migrated rows can take more rows than there are ITL entries available, leading to the "free space search" anomaly.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;OLTP Compression&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's change the script and use OLTP compression instead. Additionally, we'll give Oracle a little room to manoeuvre, so that it should be able to store an uncompressed row and re-compress it when required. I do this by creating and populating the table with PCTFREE 10 but switching to PCTFREE 0 right before the update. Since we do not expect many row migrations with OLTP compression, the bug shouldn't be a problem.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;CREATE TABLE t1&lt;br /&gt;(pkey varchar2(1),&lt;br /&gt; v1 varchar2(255),&lt;br /&gt; v2 varchar2(255)&lt;br /&gt;)&lt;br /&gt; compress for all operations&lt;br /&gt; pctfree 10&lt;br /&gt; TABLESPACE &amp;&amp;tblspace;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;alter table t1 pctfree 0;&lt;br /&gt;&lt;br /&gt;UPDATE /*+ full(t1) */ t1 SET v2 = v1&lt;br /&gt;where substr(v1, 1, 1) = '1';&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So, according to the general description of how OLTP compression works, the expectation would be that whenever Oracle hits the PCTFREE limit, it attempts to re-compress the block. A more accurate description is that compression is supposed to take place whenever Oracle re-organizes the block, coalescing its free space to maximize the contiguous chunks of available space. Oracle 10g added a statistic for this called "heap block compress", see e.g. 
Jonathan Lewis' &lt;a href="http://jonathanlewis.wordpress.com/2010/03/30/heap-block-compress/"&gt;blog post&lt;/a&gt; about this statistic.&lt;br /&gt;&lt;br /&gt;But if you run this script in the 11.1.0.7 base release, the results are not as expected. The table has the same size afterwards as with "basic" compression, and we apparently still hit the bug, since the update still takes millions of session logical reads. In fact, judging by the block dumps and statistics, no HSC compression has taken place at all, although "heap block compress" is reported many times. You can confirm this by looking for statistics beginning with "HSC" after the update: apart from "HSC Heap/Compressed Segment Block Changes", no other HSC-related statistics show up.&lt;br /&gt;&lt;br /&gt;If you follow one of the migrated rows, you'll see that even the rows migrated to a new block are not compressed (the new blocks are not even stored in compressed block format). So we hit the same basic "no ITLs left but still free space" problem again in the blocks holding the migrated rows.&lt;br /&gt;&lt;br /&gt;If you run the same script in 11.2.0.1 or 11.2.0.2, the results look slightly different. The original blocks are still not re-compressed, so we seem to end up with the same number of migrated rows. At least the blocks holding the migrated rows get compressed, and therefore the resulting segment is smaller. 
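&lt;br /&gt;&lt;br /&gt;Following a migrated row means decoding the forwarding address found in the block dump. At the end of the script, the SUBSTR/INSTR query and the DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE / DATA_BLOCK_ADDRESS_BLOCK calls do exactly that. Below is a minimal Python sketch of the same steps. It assumes the usual data block address layout (relative file number in the upper 10 bits, block number in the lower 22 bits) and input of the form "0x024000ea.0" (hex address, a dot, then the row slot). The function name is made up for this illustration.

```python
# Assumption: a data block address keeps the relative file number in the
# upper 10 bits and the block number in the lower 22 bits.
BLOCK_BITS = 22


def decode_forwarding_address(text):
    """Split an address such as '0x024000ea.0' into (file_no, block_no, slot).

    Like the SUBSTR/INSTR query in the script, everything up to and including
    '0x' and everything from the first '.' onwards is stripped before the hex
    value is decoded. The slot part is returned as the raw string.
    """
    text = text.strip()
    if "0x" in text:
        text = text[text.index("0x") + 2:]
    dba_hex, _, slot = text.partition(".")
    file_no, block_no = divmod(int(dba_hex, 16), 2 ** BLOCK_BITS)
    return file_no, block_no, slot


print(decode_forwarding_address("0x024000ea.0"))  # prints (9, 234, '0')
```

Decoded this way, 0x024000ea, the "bdba" of the block with migrated rows in the "ASSM Bug variation #1" dump further below, yields relative file 9, block 234.&lt;br /&gt;&lt;br /&gt;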
It is also interesting to note that the number of ITL entries in compressed blocks seems to be limited to 20. This is probably controlled by the parameter "_limit_itl", as Jonathan Lewis recently found out.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;OLTP Compression Block Re-Compression&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I have to admit that I had a really hard time convincing Oracle, in any of the versions tested, to re-compress a block that wasn't being modified by inserts (including inserts caused by row migrations). In fact, this is the only circumstance in which I could see HSC compression actually work. Most importantly, in my tests it doesn't seem to work when updating rows that grow in size. That leads exactly to our problem: excessive row migrations when updating many rows.&lt;br /&gt;&lt;br /&gt;I've tried different DML patterns: small and large transactions, and mixtures of inserts, deletes and updates. Among them was the most straightforward one, updating a column to the same value, which should make re-compression a "no-brainer". To me it looks like OLTP re-compression attempts are only triggered by inserts into a block; updates never seem to trigger a re-compression.&lt;br /&gt;&lt;br /&gt;Instead of attempting to re-compress the block, Oracle performs a row migration. This might be a deliberate design decision. Apart from the obvious CPU overhead, OLTP compression also seems to write a full pre-compression block image to undo/redo, since the changes made to the block by the compression apparently cannot be represented by discrete change vectors. Since that undo/redo overhead can be significant, minimizing compression attempts by simply migrating rows when necessary could well be intentional. 
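&lt;br /&gt;&lt;br /&gt;The observations above rest on comparing session statistics before and after the update and then checking the "heap block compress" and "HSC..." entries. The mystats_pkg calls in the scripts perform this comparison. The following Python sketch shows only the comparison step. It assumes both snapshots have already been fetched into name-to-value dictionaries, for example from V$MYSTAT joined to V$STATNAME on STATISTIC#. The function name and the snapshot values are hypothetical, and the statistic names are the ones quoted in the text above.

```python
def stat_deltas(before, after, prefixes=("HSC", "heap block compress")):
    """Return the non-zero changes of all statistics whose names start with
    one of the given prefixes, sorted by name.

    before/after: dicts mapping statistic name to value, e.g. two snapshots
    of V$MYSTAT joined to V$STATNAME (fetching them is not shown here).
    """
    deltas = {}
    for name, value in after.items():
        if not name.startswith(prefixes):
            continue
        delta = value - before.get(name, 0)
        if delta:
            deltas[name] = delta
    return dict(sorted(deltas.items()))


# Hypothetical snapshot values, for illustration only (not measured).
before = {
    "heap block compress": 10,
    "HSC Heap/Compressed Segment Block Changes": 0,
    "session logical reads": 100,
}
after = {
    "heap block compress": 250,
    "HSC Heap/Compressed Segment Block Changes": 5000,
    "session logical reads": 9000,
}

for name, delta in stat_deltas(before, after).items():
    print(f"{name:45} {delta:12,}")
```

Passing a different prefix such as "table fetch continued row" works the same way and gives a rough indicator of how often migrated or chained rows are being followed.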
&lt;br /&gt;&lt;br /&gt;Also, since it is called "OLTP" compression, one could assume that an OLTP workload consists of a general mixture of inserts, updates and deletes, so not compressing on updates shouldn't matter too much. &lt;a href="http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/logical.htm#CNCPT1054"&gt;This section&lt;/a&gt; of the "Concepts" guide also seems to confirm this, since it could be interpreted as saying that only inserts trigger the re-compression of a block.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;ASSM Bug variation #1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Of course, things could be worse, and that's exactly what happens with OLTP compression in the 11.1.0.7 base release. Increasing the row length or PCTFREE should prevent the basic bug from happening, even with excessive row migrations. So let's repeat the test case, this time using the trick from part 1 to prevent the bug: set PCTFREE to 50 right before the update. This marks the blocks holding the migrated rows as full long before their ITL slots can become depleted.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;alter table t1 pctfree 50;&lt;br /&gt;&lt;br /&gt;UPDATE /*+ full(t1) */ t1 SET v2 = v1&lt;br /&gt;where substr(v1, 1, 1) = '1';&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;But look at the results: &lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;50000 rows updated.&lt;br /&gt;&lt;br /&gt;Elapsed: 00:00:24.16&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;STAT..session logical reads                                          9,641,591&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This is even worse than before: a longer runtime, even more "session logical reads" and more CPU consumption.&lt;br /&gt;&lt;br /&gt;Looking at the blocks holding 
the migrated rows, we can clearly tell that they don't meet the prerequisites for the basic "ITL is full but free space left" problem. The block below holds 89 rows, has 0x5b = 91 ITL entries and still reports avsp=0xfe1, i.e. 4065 bytes, available. So something else must be going on that leads to similar symptoms.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt; Itl           Xid                  Uba         Flag  Lck        Scn/Fsc&lt;br /&gt;0x01   0x0002.01b.00000767  0x00c015bc.0298.21  --U-   89  fsc 0x0000.002641bf&lt;br /&gt;0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;0x5a   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000&lt;br /&gt;0x5b   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000&lt;br /&gt;bdba: 0x024000ea&lt;br /&gt;data_block_dump,data header at 0x2ea08bc&lt;br /&gt;===============&lt;br /&gt;tsiz: 0x1740&lt;br /&gt;hsiz: 0xc4&lt;br /&gt;pbl: 0x02ea08bc&lt;br /&gt;     76543210&lt;br /&gt;flag=--------&lt;br /&gt;ntab=1&lt;br /&gt;nrow=89&lt;br /&gt;frre=-1&lt;br /&gt;fsbo=0xc4&lt;br /&gt;fseo=0x10a5&lt;br /&gt;avsp=0xfe1&lt;br /&gt;tosp=0xfe1&lt;br /&gt;0xe:pti[0] nrow=89 offs=0&lt;br /&gt;0x12:pri[0] offs=0x172d&lt;br /&gt;0x14:pri[1] offs=0x171a&lt;br /&gt;0x16:pri[2] offs=0x1707&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;When enabling the debug events 10320 and 10612, you'll notice a pattern of "...didnot like..." messages similar to the basic bug. There are also entries that don't show up in the basic case: "Compressable Block...". To me it looks like Oracle considers these blocks as candidates for further inserts provided they get compressed, but somehow this compression never happens. So the blocks are left in that state and are considered and rejected for inserts over and over again. This leads to the same symptoms as the basic bug, but obviously for different reasons. 
&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;KDTGSP: seg:0x2400083 mark_full:0 pdba:0x02400085&lt;br /&gt;kdtgsp: calling ktspgsp_cbk w/ kdt_bseg_srch_cbk() on 73997&lt;br /&gt;Enter cbk---------------&lt;br /&gt;[ktspisc]: Rejected: BlockDBA:0x02400085&lt;br /&gt;ktspfsrch: Returns: BlockDBA:0x024000e8&lt;br /&gt;kdt_bseg_srch_cbk: examine dba=10.0x024000e8&lt;br /&gt;kdt_bseg_srch_cbk:Compressable Block dba=10.0x024000e8 avs=832 afs=0 tosp=832 full=0&lt;br /&gt;kdt_bseg_srch_cbk: failed dba=10.0x024000e8 avs=832 afs=0 tosp=832 full=0&lt;br /&gt;ktspfsrch:Cbk didnot like 0x024000e8&lt;br /&gt;ktspfsrch: Returns: BlockDBA:0x024000eb&lt;br /&gt;kdt_bseg_srch_cbk: examine dba=10.0x024000eb&lt;br /&gt;kdt_bseg_srch_cbk:Compressable Block dba=10.0x024000eb avs=832 afs=0 tosp=832 full=0&lt;br /&gt;kdt_bseg_srch_cbk: failed dba=10.0x024000eb avs=832 afs=0 tosp=832 full=0&lt;br /&gt;ktspfsrch:Cbk didnot like 0x024000eb&lt;br /&gt;ktspfsrch: Returns: BlockDBA:0x024000ee&lt;br /&gt;kdt_bseg_srch_cbk: examine dba=10.0x024000ee&lt;br /&gt;kdt_bseg_srch_cbk:Compressable Block dba=10.0x024000ee avs=832 afs=0 tosp=832 full=0&lt;br /&gt;kdt_bseg_srch_cbk: failed dba=10.0x024000ee avs=832 afs=0 tosp=832 full=0&lt;br /&gt;ktspfsrch:Cbk didnot like 0x024000ee&lt;br /&gt;ktspfsrch: Returns: BlockDBA:0x024000f1&lt;br /&gt;kdt_bseg_srch_cbk: examine dba=10.0x024000f1&lt;br /&gt;kdt_bseg_srch_cbk: found dba=10.0x024000f1 avs=3511 afs=0 tosp=3511 full=1&lt;br /&gt;Exit cbk ------&lt;br /&gt; ndba:0x024000f1&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This assumption is actually confirmed by various bug descriptions on My Oracle Support. In particular, bugs 9341448, 6009614, 9667930 and 9708484 seem to apply. Note that there seems to be another related bug, 8287680, for inserts. Document 1101900.1 also applies. 
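&lt;br /&gt;&lt;br /&gt;With events 10320 and 10612 enabled, the trace file grows very quickly, so it can help to summarize it. The Python sketch below counts, per block address, how often a block was reported as "Compressable" and how often it was rejected ("didnot like"). The regular expressions only cover the line formats visible in the trace excerpt above, and other versions may print these lines differently. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: line formats as seen in the 10320/10612 trace excerpt above.
COMPRESSABLE = re.compile(r"Compressable Block dba=\d+\.(0x[0-9a-f]+)")
REJECTED = re.compile(r"didnot like (0x[0-9a-f]+)")


def summarize_space_search(lines):
    """Count 'Compressable Block' reports and rejections per block address."""
    compressable, rejected = Counter(), Counter()
    for line in lines:
        match = COMPRESSABLE.search(line)
        if match:
            compressable[match.group(1)] += 1
        match = REJECTED.search(line)
        if match:
            rejected[match.group(1)] += 1
    return compressable, rejected


# A few lines taken from the trace excerpt above.
excerpt = """\
kdt_bseg_srch_cbk: examine dba=10.0x024000e8
kdt_bseg_srch_cbk:Compressable Block dba=10.0x024000e8 avs=832 afs=0 tosp=832 full=0
kdt_bseg_srch_cbk: failed dba=10.0x024000e8 avs=832 afs=0 tosp=832 full=0
ktspfsrch:Cbk didnot like 0x024000e8
kdt_bseg_srch_cbk: found dba=10.0x024000f1 avs=3511 afs=0 tosp=3511 full=1
""".splitlines()

compressable, rejected = summarize_space_search(excerpt)
for dba, count in rejected.most_common():
    print(dba, "rejected", count, "time(s), reported as compressable",
          compressable[dba], "time(s)")
```

Because the function accepts any iterable of lines, an open trace file can be passed in directly. High rejection counts point to the blocks that get considered and rejected over and over again.&lt;br /&gt;&lt;br /&gt;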
According to the bug descriptions listed above, this variation of the bug is fixed in some patch sets, as opposed to the "basic" bug from part 1, which is only fixed in the 11.2 release. A one-off patch (9667930) also seems to be available.&lt;br /&gt;&lt;br /&gt;It is also interesting that this update anomaly is documented in the "Master Note for OLTP Compression [ID 1223705.1]" on My Oracle Support: search for "Test #5" and review the runtimes mentioned in the table for that test, Scenario #3 / #5.&lt;br /&gt;&lt;br /&gt;Note that this problem does not show up if the test case is repeated with basic compression instead. Basic compression does not attempt to re-compress blocks and therefore can only be affected by the basic variation of the bug.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;ASSM Bug variation #2&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;But it is not over yet. Finally we come to the oddest variation of the bug in the 11.1.0.7 base release: combining basic compression with partitioning can lead to the same symptoms as the "re-compression" bug with OLTP compression just outlined. Oracle apparently sometimes (it is not always reproducible) starts to consider blocks for re-compression, with the same dire results, although basic compression is used.&lt;br /&gt;&lt;br /&gt;Again, a slight variation of the script is used: basic compression with PCTFREE 10, but with a very simple partitioning scheme added. 
Also, the V2 column gets updated to the same value. All this is done to avoid running into the basic bug, since the migrated rows will be too large for it.&lt;br /&gt;&lt;br /&gt;&lt;div class="codesnippet"&gt;&lt;br /&gt;set echo on timing on&lt;br /&gt;&lt;br /&gt;drop table t1;&lt;br /&gt;&lt;br /&gt;purge table t1;&lt;br /&gt;&lt;br /&gt;CREATE TABLE t1&lt;br /&gt;(pkey varchar2(1),&lt;br /&gt; v1 varchar2(255),&lt;br /&gt; v2 varchar2(255)&lt;br /&gt;)&lt;br /&gt;partition by range(pkey)&lt;br /&gt;(&lt;br /&gt; partition p1 values less than (2),&lt;br /&gt; partition p2 values less than (3)&lt;br /&gt;)&lt;br /&gt;compress&lt;br /&gt;pctfree 10&lt;br /&gt;TABLESPACE &amp;&amp;tblspace;&lt;br /&gt;&lt;br /&gt;INSERT /*+ append */ INTO t1&lt;br /&gt;SELECT '1' as pkey,&lt;br /&gt;       to_char((mod(rownum, 1) + 1), 'TM') || 'BL' AS v1,&lt;br /&gt;       'BLUBB' /*null*/ AS v2&lt;br /&gt;    FROM dual CONNECT BY LEVEL &amp;lt;= 50000&lt;br /&gt;/&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;br /&gt;BEGIN dbms_stats.gather_table_stats(&lt;br /&gt;        ownname =&amp;gt; null,&lt;br /&gt;        tabname =&amp;gt; 'T1');&lt;br /&gt;END;&lt;br /&gt;/&lt;br /&gt;&lt;br /&gt;create index t1_idx on t1 (substr(v1, 1, 1)) TABLESPACE &amp;tblspace;&lt;br /&gt;&lt;br /&gt;SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';&lt;br /&gt;&lt;br /&gt;truncate table chained_rows;&lt;br /&gt;&lt;br /&gt;analyze table t1 list chained rows;&lt;br /&gt;&lt;br /&gt;select count(*) from chained_rows;&lt;br /&gt;&lt;br /&gt;column file_no new_value file_no&lt;br /&gt;column block_no new_value block_no&lt;br /&gt;&lt;br /&gt;select&lt;br /&gt;        dbms_rowid.rowid_relative_fno(rowid) as file_no&lt;br /&gt;      , dbms_rowid.rowid_block_number(rowid) as block_no&lt;br /&gt;from&lt;br /&gt;        t1&lt;br /&gt;where&lt;br /&gt;        rownum &amp;lt;= 1;&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'before_update';&lt;br /&gt;&lt;br /&gt;alter 
system checkpoint;&lt;br /&gt;&lt;br /&gt;alter system dump datafile &amp;file_no block &amp;block_no;&lt;br /&gt;&lt;br /&gt;pause Press Enter to continue...&lt;br /&gt;&lt;br /&gt;exec mystats_pkg.ms_start&lt;br /&gt;&lt;br /&gt;-- Counters not updated in 11g&lt;br /&gt;-- execute snap_kcbsw.start_snap&lt;br /&gt;&lt;br /&gt;/* This will generate a huge tracefile&lt;br /&gt;alter session set tracefile_identifier = 'space_layer';&lt;br /&gt;&lt;br /&gt;alter session set events '10320 trace name context forever, level 3';&lt;br /&gt;alter session set events '10612 trace name context forever, level 1';&lt;br /&gt;*/&lt;br /&gt;&lt;br /&gt;-- Uncomment this to prevent the bug&lt;br /&gt;-- alter system flush shared_pool;&lt;br /&gt;&lt;br /&gt;UPDATE /*+ full(t1) */ t1 SET v2 = v2&lt;br /&gt;where substr(v1, 1, 1) = '1';&lt;br /&gt;&lt;br /&gt;/*&lt;br /&gt;alter session set events '10320 trace name context off';&lt;br /&gt;alter session set events '10612 trace name context off';&lt;br /&gt;*/&lt;br /&gt;&lt;br /&gt;commit;&lt;br /&gt;&lt;br /&gt;set serveroutput on size 1000000 format wrapped&lt;br /&gt;set linesize 120&lt;br /&gt;set trimspool on&lt;br /&gt;&lt;br /&gt;-- Counters not updated in 11g&lt;br /&gt;-- execute snap_kcbsw.end_snap&lt;br /&gt;&lt;br /&gt;exec mystats_pkg.ms_stop(1)&lt;br /&gt;&lt;br /&gt;BEGIN dbms_stats.gather_table_stats(&lt;br /&gt;        ownname =&amp;gt; null,&lt;br /&gt;        tabname =&amp;gt; 'T1');&lt;br /&gt;END;&lt;br /&gt;/&lt;br /&gt;&lt;br /&gt;SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';&lt;br /&gt;&lt;br /&gt;truncate table chained_rows;&lt;br /&gt;&lt;br /&gt;analyze table t1 list chained rows;&lt;br /&gt;&lt;br /&gt;select count(*) from chained_rows;&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'after_update';&lt;br /&gt;&lt;br /&gt;alter system checkpoint;&lt;br /&gt;&lt;br /&gt;alter system dump datafile &amp;file_no block &amp;block_no;&lt;br /&gt;&lt;br /&gt;accept rdba prompt 
'Enter forwarding ROWID found in block dump: '&lt;br /&gt;&lt;br /&gt;column rdba new_value rdba&lt;br /&gt;&lt;br /&gt;-- Remove any potential leading and trailing unnecessary stuff&lt;br /&gt;select&lt;br /&gt;        substr('&amp;rdba',&lt;br /&gt;               case&lt;br /&gt;               when instr('&amp;rdba', '0x') = 0&lt;br /&gt;               then 1&lt;br /&gt;               else instr('&amp;rdba', '0x') + 2&lt;br /&gt;               end,&lt;br /&gt;               case&lt;br /&gt;               when instr('&amp;rdba', '.') = 0&lt;br /&gt;               then 32767&lt;br /&gt;               else instr('&amp;rdba', '.') -&lt;br /&gt;                 case&lt;br /&gt;                 when instr('&amp;rdba', '0x') = 0&lt;br /&gt;                 then 1&lt;br /&gt;                 else instr('&amp;rdba', '0x') + 2&lt;br /&gt;                 end&lt;br /&gt;               end&lt;br /&gt;              ) as rdba&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;select&lt;br /&gt;        dbms_utility.data_block_address_file(to_number('&amp;rdba', rpad('X', length('&amp;rdba'), 'X'))) as file_no&lt;br /&gt;      , dbms_utility.data_block_address_block(to_number('&amp;rdba', rpad('X', length('&amp;rdba'), 'X'))) as block_no&lt;br /&gt;from&lt;br /&gt;        dual&lt;br /&gt;;&lt;br /&gt;&lt;br /&gt;alter session set tracefile_identifier = 'migrated_rows';&lt;br /&gt;&lt;br /&gt;alter system dump datafile &amp;file_no block &amp;block_no;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Now if you run this script several times under 11.1.0.7 (base release), you might be able to spot similar symptoms. The update suddenly takes very long, with the same diagnostic output; in particular, the 10320 and 10612 debug events show the same "Compressable Block" lines. 
This happens even though we don't use Advanced Compression at all!&lt;br /&gt;&lt;br /&gt;Interestingly, in this case the problem can be prevented by invalidating information cached in the Shared Pool. ALTERing the table right before the update, or brute-force flushing the Shared Pool, prevents the problem from showing up. Once this information is cleared, the confusion about re-compressing the blocks doesn't seem to arise. I have to admit, however, that I don't have a sound explanation for why invalidating Shared Pool contents prevents this variation of the bug.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;- Mixing compression with a significant number of updates is a bad idea in general&lt;br /&gt;&lt;br /&gt;- OLTP block re-compression seems to be triggered only by inserts; updates lead to row migrations rather than block re-compression&lt;br /&gt;&lt;br /&gt;- Because of that behaviour, you can end up with more row migrations than expected, even with OLTP compression&lt;br /&gt;&lt;br /&gt;- In 11.1.0.7 (base release) something goes wrong with the re-compression attempts of OLTP compression, leading to dire results when row migrations happen in an ASSM tablespace. This is a documented bug; a good starting point is document 1101900.1.&lt;br /&gt;&lt;br /&gt;- Using basic compression with partitioning can sometimes lead to the same symptoms as described in those bugs for OLTP compression&lt;br /&gt;&lt;br /&gt;You should be very careful when using OLTP compression in release 11.1 and monitor the row migration rate, since the database might have to work much harder than expected. If you plan to use OLTP compression in 11.1 with ASSM, you should certainly consider applying the one-off patch 9667930 or a patch set that contains the mentioned bug fixes.&lt;br /&gt;&lt;br /&gt;Note that I could not reproduce any of these oddities in 11.2, apart from the fact that updates never seem to trigger a re-compression. 
However, some of the bug descriptions suggest that 11.2 could be affected by some of the mentioned bugs as well.</content><link rel='replies' type='application/atom+xml' href='http://oracle-randolf.blogspot.com/feeds/617534663328233306/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5124641802818980374&amp;postID=617534663328233306' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/617534663328233306'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5124641802818980374/posts/default/617534663328233306'/><link rel='alternate' type='text/html' href='http://oracle-randolf.blogspot.com/2011/05/assm-bug-reprise-part-2.html' title='ASSM bug reprise - part 2'/><author><name>Randolf</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5124641802818980374.post-8839164758946599819</id><published>2011-05-16T22:31:00.005+02:00</published><updated>2011-05-16T23:31:26.175+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='10.2.0.5'/><category scheme='http://www.blogger.com/atom/ns#' term='troubleshooting'/><category scheme='http://www.blogger.com/atom/ns#' term='11g'/><category scheme='http://www.blogger.com/atom/ns#' term='10.2.0.4'/><category scheme='http://www.blogger.com/atom/ns#' term='11.1.0.7'/><category scheme='http://www.blogger.com/atom/ns#' t
