Wednesday, March 28, 2012

(Virtual) Public Appearances - Free Webinars

I'll be presenting at two upcoming webinars:

1. "Oracle Cost-Based Optimizer Basics" hosted by AllThingsOracle.com on Wednesday, 11th April 2012 16:00 (UK Time).

In this session I'll explain the key concepts that influence the decisions of the Oracle Cost-Based Optimizer most. If you want to understand the key concepts about how to write efficient queries and why the optimizer sometimes might not come up with a reasonable execution plan, then this session is for you.

Note that the webinar is free and registrants will receive a recording afterwards. Here is the link to the official AllThingsOracle.com landing page. Thanks to AllThingsOracle.com for hosting this session.

Here is an abstract of the session:

Cost-Based Optimizer Basics

When it comes to writing efficient queries there are a few key concepts that need to be understood. Although it is called a cost-based optimizer it is actually not the cost we need to focus on primarily to understand why the optimizer takes certain decisions. In this webinar these key concepts will be highlighted and explained along with some live demos. We will see why it is crucial that the optimizer's picture of the data fits reality, why cardinality and selectivity estimates matter so much and which key concepts the optimizer's model surprisingly doesn't cover (yet). This also means that you as a human being sometimes know much more about the data and may need to lead the optimizer into the right direction.

2. I'll be giving essentially the same session for the DOAG database SIG on Friday, the 11th May 2012 11:00 (Germany).

This webinar is only available to DOAG members I believe, but if you are a member then it is free. This one I'll be giving in German.

The number of (virtual) seats seems to be limited, so you might want to get in touch with the contact details mentioned below:

Anmeldung:
Julia Kasprzak,
DOAG Geschäftsstelle
office@doag.org

Here is the link to the DOAG website and the session details in German:

Cost-Based Optimizer Grundlagen

Kurzbeschreibung:
Wenn es darum geht, effizientes SQL zu schreiben, gibt es einige wenige Schlüsselkonzepte, die zu verstehen sind. Obwohl es "kostenbasierter Optimizer" genannt wird, sind es eben nicht primär die Kosten, auf die wir schauen müssen, um zu verstehen, warum der Optimizer bestimmte Entscheidungen getroffen hat. In diesem Webinar werden diese Schlüsselkonzepte beleuchtet und anhand von einigen Live-Demonstrationen erklärt. Wir werden sehen, warum die Kardinalitäts- und Selektivitätsabschätzungen des Optimizers so elementar wichtig sind und welche Schlüsselkonzepte der Optimizer überraschenderweise derzeit (noch) nicht berücksichtigt. Dies bedeutet auch, dass wir manchmal wesentlich mehr über unsere Daten und Applikation wissen als der Optimizer und daher ihn in die richtige Richtung lenken müssen.

Monday, March 26, 2012

Coalesce Subquery Transformation - COALESCE_SQ

Oracle 11.2 introduced a set of new Query Transformations, among others the ability to coalesce subqueries which means that multiple correlated subqueries can be merged into a number of less subqueries.

Timur Akhmadeev already demonstrated the basic principles in a blog entry, but when I was recently involved into supporting a TPC-H benchmark for a particular storage vendor I saw a quite impressive application of this optimization that I would like to share here.

In principle the TPC-H benchmark is simple and attempts to simulate typical DWH query workloads (or what was assumed to be a typical DWH workload when it was designed many years ago) with only a rather limited amount of DML in the mix. The query part consists of 22 queries that have to be run one after the other in order to measure the so called "Power" component of the benchmark. Similar 22 queries will then be run concurrently by at least 5 "streams", where each stream runs them in a different (pseudo-randomized) order to measure the so called "Throughput" part of the benchmark. There is also a DML part but it is almost negligible compared to the query load generated.

One of the most demanding queries out of the 22 is called "Suppliers Who Kept Orders Waiting Query (Q21)" and looks like this in Oracle SQL:


select
*
from (
select
s_name
, count(*) as numwait
from
supplier
, lineitem l1
, orders
, nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
null
from
lineitem l2
where
l1.l_orderkey = l2.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
null
from
lineitem l3
where
l1.l_orderkey = l3.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'SAUDI ARABIA'
group by
s_name
order by
numwait desc
, s_name
)
where
rownum <= 100
;


The demanding part of the query is that it accesses the by far largest table LINEITEM three times: Once in the main query and twice as part of the correlated subqueries (EXISTS (...L2...) / NOT EXISTS (...L3...)).

A minimal setup to reproduce the execution plans can be found at the end of this post.

I've deliberately kept the complexity of the setup at the bare minimum - usually the actual table definitions include parallelism, partitioning, compression and other options like freelist and freelist groups for MSSM setups.

Let's have a look at the execution plan produced by pre-11.2 optimizer versions:


Plan hash value: 1997471497

-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc|
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 4000 | |
|* 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 375K| 14M| |
|* 3 | SORT ORDER BY STOPKEY | | 375K| 79M| 86M|
| 4 | HASH GROUP BY | | 375K| 79M| 86M|
|* 5 | HASH JOIN ANTI | | 375K| 79M| 68M|
|* 6 | HASH JOIN SEMI | | 375K| 64M| 59M|
|* 7 | HASH JOIN | | 375K| 54M| 53M|
|* 8 | HASH JOIN | | 375K| 48M| 3848K|
| 9 | NESTED LOOPS | | 37500 | 3405K| |
|* 10 | TABLE ACCESS FULL| NATION | 1 | 40 | |
|* 11 | TABLE ACCESS FULL| SUPPLIER | 30000 | 1552K| |
|* 12 | TABLE ACCESS FULL | LINEITEM | 7500K| 314M| |
|* 13 | TABLE ACCESS FULL | ORDERS | 37M| 572M| |
| 14 | TABLE ACCESS FULL | LINEITEM | 150M| 3719M| |
|* 15 | TABLE ACCESS FULL | LINEITEM | 7500K| 314M| |
-----------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

1 - SEL$1
2 - SEL$A317D234 / from$_subquery$_001@SEL$1
3 - SEL$A317D234
10 - SEL$A317D234 / NATION@SEL$2
11 - SEL$A317D234 / SUPPLIER@SEL$2
12 - SEL$A317D234 / L1@SEL$2
13 - SEL$A317D234 / ORDERS@SEL$2
14 - SEL$A317D234 / L2@SEL$3
15 - SEL$A317D234 / L3@SEL$4

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=100)
3 - filter(ROWNUM<=100)
5 - access("L1"."L_ORDERKEY"="L3"."L_ORDERKEY")
filter("L3"."L_SUPPKEY"<>"L1"."L_SUPPKEY")
6 - access("L1"."L_ORDERKEY"="L2"."L_ORDERKEY")
filter("L2"."L_SUPPKEY"<>"L1"."L_SUPPKEY")
7 - access("O_ORDERKEY"="L1"."L_ORDERKEY")
8 - access("S_SUPPKEY"="L1"."L_SUPPKEY")
10 - filter("N_NAME"='SAUDI ARABIA')
11 - filter("S_NATIONKEY"="N_NATIONKEY")
12 - filter("L1"."L_RECEIPTDATE">"L1"."L_COMMITDATE")
13 - filter("O_ORDERSTATUS"='F')
15 - filter("L3"."L_RECEIPTDATE">"L3"."L_COMMITDATE")


And this is what you get from 11.2:


Plan hash value: 823100515

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc|
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 4000 | |
|* 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 7500 | 292K| |
|* 3 | SORT ORDER BY STOPKEY | | 7500 | 197K| |
| 4 | HASH GROUP BY | | 7500 | 197K| |
| 5 | VIEW | VM_NWVW_2 | 7500 | 197K| |
|* 6 | FILTER | | | | |
| 7 | HASH GROUP BY | | 7500 | 1794K| 189M|
|* 8 | HASH JOIN | | 749K| 175M| 76M|
|* 9 | HASH JOIN | | 375K| 71M| 66M|
|* 10 | HASH JOIN | | 375K| 61M| 4728K|
|* 11 | HASH JOIN | | 37500 | 4284K| |
|* 12 | TABLE ACCESS FULL| NATION | 1 | 52 | |
| 13 | TABLE ACCESS FULL| SUPPLIER | 750K| 46M| |
|* 14 | TABLE ACCESS FULL | LINEITEM | 7500K| 400M| |
|* 15 | TABLE ACCESS FULL | ORDERS | 37M| 1001M| |
| 16 | TABLE ACCESS FULL | LINEITEM | 150M| 6294M| |
--------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

1 - SEL$1
2 - SEL$DD8F533F / from$_subquery$_001@SEL$1
3 - SEL$DD8F533F
5 - SEL$11CEEA77 / VM_NWVW_2@SEL$DD8F533F
6 - SEL$11CEEA77
12 - SEL$11CEEA77 / NATION@SEL$2
13 - SEL$11CEEA77 / SUPPLIER@SEL$2
14 - SEL$11CEEA77 / L1@SEL$2
15 - SEL$11CEEA77 / ORDERS@SEL$2
16 - SEL$11CEEA77 / L3@SEL$4

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=100)
3 - filter(ROWNUM<=100)
6 - filter(SUM(CASE WHEN "L3"."L_RECEIPTDATE">"L3"."L_COMMITDATE" THEN 1 ELSE 0 END
)=0)
8 - access("L3"."L_ORDERKEY"="L1"."L_ORDERKEY")
filter("L3"."L_SUPPKEY"<>"L1"."L_SUPPKEY")
9 - access("O_ORDERKEY"="L1"."L_ORDERKEY")
10 - access("S_SUPPKEY"="L1"."L_SUPPKEY")
11 - access("S_NATIONKEY"="N_NATIONKEY")
12 - filter("N_NAME"='SAUDI ARABIA')
14 - filter("L1"."L_RECEIPTDATE">"L1"."L_COMMITDATE")
15 - filter("O_ORDERSTATUS"='F')


Let's ignore the difference in cardinality estimates for the moment, in particular when swapping the left and right side of the correlation predicate in the subqueries, since this is obviously caused by the incomplete fake statistics of my test setup. Whereas the first execution plan looks like a rather expected one, including a SEMI and ANTI join for the two unnested, correlated subqueries and accessing the LINEITEM table three times in total, the 11.2 execution plan looks dramatically different. This is particularly noticeable as my minimal setup doesn't include any fancy primary key/unique or foreign key constraints declarations that could support some of the more recent transformations that Oracle offers. The only constraints that are included are NOT NULL column constraints.

There is no SEMI or ANTI join, and the third instance of LINEITEM is gone, too. When I saw this execution plan for the first time, my initial reaction was: "What a cunning optimization!" - shortly followed by "But it seems to be illegal! (too good to be true)". It turns out that the former is true whereas the latter isn't - the results are correct, although the transformation introduces a potential overhead that can be significant.

If you look closely at the two correlated subqueries it becomes obvious that they share the correlation criteria, but one is an EXISTS clause, and the other one a NOT EXISTS that adds an additional filter predicate.

So one possible idea for a rewrite of the query was to transform the EXISTS clause into a join that allowed filtering the "RECEIPTDATE greater than COMMITDATE" condition checked in the NOT EXISTS clause - thereby getting rid of the third instance of LINEITEM and saving a tremendous amount of work.

But then, these are correlated subqueries and when transforming them into a join care has to be taken that the transformation is semantically equivalent and the result still correct - here transforming the [NOT] EXISTS check into a regular join could potentially lead to duplicates that need to be eliminated introducing overhead again.

And here is what happens internally, the final transformed query from the optimizer trace file looks like this:


SELECT "from$_subquery$_001"."S_NAME" "S_NAME",
"from$_subquery$_001"."NUMWAIT" "NUMWAIT"
FROM
(SELECT "VM_NWVW_2"."$vm_col_1" "S_NAME",
COUNT(*) "NUMWAIT"
FROM
(SELECT
/*+ UNNEST */
"SUPPLIER"."S_NAME" "$vm_col_1"
FROM "CBO_TEST"."LINEITEM" "L3",
"CBO_TEST"."SUPPLIER" "SUPPLIER",
"CBO_TEST"."LINEITEM" "L1",
"CBO_TEST"."ORDERS" "ORDERS",
"CBO_TEST"."NATION" "NATION"
WHERE "SUPPLIER"."S_SUPPKEY"="L1"."L_SUPPKEY"
AND "ORDERS"."O_ORDERKEY" ="L1"."L_ORDERKEY"
AND "ORDERS"."O_ORDERSTATUS"='F'
AND "L1"."L_RECEIPTDATE" >"L1"."L_COMMITDATE"
AND 0 <1
AND "SUPPLIER"."S_NATIONKEY"="NATION"."N_NATIONKEY"
AND "NATION"."N_NAME" ='SAUDI ARABIA'
AND "L3"."L_ORDERKEY" ="L1"."L_ORDERKEY"
AND "L3"."L_SUPPKEY" <>"L1"."L_SUPPKEY"
GROUP BY "NATION".ROWID,
"ORDERS".ROWID,
"L1".ROWID,
"SUPPLIER".ROWID,
"SUPPLIER"."S_NAME"
HAVING SUM(
CASE
WHEN "L3"."L_RECEIPTDATE">"L3"."L_COMMITDATE"
THEN 1
ELSE 0
END )=0
) "VM_NWVW_2"
GROUP BY "VM_NWVW_2"."$vm_col_1"
ORDER BY COUNT(*) DESC,
"VM_NWVW_2"."$vm_col_1"
) "from$_subquery$_001"
WHERE ROWNUM<=100;


So Oracle eliminates the L2 instance of LINEITEM (and the corresponding subquery) by coalescing the EXISTS and the NOT EXISTS subquery which can be seen from the OUTLINE where two COALESCE_SQ hints show up. Finally it transforms the remaining subquery into a regular join that requires an additional aggregation step in order to eliminate potential duplicates (the inner GROUP BY ...ROWID). The HAVING clause applies the additional filter from the NOT EXISTS subquery.

If you run the query with Row Source Statistics enabled using the minimum set of data provided you'll notice that the second join to LINEITEM in fact generates duplicates. So this execution plan is a cunning optimization that allows getting rid of the third instance of LINEITEM, but depending on the number of duplicates generated a significant amount of excess work might be introduced instead.

This might explain why this transformation seems at present only to be applied when dealing with queries that include aggregations anyway. If you change the COUNT(*) ... GROUP BY into a regular query without aggregation, the transformation will not be applied and the traditional SEMI / ANTI join execution plan will show up.

Of course you could also argue that possibly the optimization was particularly aimed at benchmarks like this, but then there are certainly similar real-life queries out there that can benefit from such potential workload reductions via query transformations.

The setup script:


drop table lineitem purge;
drop table orders purge;
drop table supplier purge;
drop table nation purge;

CREATE TABLE lineitem (
l_shipdate DATE NULL,
l_orderkey NUMBER NOT NULL,
l_discount NUMBER NOT NULL,
l_extendedprice NUMBER NOT NULL,
l_suppkey NUMBER NOT NULL,
l_quantity NUMBER NOT NULL,
l_returnflag CHAR(1) NULL,
l_partkey NUMBER NOT NULL,
l_linestatus CHAR(1) NULL,
l_tax NUMBER NOT NULL,
l_commitdate DATE NULL,
l_receiptdate DATE NULL,
l_shipmode CHAR(10) NULL,
l_linenumber NUMBER NOT NULL,
l_shipinstruct CHAR(25) NULL,
l_comment VARCHAR2(44) NULL
)
;

CREATE TABLE orders (
o_orderdate DATE NULL,
o_orderkey NUMBER NOT NULL,
o_custkey NUMBER NOT NULL,
o_orderpriority CHAR(15) NULL,
o_shippriority NUMBER NULL,
o_clerk CHAR(15) NULL,
o_orderstatus CHAR(1) NULL,
o_totalprice NUMBER NULL,
o_comment VARCHAR2(79) NULL
)
;

CREATE TABLE supplier (
s_suppkey NUMBER NOT NULL,
s_nationkey NUMBER NULL,
s_comment VARCHAR2(101) NULL,
s_name CHAR(25) NULL,
s_address VARCHAR2(40) NULL,
s_phone CHAR(15) NULL,
s_acctbal NUMBER NULL
)
;

CREATE TABLE nation (
n_nationkey NUMBER NOT NULL,
n_name CHAR(25) NULL,
n_regionkey NUMBER NULL,
n_comment VARCHAR2(152) NULL
)
;

/*
CREATE INDEX i_l_orderkey
ON lineitem (
l_orderkey
)
;

CREATE UNIQUE INDEX i_o_orderkey
ON orders (
o_orderkey
)
;
*/

exec sys.dbms_stats.set_table_stats(null, 'lineitem', numblks=> 6450000, numrows=> 150000000)

exec sys.dbms_stats.set_table_stats(null, 'orders', numblks=> 3750000, numrows=> 75000000)

exec sys.dbms_stats.set_table_stats(null, 'nation', numblks=> 375, numrows=> 25)

exec sys.dbms_stats.set_table_stats(null, 'supplier', numblks=> 37500, numrows=> 750000)

--exec sys.dbms_stats.set_index_stats(null, 'i_l_orderkey', numlblks=> 645000, numrows=> 150000000, numdist=> 75000000, indlevel => 4, clstfct => 150000000)

--exec sys.dbms_stats.set_index_stats(null, 'i_o_orderkey', numlblks=> 375000, numrows=> 75000000, numdist=> 75000000, indlevel => 3, clstfct => 75000000)

exec sys.dbms_stats.set_column_stats(null, 'lineitem', 'l_orderkey', distcnt => 75000000)

exec sys.dbms_stats.set_column_stats(null, 'lineitem', 'l_suppkey', distcnt => 750000)

exec sys.dbms_stats.set_column_stats(null, 'orders', 'o_orderkey', distcnt => 75000000)

exec sys.dbms_stats.set_column_stats(null, 'orders', 'o_orderstatus', distcnt => 2)

exec sys.dbms_stats.set_column_stats(null, 'supplier', 's_suppkey', distcnt => 750000)

exec sys.dbms_stats.set_column_stats(null, 'supplier', 's_nationkey', distcnt => 25)

exec sys.dbms_stats.set_column_stats(null, 'supplier', 's_name', distcnt => 750000)

exec sys.dbms_stats.set_column_stats(null, 'nation', 'n_nationkey', distcnt => 25)

exec sys.dbms_stats.set_column_stats(null, 'nation', 'n_name', distcnt => 20)

insert into nation (n_nationkey, n_name) values (1, 'SAUDI ARABIA');

insert into supplier (s_suppkey, s_name, s_nationkey) values (1, 'SUPPLIER1', 1);

insert into supplier (s_suppkey, s_name, s_nationkey) values (2, 'SUPPLIER2', 1);

insert into orders (o_orderkey, o_custkey, o_orderstatus) values (1, 1, 'F');

insert into orders (o_orderkey, o_custkey, o_orderstatus) values (2, 1, 'A');

insert into lineitem (l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_linenumber, l_partkey, l_tax, l_receiptdate, l_commitdate) values (1, 0, 0, 1, 0, 1, 1, 0, sysdate + 1, sysdate);

insert into lineitem (l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_linenumber, l_partkey, l_tax, l_receiptdate, l_commitdate) values (1, 0, 0, 1, 0, 2, 1, 0, sysdate + 1, sysdate);

insert into lineitem (l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_linenumber, l_partkey, l_tax, l_receiptdate, l_commitdate) values (1, 0, 0, 1, 0, 3, 1, 0, sysdate, sysdate);

insert into lineitem (l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_linenumber, l_partkey, l_tax, l_receiptdate, l_commitdate) values (1, 0, 0, 2, 0, 1, 1, 0, sysdate, sysdate);

insert into lineitem (l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_linenumber, l_partkey, l_tax, l_receiptdate, l_commitdate) values (1, 0, 0, 2, 0, 2, 1, 0, sysdate, sysdate);

Thursday, March 15, 2012

AllThingsOracle.com

Just a short note that as already announced last week the second part of the post on how to control Dynamic Sampling activity has been published on AllThingsOracle.com.

You can read there about the (not so much known) DYNAMIC_SAMPLING_EST_CDN hint, the new Dynamic Sampling auto adjustment feature introduced in release 11.2 and other interesting details.

Thursday, March 8, 2012

AllThingsOracle.com

This is just a short note that the next part of my Dynamic Sampling series has been posted on AllThingsOracle.com

This time I highlight how to configure Dynamic Sampling - the first part focusing on the available session/statement levels. The next part will be posted soon and explains some other important concepts regarding Dynamic Sampling behaviour.

Unfortunately these topics are not covered very well by the official documentation. In fact the best coverage used to be in the documentation of the initial release that included Dynamic Sampling (9iR2). You can find the link to that old documentation version in the article over there.

Thursday, March 1, 2012

Join Views, ROWIDs And Query Transformations

Here is an odd little bug that was discussed a couple of weeks ago on the OTN forums.

It's about queries on join views by ROWID that fail with "ORA-01410: invalid ROWID" under certain circumstances. The bug can only be reproduced when using the 11.2 code base. In fact the same setup will cause an internal error in 11.1, but 10.2 will return correct results.

It's probably not a very common scenario but it is an interesting example of how features that work fine by themselves can cause problems when used together.

First of all (hopefully) some of you may ask: How is it possible to query from a join view by ROWID, since the view is based on multiple objects and hence doesn't have a simple one-to-one mapping to a ROWID of a single table?

The answer is: It requires "Key-Preserved Tables". This is a term that is frequently used by Oracle in their documentation, in particular in the context of updatable join views. This means that you cannot query every join view by ROWID - it has to meet certain prerequisites otherwise Oracle will throw an error "ORA-01445: cannot select ROWID from, or sample, a join view without a key-preserved table".

Obviously Oracle needs to be able to determine if there is at least one table in the view for which it is guaranteed that its rows can show up at most once in the result set of the view, which means that there need to be a minimum set of unique or primary key constraints at least if the view involves more than a single table. Furthermore Oracle needs to be able to determine this automatically, and depending on the complexity and expressions used, this is not always the case. See this post by Jonathan Lewis about a discussion of possible improvements in this regard. Other database vendors seem to allow at least to some degree more complex queries to be recognized as what Oracle calls "key-preserved".

Let's start with a simple example that creates a simple join view on three tables, furthermore appropriate constraints are created. The sample generates more constraints than strictly necessary to get a key-preserved table. The advantage of these additional constraints is that Oracle can perform a table or join elimination transformation when querying data only from a subset of the tables joined in the view.


drop table t1;

purge table t1;

drop table t2;

purge table t2;

drop table t3;

purge table t3;

drop view v;

drop view v_t1;

create table t1
as
select
rownum as id
, mod(rownum, 1000) + 1 as fk_id
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000
;

exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

create table t2
as
select
rownum as id
, mod(rownum, 100) + 1 as fk_id
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000
;

exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

create table t3
as
select
rownum as id
, mod(rownum, 10) + 1 as fk_id
, rpad('x', 100) as filler
from
dual
connect by
level <= 100
;

exec dbms_stats.gather_table_stats(null, 't3', method_opt => 'for all columns size 1')

alter table t1 add constraint t1_pk primary key (id);

alter table t2 add constraint t2_pk primary key (id);

alter table t3 add constraint t3_pk primary key (id);

alter table t1 add constraint t1_t2_fk foreign key (fk_id) references t2 (id);

alter table t2 add constraint t2_t3_fk foreign key (fk_id) references t3 (id);

alter table t1 modify fk_id not null;

alter table t2 modify fk_id not null;

alter table t3 modify fk_id not null;

create or replace view
v
as
select
t1.id as t1_id
, t1.fk_id as t1_fk_id
, t1.filler as t1_filler
, t2.id as t2_id
, t2.fk_id as t2_fk_id
, t2.filler as t2_filler
, t3.id as t3_id
, t3.fk_id as t3_fk_id
, t3.filler as t3_filler
from
t1
, t2
, t3
where
t1.fk_id = t2.id
and t2.fk_id = t3.id
;


So this is a simple join, and given the current setup the key-preserved table is T1.

We can verify this by this simple query:


SQL> select object_name from user_objects where object_id = (select sys.dbms_rowid.rowid_object(rowid) from v where rownum <= 1);

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
T1


So Oracle picks automatically T1 as source for the ROWID.

Let's see the join table elimination feature in action by using the following sample:


SQL> column rid new_value rid
SQL>
SQL> select rowid as rid from v where rownum <= 1;

RID
------------------
AAAVTGAAIAAAAEBAAA

SQL>
SQL> select t1_id from v where rowid = '&rid';
old 1: select t1_id from v where rowid = '&rid'
new 1: select t1_id from v where rowid = 'AAAVTGAAIAAAAEBAAA'

T1_ID
----------
1

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID 5stwqhguqgjr0, child number 0
-------------------------------------
select t1_id from v where rowid = 'AAAVTGAAIAAAAEBAAA'

Plan hash value: 487051824

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
| 1 | TABLE ACCESS BY USER ROWID| T1 | 1 | 20 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------


13 rows selected.

SQL>
SQL> column rid clear


Oracle 11.2 reduces this query to the simplest possible form - the tables T2 and T3 do not even show up in the execution plan. The validated and non-deferrable foreign key constraints ensure that Oracle can safely ignore the join to these tables given that only data from T1 is accessed.

So the Query Transformation seems to work pretty well and everything is fine so far.

Now let's modify the setup a little bit. As you might have noticed I've crafted the T2 and T1 tables deliberately in such a way that I can define a 1:1 relationship between them: The T1.FK_ID column is also unique.

So let's declare a unique constraint on T1.FK_ID to tell the database about this 1:1 relationship. Furthermore I'll define the simplest possible view on top of T1 that does nothing else than simply a SELECT * FROM T1:


alter table t1 add constraint t1_uq unique (fk_id);

create or replace view v_t1 as select * from t1;

create or replace view
v
as
select
t1.id as t1_id
, t1.fk_id as t1_fk_id
, t1.filler as t1_filler
, t2.id as t2_id
, t2.fk_id as t2_fk_id
, t2.filler as t2_filler
, t3.id as t3_id
, t3.fk_id as t3_fk_id
, t3.filler as t3_filler
from
v_t1 t1
, t2
, t3
where
t1.fk_id = t2.id
and t2.fk_id = t3.id
;


Finally I've changed the view V to refer to the view V_T1 instead of the base table.

Let's repeat the check about the source of the ROWID now:


SQL> select object_name from user_objects where object_id = (select sys.dbms_rowid.rowid_object(rowid) from v where rownum <= 1);

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
T2


Oracle now has selected T2 as the key-preserved table which is fine since T2 is now also guaranteed to be unique in the result set of the view.

Finally, let's repeat the query that I used to demonstrate the join elimination query transformation:


SQL> column rid clear
SQL> set echo on verify on
SQL>
SQL> column rid new_value rid
SQL>
SQL> select rowid as rid from v where rownum <= 1;

RID
------------------
AAAVTHAAIAAAAGBAAA

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID 8ns6u01cr94xa, child number 0
-------------------------------------
select rowid as rid from v where rownum <= 1

Plan hash value: 1420877628

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | NESTED LOOPS | | 1 | 20 | 2 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| T1_UQ | 1000 | 4000 | 2 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | T2_PK | 1 | 16 | 0 (0)| |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=1)
4 - access("FK_ID"="T2"."ID")


22 rows selected.

SQL>
SQL> select t1_id from v where rowid = '&rid';
old 1: select t1_id from v where rowid = '&rid'
new 1: select t1_id from v where rowid = 'AAAVTHAAIAAAAGBAAA'
select t1_id from v where rowid = 'AAAVTHAAIAAAAGBAAA'
*
ERROR at line 1:
ORA-01410: invalid ROWID


SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID cgvynnw6sthrw, child number 0
-------------------------------------
select t1_id from v where rowid = 'AAAVTHAAIAAAAGBAAA'

Plan hash value: 487051824

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
| 1 | TABLE ACCESS BY USER ROWID| T1 | 1 | 20 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------


13 rows selected.

SQL>
SQL> column rid clear


Oops, that doesn't look too good: Oracle applied the ROWID to the wrong table respectively eliminated T2 from the execution plan although it uses T2 to obtain the ROWID. As you can see from the execution plan of the initial query that fetches the first ROWID from the view, T2 is not eliminated in that case.

So the moral of the story: Simple View Merging, another Query Transformation together with Join Elimination causes Oracle 11.2 to apply the ROWID to the wrong table in case multiple possible candidates for key-preserved tables exist. You can see this from the optimizer trace file where suddenly a T1.ROWID = '...' predicate pops up.

When replacing the view with the base table the problem cannot be reproduced. Preventing one of the transformations (Join Elimination or View Merging) also prevents the issue.

Interestingly changing the optimizer features to something below 11g also allowed avoiding the bug in the OTN thread, but with the given test case here I can still reproduce the problem on 11.2.0.3 (but not 11.2.0.1 and 11.2.0.2) when setting the OPTIMIZER_FEATURES_ENABLE to 10.2.0.4 for example:


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter session set optimizer_features_enable = '10.2.0.4';

Session altered.

SQL> select object_name from user_objects where object_id = (select sys.dbms_rowid.rowid_object(rowid) from v where rownum <= 1);

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
T2

SQL>
SQL> set echo on verify on
SQL>
SQL> column rid new_value rid
SQL>
SQL> select rowid as rid from v where rownum <= 1;

RID
------------------
AAAVAUAAEAAAEnTAAA

SQL>
SQL> select t1_id from v where rowid = '&rid';
old 1: select t1_id from v where rowid = '&rid'
new 1: select t1_id from v where rowid = 'AAAVAUAAEAAAEnTAAA'
select t1_id from v where rowid = 'AAAVAUAAEAAAEnTAAA'
*
ERROR at line 1:
ORA-01410: invalid ROWID


SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 72yy78z1ggn75, child number 1
-------------------------------------
select t1_id from v where rowid = 'AAAVAUAAEAAAEnTAAA'

Plan hash value: 396691268

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | NESTED LOOPS | | 1 | 27 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY USER ROWID| T1 | 1 | 20 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | T2_PK | 1 | | 0 (0)| |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("FK_ID"="T2"."ID")


20 rows selected.

SQL>
SQL> column rid clear


Although it can be seen that the lower optimizer features setting resulted in a different plan where T2 wasn't eliminated, the ROWID predicate was still applied to the wrong table, which is clearly a change in behaviour compared to previous releases, of course in particular 10.2.0.4. So this is again one of the cases where setting OPTIMIZER_FEATURES_ENABLE doesn't reproduce exactly the same plan.

So my preliminary analysis in the OTN thread wasn't entirely correct - the new Outer Join elimination Query Transformation introduced in 11g wasn't causing the problems in the original case - these additional transformations were just triggering the side effect: It needs both a join elimination and a view merging transformation. That is the reason why the test case includes a third table. When omitting the third table, or for example querying columns from the third table that prevent the elimination of T3 the error doesn't reproduce either.

Footnote: There are more variations of similar bugs. See for example "Bug 10129357: Assorted errors referencing ROWID from a view with subquery" where other kinds of query transformations result in invalid SQL throwing ROWID related errors.