Oracle 11.2 introduced a set of new
Query Transformations, among others the ability to coalesce subqueries which means that multiple correlated subqueries can be merged into a number of less subqueries.
Timur Akhmadeev already demonstrated the
basic principles in a blog entry, but when I was recently involved into supporting a TPC-H benchmark for a particular storage vendor I saw a quite impressive application of this optimization that I would like to share here.
In principle the TPC-H benchmark is simple and attempts to simulate typical DWH query workloads (or what was assumed to be a typical DWH workload when it was designed many years ago) with only a rather limited amount of DML in the mix. The query part consists of 22 queries that have to be run one after the other in order to measure the so called "Power" component of the benchmark. Similar 22 queries will then be run concurrently by at least 5 "streams", where each stream runs them in a different (pseudo-randomized) order to measure the so called "Throughput" part of the benchmark. There is also a DML part but it is almost negligible compared to the query load generated.
One of the most demanding queries out of the 22 is called "Suppliers Who Kept Orders Waiting Query (Q21)" and looks like this in Oracle SQL:
select
*
from (
select
s_name
, count(*) as numwait
from
supplier
, lineitem l1
, orders
, nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
null
from
lineitem l2
where
l1.l_orderkey = l2.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
null
from
lineitem l3
where
l1.l_orderkey = l3.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'SAUDI ARABIA'
group by
s_name
order by
numwait desc
, s_name
)
where
rownum <= 100
;
The demanding part of the query is that it accesses the by far largest table LINEITEM
three times: Once in the main query and twice as part of the
correlated subqueries (EXISTS (...L2...) / NOT EXISTS (...L3...)).
A minimal setup to reproduce the execution plans can be found at the end of this post.
I've deliberately kept the complexity of the setup at the bare minimum - usually the actual table definitions include parallelism, partitioning, compression and other options like freelist and freelist groups for MSSM setups.
Let's have a look at the execution plan produced by pre-11.2 optimizer versions:
Plan hash value: 1997471497
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc|
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 4000 | |
|* 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 375K| 14M| |
|* 3 | SORT ORDER BY STOPKEY | | 375K| 79M| 86M|
| 4 | HASH GROUP BY | | 375K| 79M| 86M|
|* 5 | HASH JOIN ANTI | | 375K| 79M| 68M|
|* 6 | HASH JOIN SEMI | | 375K| 64M| 59M|
|* 7 | HASH JOIN | | 375K| 54M| 53M|
|* 8 | HASH JOIN | | 375K| 48M| 3848K|
| 9 | NESTED LOOPS | | 37500 | 3405K| |
|* 10 | TABLE ACCESS FULL| NATION | 1 | 40 | |
|* 11 | TABLE ACCESS FULL| SUPPLIER | 30000 | 1552K| |
|* 12 | TABLE ACCESS FULL | LINEITEM | 7500K| 314M| |
|* 13 | TABLE ACCESS FULL | ORDERS | 37M| 572M| |
| 14 | TABLE ACCESS FULL | LINEITEM | 150M| 3719M| |
|* 15 | TABLE ACCESS FULL | LINEITEM | 7500K| 314M| |
-----------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$A317D234 / from$_subquery$_001@SEL$1
3 - SEL$A317D234
10 - SEL$A317D234 / NATION@SEL$2
11 - SEL$A317D234 / SUPPLIER@SEL$2
12 - SEL$A317D234 / L1@SEL$2
13 - SEL$A317D234 / ORDERS@SEL$2
14 - SEL$A317D234 / L2@SEL$3
15 - SEL$A317D234 / L3@SEL$4
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=100)
3 - filter(ROWNUM<=100)
5 - access("L1"."L_ORDERKEY"="L3"."L_ORDERKEY")
filter("L3"."L_SUPPKEY"<>"L1"."L_SUPPKEY")
6 - access("L1"."L_ORDERKEY"="L2"."L_ORDERKEY")
filter("L2"."L_SUPPKEY"<>"L1"."L_SUPPKEY")
7 - access("O_ORDERKEY"="L1"."L_ORDERKEY")
8 - access("S_SUPPKEY"="L1"."L_SUPPKEY")
10 - filter("N_NAME"='SAUDI ARABIA')
11 - filter("S_NATIONKEY"="N_NATIONKEY")
12 - filter("L1"."L_RECEIPTDATE">"L1"."L_COMMITDATE")
13 - filter("O_ORDERSTATUS"='F')
15 - filter("L3"."L_RECEIPTDATE">"L3"."L_COMMITDATE")
And this is what you get from 11.2:
Plan hash value: 823100515
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc|
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 4000 | |
|* 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 7500 | 292K| |
|* 3 | SORT ORDER BY STOPKEY | | 7500 | 197K| |
| 4 | HASH GROUP BY | | 7500 | 197K| |
| 5 | VIEW | VM_NWVW_2 | 7500 | 197K| |
|* 6 | FILTER | | | | |
| 7 | HASH GROUP BY | | 7500 | 1794K| 189M|
|* 8 | HASH JOIN | | 749K| 175M| 76M|
|* 9 | HASH JOIN | | 375K| 71M| 66M|
|* 10 | HASH JOIN | | 375K| 61M| 4728K|
|* 11 | HASH JOIN | | 37500 | 4284K| |
|* 12 | TABLE ACCESS FULL| NATION | 1 | 52 | |
| 13 | TABLE ACCESS FULL| SUPPLIER | 750K| 46M| |
|* 14 | TABLE ACCESS FULL | LINEITEM | 7500K| 400M| |
|* 15 | TABLE ACCESS FULL | ORDERS | 37M| 1001M| |
| 16 | TABLE ACCESS FULL | LINEITEM | 150M| 6294M| |
--------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$DD8F533F / from$_subquery$_001@SEL$1
3 - SEL$DD8F533F
5 - SEL$11CEEA77 / VM_NWVW_2@SEL$DD8F533F
6 - SEL$11CEEA77
12 - SEL$11CEEA77 / NATION@SEL$2
13 - SEL$11CEEA77 / SUPPLIER@SEL$2
14 - SEL$11CEEA77 / L1@SEL$2
15 - SEL$11CEEA77 / ORDERS@SEL$2
16 - SEL$11CEEA77 / L3@SEL$4
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=100)
3 - filter(ROWNUM<=100)
6 - filter(SUM(CASE WHEN "L3"."L_RECEIPTDATE">"L3"."L_COMMITDATE" THEN 1 ELSE 0 END
)=0)
8 - access("L3"."L_ORDERKEY"="L1"."L_ORDERKEY")
filter("L3"."L_SUPPKEY"<>"L1"."L_SUPPKEY")
9 - access("O_ORDERKEY"="L1"."L_ORDERKEY")
10 - access("S_SUPPKEY"="L1"."L_SUPPKEY")
11 - access("S_NATIONKEY"="N_NATIONKEY")
12 - filter("N_NAME"='SAUDI ARABIA')
14 - filter("L1"."L_RECEIPTDATE">"L1"."L_COMMITDATE")
15 - filter("O_ORDERSTATUS"='F')
Let's ignore the difference in cardinality estimates for the moment, in particular when swapping the left and right side of the correlation predicate in the subqueries, since this is obviously caused by the incomplete fake statistics of my test setup. Whereas the first execution plan looks like a rather expected one, including a
SEMI and ANTI join for the two
unnested, correlated subqueries and accessing the LINEITEM table
three times in total, the 11.2 execution plan looks dramatically different. This is particularly noticeable as my minimal setup doesn't include any fancy
primary key/unique or foreign key constraints declarations that could support some of the more recent transformations that Oracle offers. The only constraints that are included are NOT NULL column constraints.
There is
no SEMI or ANTI join, and the third instance of LINEITEM is gone, too. When I saw this execution plan for the first time, my initial reaction was: "What a cunning optimization!" - shortly followed by "But it seems to be illegal! (too good to be true)". It turns out that the former is true whereas the latter isn't - the results are correct, although the transformation introduces a potential
overhead that can be significant.
If you look closely at the two correlated subqueries it becomes obvious that they share the correlation criteria, but one is an EXISTS clause, and the other one a NOT EXISTS that adds an additional filter predicate.
So one possible idea for a rewrite of the query was to transform the EXISTS clause into a
join that allowed filtering the "RECEIPTDATE greater than COMMITDATE" condition checked in the NOT EXISTS clause - thereby getting rid of the third instance of LINEITEM and saving a tremendous amount of work.
But then, these are correlated subqueries and when transforming them into a join care has to be taken that the transformation is
semantically equivalent and the result still correct - here transforming the [NOT] EXISTS check into a regular join could potentially lead to duplicates that need to be eliminated introducing overhead again.
And here is what happens internally, the final transformed query from the optimizer trace file looks like this:
SELECT "from$_subquery$_001"."S_NAME" "S_NAME",
"from$_subquery$_001"."NUMWAIT" "NUMWAIT"
FROM
(SELECT "VM_NWVW_2"."$vm_col_1" "S_NAME",
COUNT(*) "NUMWAIT"
FROM
(SELECT
/*+ UNNEST */
"SUPPLIER"."S_NAME" "$vm_col_1"
FROM "CBO_TEST"."LINEITEM" "L3",
"CBO_TEST"."SUPPLIER" "SUPPLIER",
"CBO_TEST"."LINEITEM" "L1",
"CBO_TEST"."ORDERS" "ORDERS",
"CBO_TEST"."NATION" "NATION"
WHERE "SUPPLIER"."S_SUPPKEY"="L1"."L_SUPPKEY"
AND "ORDERS"."O_ORDERKEY" ="L1"."L_ORDERKEY"
AND "ORDERS"."O_ORDERSTATUS"='F'
AND "L1"."L_RECEIPTDATE" >"L1"."L_COMMITDATE"
AND 0 <1
AND "SUPPLIER"."S_NATIONKEY"="NATION"."N_NATIONKEY"
AND "NATION"."N_NAME" ='SAUDI ARABIA'
AND "L3"."L_ORDERKEY" ="L1"."L_ORDERKEY"
AND "L3"."L_SUPPKEY" <>"L1"."L_SUPPKEY"
GROUP BY "NATION".ROWID,
"ORDERS".ROWID,
"L1".ROWID,
"SUPPLIER".ROWID,
"SUPPLIER"."S_NAME"
HAVING SUM(
CASE
WHEN "L3"."L_RECEIPTDATE">"L3"."L_COMMITDATE"
THEN 1
ELSE 0
END )=0
) "VM_NWVW_2"
GROUP BY "VM_NWVW_2"."$vm_col_1"
ORDER BY COUNT(*) DESC,
"VM_NWVW_2"."$vm_col_1"
) "from$_subquery$_001"
WHERE ROWNUM<=100;
So Oracle
eliminates the L2 instance of LINEITEM (and the corresponding subquery) by coalescing the EXISTS and the NOT EXISTS subquery which can be seen from the
OUTLINE where
two COALESCE_SQ hints show up. Finally it transforms the remaining subquery into a regular join that requires an
additional aggregation step in order to eliminate potential duplicates (the inner GROUP BY ...ROWID). The HAVING clause applies the additional filter from the NOT EXISTS subquery.
If you run the query
with Row Source Statistics enabled using the minimum set of data provided you'll notice that the second join to LINEITEM in fact generates duplicates. So this execution plan is a cunning optimization that allows getting rid of the third instance of LINEITEM, but depending on the number of duplicates generated a significant amount of excess work might be introduced instead.
This might explain why this transformation seems at present only to be applied when dealing with queries that include aggregations anyway. If you change the COUNT(*) ... GROUP BY into a regular query without aggregation, the transformation will not be applied and the traditional SEMI / ANTI join execution plan will show up.
Of course you could also argue that possibly the optimization was particularly aimed at benchmarks like this, but then there are certainly similar real-life queries out there that can benefit from such potential workload reductions via query transformations.
The setup script:
drop table lineitem purge;
drop table orders purge;
drop table supplier purge;
drop table nation purge;
CREATE TABLE lineitem (
l_shipdate DATE NULL,
l_orderkey NUMBER NOT NULL,
l_discount NUMBER NOT NULL,
l_extendedprice NUMBER NOT NULL,
l_suppkey NUMBER NOT NULL,
l_quantity NUMBER NOT NULL,
l_returnflag CHAR(1) NULL,
l_partkey NUMBER NOT NULL,
l_linestatus CHAR(1) NULL,
l_tax NUMBER NOT NULL,
l_commitdate DATE NULL,
l_receiptdate DATE NULL,
l_shipmode CHAR(10) NULL,
l_linenumber NUMBER NOT NULL,
l_shipinstruct CHAR(25) NULL,
l_comment VARCHAR2(44) NULL
)
;
CREATE TABLE orders (
o_orderdate DATE NULL,
o_orderkey NUMBER NOT NULL,
o_custkey NUMBER NOT NULL,
o_orderpriority CHAR(15) NULL,
o_shippriority NUMBER NULL,
o_clerk CHAR(15) NULL,
o_orderstatus CHAR(1) NULL,
o_totalprice NUMBER NULL,
o_comment VARCHAR2(79) NULL
)
;
CREATE TABLE supplier (
s_suppkey NUMBER NOT NULL,
s_nationkey NUMBER NULL,
s_comment VARCHAR2(101) NULL,
s_name CHAR(25) NULL,
s_address VARCHAR2(40) NULL,
s_phone CHAR(15) NULL,
s_acctbal NUMBER NULL
)
;
CREATE TABLE nation (
n_nationkey NUMBER NOT NULL,
n_name CHAR(25) NULL,
n_regionkey NUMBER NULL,
n_comment VARCHAR2(152) NULL
)
;
/*
CREATE INDEX i_l_orderkey
ON lineitem (
l_orderkey
)
;
CREATE UNIQUE INDEX i_o_orderkey
ON orders (
o_orderkey
)
;
*/
exec sys.dbms_stats.set_table_stats(null, 'lineitem', numblks=> 6450000, numrows=> 150000000)
exec sys.dbms_stats.set_table_stats(null, 'orders', numblks=> 3750000, numrows=> 75000000)
exec sys.dbms_stats.set_table_stats(null, 'nation', numblks=> 375, numrows=> 25)
exec sys.dbms_stats.set_table_stats(null, 'supplier', numblks=> 37500, numrows=> 750000)
--exec sys.dbms_stats.set_index_stats(null, 'i_l_orderkey', numlblks=> 645000, numrows=> 150000000, numdist=> 75000000, indlevel => 4, clstfct => 150000000)
--exec sys.dbms_stats.set_index_stats(null, 'i_o_orderkey', numlblks=> 375000, numrows=> 75000000, numdist=> 75000000, indlevel => 3, clstfct => 75000000)
exec sys.dbms_stats.set_column_stats(null, 'lineitem', 'l_orderkey', distcnt => 75000000)
exec sys.dbms_stats.set_column_stats(null, 'lineitem', 'l_suppkey', distcnt => 750000)
exec sys.dbms_stats.set_column_stats(null, 'orders', 'o_orderkey', distcnt => 75000000)
exec sys.dbms_stats.set_column_stats(null, 'orders', 'o_orderstatus', distcnt => 2)
exec sys.dbms_stats.set_column_stats(null, 'supplier', 's_suppkey', distcnt => 750000)
exec sys.dbms_stats.set_column_stats(null, 'supplier', 's_nationkey', distcnt => 25)
exec sys.dbms_stats.set_column_stats(null, 'supplier', 's_name', distcnt => 750000)
exec sys.dbms_stats.set_column_stats(null, 'nation', 'n_nationkey', distcnt => 25)
exec sys.dbms_stats.set_column_stats(null, 'nation', 'n_name', distcnt => 20)
insert into nation (n_nationkey, n_name) values (1, 'SAUDI ARABIA');
insert into supplier (s_suppkey, s_name, s_nationkey) values (1, 'SUPPLIER1', 1);
insert into supplier (s_suppkey, s_name, s_nationkey) values (2, 'SUPPLIER2', 1);
insert into orders (o_orderkey, o_custkey, o_orderstatus) values (1, 1, 'F');
insert into orders (o_orderkey, o_custkey, o_orderstatus) values (2, 1, 'A');
insert into lineitem (l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_linenumber, l_partkey, l_tax, l_receiptdate, l_commitdate) values (1, 0, 0, 1, 0, 1, 1, 0, sysdate + 1, sysdate);
insert into lineitem (l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_linenumber, l_partkey, l_tax, l_receiptdate, l_commitdate) values (1, 0, 0, 1, 0, 2, 1, 0, sysdate + 1, sysdate);
insert into lineitem (l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_linenumber, l_partkey, l_tax, l_receiptdate, l_commitdate) values (1, 0, 0, 1, 0, 3, 1, 0, sysdate, sysdate);
insert into lineitem (l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_linenumber, l_partkey, l_tax, l_receiptdate, l_commitdate) values (1, 0, 0, 2, 0, 1, 1, 0, sysdate, sysdate);
insert into lineitem (l_orderkey, l_discount, l_extendedprice, l_suppkey, l_quantity, l_linenumber, l_partkey, l_tax, l_receiptdate, l_commitdate) values (1, 0, 0, 2, 0, 2, 1, 0, sysdate, sysdate);