Usually the Cost-Based Optimizer arrives at a reasonable execution plan if it gets the estimates regarding
cardinality and
data scattering / clustering right (if you want to learn more about that why not watch my
Webinar available at "AllThingsOracle.com"?).
Here is an example I've recently come across where this wasn't case - the optimizer obviously preferred plans with a significantly higher cost.
The setup to reproduce the issue is simple:
create table t1
as
select
rownum as id
, mod(rownum, 100) + 1 as attr1
, 'DESC' || to_char(mod(rownum, 100) + 1, 'TM') as attr2
, mod(rownum, 10) + 1 as attr3
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000000
;
exec dbms_stats.gather_table_stats(null, 't1')
create table t2
as
select
rownum as id
, mod(rownum, 2) + 1 as attr1
, 'DESC' || to_char(mod(rownum, 2) + 1, 'TM') as attr2
, mod(rownum, 10) + 1 as attr3
, rpad('x', 100) as filler
from
dual
connect by
level <= 10000
;
exec dbms_stats.gather_table_stats(null, 't2', no_invalidate => false)
So we have two tables, and the
smaller one of the tables basically was used as a
"security" information to restrict the data from the
larger table according to some
user entitlement (apologies for the generic column and table names that don't make this more obvious).
The query that I'm looking at is the following:
with
a as (
select
*
from
t1
where
attr3 in (select distinct attr3 from t2 where attr1 = 1)
),
b as (
select
attr1 as "attr1"
, min(attr2) as "min_attr2"
from
a
group by
attr1
)
select
"attr1" as "memberuniquename"
, min("min_attr2") as "member"
from
b
where
1 = 1
group by
"attr1"
;
At first sight you might ask yourself, why should anyone write a query like that, in particular the
duplicated aggregation step? The answer is of course, a human being very likely wouldn't write such a query, but this is what you get from some
report generators based on generic meta data.
In principle the query is looking for a simple list of values from a table (in this case dimension tables that are browsed interactively), but restricted according to the "profile" of a particular user, here represented by the
unique list of values of
T2.ATTR3 restricted on T2.ATTR1 = 1.
There are a couple of design issues with the reproduced approach, but the query doesn't look too complex after all.
One specific issue of the query is that the
MIN() aggregate allows Oracle a certain kind of flexibility when to apply the aggregate function: It could be applied on
duplicated data and still generate the correct answer, something that is not necessarily possible with other aggregate functions like COUNT() for example.
One potential threat therefore comes from the fact that a query transformation of the IN clause to a kind of join can produce a
huge intermediate result set since
ATTR3 in both tables only has
ten distinct values. Usually this is not a problem since the optimizer, provided it gets the estimates right, should
recognize this pattern and calculate a corresponding increased cost for such plans, and therefore prefer other execution plans that avoid this in some way, for example by using a
DISTINCT / GROUP BY operation early before a join, or simply using a
semi join operation.
In fact, the query could be simplified to the following:
select
attr1 as "memberuniquename"
, min(attr2) as "member"
from
t1
where
t1.attr3 in (select t2.attr3 from t2 where attr1 = 1)
group by
attr1
;
The database version in question was
11.2.0.1, and if you run the simplified query, then you'll get the following execution plan:
Plan hash value: 1591942764
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 4500 | 4858 (3)| 00:00:59 |
| 1 | HASH GROUP BY | | 100 | 4500 | 4858 (3)| 00:00:59 |
|* 2 | HASH JOIN | | 708 | 31860 | 4857 (3)| 00:00:59 |
| 3 | VIEW | VW_GBF_6 | 10 | 30 | 50 (4)| 00:00:01 |
| 4 | HASH GROUP BY | | 10 | 60 | 50 (4)| 00:00:01 |
|* 5 | TABLE ACCESS FULL| T2 | 5000 | 30000 | 48 (0)| 00:00:01 |
| 6 | VIEW | VW_GBC_5 | 708 | 29736 | 4807 (3)| 00:00:58 |
| 7 | HASH GROUP BY | | 708 | 9204 | 4807 (3)| 00:00:58 |
| 8 | TABLE ACCESS FULL| T1 | 1000K| 12M| 4707 (1)| 00:00:57 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ITEM_1"="ITEM_1")
5 - filter("ATTR1"=1)
The execution plan is reasonable and performs reasonable - the optimizer used the so called
"Group By Placement" transformation which took one of the predicted approaches to apply the DISTINCT (transformed to a GROUP BY)
as early as possible and arrived at the following query after transformation and costing, taken from the optimizer trace file and slightly re-formatted for better readability:
SELECT
"VW_GBC_5"."ITEM_3" "ATTR1"
, MIN("VW_GBC_5"."ITEM_2") "MIN(ATTR2)"
FROM
(
SELECT
"T2"."ATTR3" "ITEM_1"
FROM
"T2" "T2"
WHERE
"T2"."ATTR1"=1
GROUP BY
"T2"."ATTR3"
) "VW_GBF_6"
, (
SELECT
"T1"."ATTR3" "ITEM_1"
, MIN("T1"."ATTR2") "ITEM_2"
, "T1"."ATTR1" "ITEM_3"
FROM
"T1" "T1"
GROUP BY
"T1"."ATTR3"
, "T1"."ATTR1"
) "VW_GBC_5"
WHERE
"VW_GBC_5"."ITEM_1"="VW_GBF_6"."ITEM_1"
GROUP BY
"VW_GBC_5"."ITEM_3"
;
which is funnily a bit
similar to the original query initially mentioned.
If you however check the execution plan of the original query, you'll get the following:
Plan hash value: 3221309153
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 3900 | 81386 (95)| 00:16:17 |
| 1 | HASH GROUP BY | | 100 | 3900 | 81386 (95)| 00:16:17 |
| 2 | VIEW | | 100 | 3900 | 81386 (95)| 00:16:17 |
| 3 | HASH GROUP BY | | 100 | 1900 | 81386 (95)| 00:16:17 |
|* 4 | HASH JOIN | | 500M| 9059M| 10222 (54)| 00:02:03 |
|* 5 | TABLE ACCESS FULL| T2 | 5000 | 30000 | 48 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL| T1 | 1000K| 12M| 4707 (1)| 00:00:57 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("ATTR3"="ATTR3")
5 - filter("ATTR1"=1)
This execution plan looks
quite different, and it shows exactly the problem I've described - the optimizer has the option to
postpone the aggregation after the join due to the specifics of the query: A
MIN() aggregate can be applied late, but the problem can be seen from the cardinality estimate of the join - a whopping
500 million rows, that are crunched afterwards into 100 rows. This doesn't look like an efficient approach, and indeed at actual execution time a lot of CPU time (several
minutes, depending on the CPU speed) gets used for handling that huge amount of intermediate data. It is probably a
bug that it doesn't use at least a simple
semi join to avoid the duplicates.
When I first saw this problem I was pretty sure that this is a typical problem of the impact of the
WITH clause on
query transformations in versions prior to
11.2.0.3 where that problem finally was
addressed (to a very large extend, although not completely eliminated).
And indeed, when rewriting the original query to the corresponding one using
inline views instead of
Subquery Factoring / Common Table Expression:
select
"attr1" as "memberuniquename"
, min("min_attr2") as "member"
from (
select
attr1 as "attr1"
, min(attr2) as "min_attr2"
from
(
select
*
from
t1
where
attr3 in (
select
distinct attr3
from
t2
where
attr1 = 1
)
) a
group by
attr1
) b
where
1 = 1
group by
"attr1"
;
the following execution plan will be generated:
Plan hash value: 1699732834
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 3900 | 4858 (3)| 00:00:59 |
| 1 | HASH GROUP BY | | 100 | 3900 | 4858 (3)| 00:00:59 |
| 2 | VIEW | | 100 | 3900 | 4858 (3)| 00:00:59 |
| 3 | HASH GROUP BY | | 100 | 4500 | 4858 (3)| 00:00:59 |
|* 4 | HASH JOIN | | 708 | 31860 | 4857 (3)| 00:00:59 |
| 5 | VIEW | VW_GBF_8 | 10 | 30 | 50 (4)| 00:00:01 |
| 6 | HASH GROUP BY | | 10 | 60 | 50 (4)| 00:00:01 |
|* 7 | TABLE ACCESS FULL| T2 | 5000 | 30000 | 48 (0)| 00:00:01 |
| 8 | VIEW | VW_GBC_7 | 708 | 29736 | 4807 (3)| 00:00:58 |
| 9 | HASH GROUP BY | | 708 | 9204 | 4807 (3)| 00:00:58 |
| 10 | TABLE ACCESS FULL| T1 | 1000K| 12M| 4707 (1)| 00:00:57 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("ITEM_1"="ITEM_1")
7 - filter("ATTR1"=1)
which is almost
identical to the execution plan of the simplified query. Notice also the
huge difference in cost compared to the execution plan for the original query. A test on
11.2.0.3 showed the same execution plan without the re-write to inline views, so the changes introduced in 11.2.0.3 address this issue.
I've advised therefore to
disable the usage of the WITH clause in the report generator, which was a matter of a simple click in the tool.
More Query Variations
To that point I thought this could be filed as just another example of the already known
Subquery Factoring / Query Transformations issues, but it's not over yet - the report generator managed to produce more variations of the query. I liked in particular this one, which used the new setting disabling the WITH clause:
select
"attr1" as "memberuniquename"
, min("min_attr2") as "member"
from (
select
attr1 as "attr1"
, min(attr2) as "min_attr2"
from
(
select
*
from
t1
where
attr3 in (
select
distinct attr3
from
t2
where
attr1 = 1
)
) a
group by
attr1
having
1 = 1
) b
group by
"attr1"
;
Whereas in the previous example an innocent "WHERE 1 = 1" predicate was added and simply
ignored by the optimizer, this time a seemingly innocent
"HAVING 1 = 1" was added, and this identified another "weak spot" in the optimizer code, because now the execution plan generated was again the HASH JOIN with the huge intermediate result.
In fact, when changing the
simplified query by adding the redundant HAVING clause like this:
select
attr1 as "memberuniquename"
, min(attr2) as "member"
from
t1
where
t1.attr3 in (select t2.attr3 from t2 where attr1 = 1)
group by
attr1
having
1 = 1
;
The bad execution plan can be reproduced, too. Even 11.2.0.3
doesn't fix this, so in this case one possible tactical workaround was to use the
NO_UNNEST hint for the subquery because we knew the data and the number of distinct keys used as input/output to the subquery were always only a few, so we could benefit from
filter subquery caching a lot. Notice that changing in the above query the
MIN() aggregate to a
COUNT() for example will change the join to a
semi join, avoiding the problem of the huge set of intermediate duplicated data.
The strategic solution in this case of course was to use a
correct data model that prevented the
duplicates in the
"security" table in first place. This prevented the huge intermediate result set even with when the "good" query transformations were not applied.
Footnote
Watch out if you
upgrade to
11.2.0.3. Potentially a lot of execution plans using the
WITH clause might change, and as you know, not always for the better.