Sunday, June 10, 2012

Report Generators And Query Transformations

Usually the Cost-Based Optimizer arrives at a reasonable execution plan if it gets the estimates regarding cardinality and data scattering / clustering right (if you want to learn more about that, why not watch my webinar available at "AllThingsOracle.com"?).

Here is an example I've recently come across where this wasn't the case - the optimizer obviously preferred plans with a significantly higher cost.

The setup to reproduce the issue is simple:

create table t1
as
select
        rownum as id
      , mod(rownum, 100) + 1 as attr1
      , 'DESC' || to_char(mod(rownum, 100) + 1, 'TM') as attr2
      , mod(rownum, 10) + 1 as attr3
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1000000
;

exec dbms_stats.gather_table_stats(null, 't1')

create table t2
as
select
        rownum as id
      , mod(rownum, 2) + 1 as attr1
      , 'DESC' || to_char(mod(rownum, 2) + 1, 'TM') as attr2
      , mod(rownum, 10) + 1 as attr3
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 10000
;

exec dbms_stats.gather_table_stats(null, 't2', no_invalidate => false)

So we have two tables, and the smaller one was basically used as "security" information to restrict the data from the larger table according to some user entitlement (apologies for the generic column and table names that don't make this any more obvious).

The query that I'm looking at is the following:

with a as
(
  select
          *
  from
          t1
  where
          attr3 in (select distinct attr3 from t2 where attr1 = 1)
),
b as
(
  select
          attr1 as "attr1"
        , min(attr2) as "min_attr2"
  from
          a
  group by
          attr1
)
select
        "attr1" as "memberuniquename"
      , min("min_attr2") as "member"
from
        b
where
        1 = 1
group by
        "attr1"
;

At first sight you might ask yourself why anyone would write a query like that, in particular with the duplicated aggregation step. The answer is of course that a human being very likely wouldn't write such a query, but this is what you get from some report generators based on generic meta data.

In principle the query is looking for a simple list of values from a table (in this case a dimension table that is browsed interactively), but restricted according to the "profile" of a particular user, here represented by the unique list of values of T2.ATTR3 for rows with T2.ATTR1 = 1.

There are a couple of design issues with the reproduced approach, but the query doesn't look too complex after all.

One specific issue of the query is that the MIN() aggregate allows Oracle a certain kind of flexibility regarding when to apply the aggregate function: it could be applied to duplicated data and still generate the correct answer, something that is not necessarily possible with other aggregate functions, like COUNT() for example.
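
To illustrate this point, here is a minimal example of my own (not part of the original test case) that shows why duplication is harmless for MIN() but not for COUNT():

-- My addition: duplicating the input rows leaves MIN() unchanged but
-- doubles COUNT(*), so Oracle may safely evaluate MIN() on data that has
-- been duplicated by a join - something it cannot do for COUNT()
select min(attr2) as min_attr2, count(*) as cnt
from   (select attr2 from t1 union all select attr2 from t1);

select min(attr2) as min_attr2, count(*) as cnt
from   t1;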

One potential threat therefore comes from the fact that a query transformation of the IN clause into a kind of join can produce a huge intermediate result set, since ATTR3 in both tables has only ten distinct values. Usually this is not a problem since the optimizer, provided it gets the estimates right, should recognize this pattern and calculate a correspondingly increased cost for such plans, and therefore prefer other execution plans that avoid this in some way, for example by using a DISTINCT / GROUP BY operation early, before a join, or simply by using a semi join operation.
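
To put a number on that threat, a quick back-of-the-envelope calculation of my own, using the standard join cardinality formula (rows of T1 times filtered rows of T2, divided by the larger number of distinct values of the join column):

-- My addition: 1,000,000 T1 rows * 5,000 filtered T2 rows / 10 distinct
-- ATTR3 values = 500,000,000 joined rows
select
        (select count(*) from t1)
      * (select count(*) from t2 where attr1 = 1)
      / (select count(distinct attr3) from t2) as approx_join_card
from
        dual
;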

In fact, the query could be simplified to the following:

select
        attr1 as "memberuniquename"
      , min(attr2) as "member"
from
        t1
where
        t1.attr3 in (select t2.attr3 from t2 where attr1 = 1)
group by
        attr1
;


The database version in question was 11.2.0.1, and if you run the simplified query, then you'll get the following execution plan:

Plan hash value: 1591942764

-----------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |   100 |  4500 |  4858   (3)| 00:00:59 |
|   1 |  HASH GROUP BY          |          |   100 |  4500 |  4858   (3)| 00:00:59 |
|*  2 |   HASH JOIN             |          |   708 | 31860 |  4857   (3)| 00:00:59 |
|   3 |    VIEW                 | VW_GBF_6 |    10 |    30 |    50   (4)| 00:00:01 |
|   4 |     HASH GROUP BY       |          |    10 |    60 |    50   (4)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL  | T2       |  5000 | 30000 |    48   (0)| 00:00:01 |
|   6 |    VIEW                 | VW_GBC_5 |   708 | 29736 |  4807   (3)| 00:00:58 |
|   7 |     HASH GROUP BY       |          |   708 |  9204 |  4807   (3)| 00:00:58 |
|   8 |      TABLE ACCESS FULL  | T1       |  1000K|    12M|  4707   (1)| 00:00:57 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ITEM_1"="ITEM_1")
   5 - filter("ATTR1"=1)


The execution plan is reasonable and performs reasonably - the optimizer used the so-called "Group By Placement" transformation, which took one of the predicted approaches by applying the DISTINCT (transformed into a GROUP BY) as early as possible, and arrived at the following query after transformation and costing, taken from the optimizer trace file and slightly re-formatted for better readability:

SELECT "VW_GBC_5"."ITEM_3" "ATTR1" , MIN("VW_GBC_5"."ITEM_2") "MIN(ATTR2)" FROM ( SELECT "T2"."ATTR3" "ITEM_1" FROM "T2" "T2" WHERE "T2"."ATTR1"=1 GROUP BY "T2"."ATTR3" ) "VW_GBF_6" , ( SELECT "T1"."ATTR3" "ITEM_1" , MIN("T1"."ATTR2") "ITEM_2" , "T1"."ATTR1" "ITEM_3" FROM "T1" "T1" GROUP BY "T1"."ATTR3" , "T1"."ATTR1" ) "VW_GBC_5" WHERE "VW_GBC_5"."ITEM_1"="VW_GBF_6"."ITEM_1" GROUP BY "VW_GBC_5"."ITEM_3" ;


which, funnily enough, is quite similar to the original query mentioned initially.
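
In case you want to verify the transformed query text yourself, the optimizer trace can be generated via the classic 10053 event - a quick sketch (the trace file location depends on your diagnostic settings):

alter session set tracefile_identifier = 'group_by_placement';

alter session set events '10053 trace name context forever, level 1';

explain plan for
select
        attr1 as "memberuniquename"
      , min(attr2) as "member"
from
        t1
where
        t1.attr3 in (select t2.attr3 from t2 where attr1 = 1)
group by
        attr1
;

alter session set events '10053 trace name context off';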

If, however, you check the execution plan of the original query, you'll get the following:

Plan hash value: 3221309153

---------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |   100 |  3900 | 81386  (95)| 00:16:17 |
|   1 |  HASH GROUP BY        |      |   100 |  3900 | 81386  (95)| 00:16:17 |
|   2 |   VIEW                |      |   100 |  3900 | 81386  (95)| 00:16:17 |
|   3 |    HASH GROUP BY      |      |   100 |  1900 | 81386  (95)| 00:16:17 |
|*  4 |     HASH JOIN         |      |   500M|  9059M| 10222  (54)| 00:02:03 |
|*  5 |      TABLE ACCESS FULL| T2   |  5000 | 30000 |    48   (0)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| T1   |  1000K|    12M|  4707   (1)| 00:00:57 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("ATTR3"="ATTR3")
   5 - filter("ATTR1"=1)


This execution plan looks quite different, and it shows exactly the problem I've described - the optimizer has the option to postpone the aggregation until after the join due to the specifics of the query: a MIN() aggregate can be applied late. The problem can be seen from the cardinality estimate of the join - a whopping 500 million rows that are crunched afterwards into 100 rows. This doesn't look like an efficient approach, and indeed at actual execution time a lot of CPU time (several minutes, depending on the CPU speed) gets used for handling that huge amount of intermediate data. It is probably a bug that it doesn't at least use a simple semi join to avoid the duplicates.
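
If you want to see the actual row counts at run time rather than just the estimates, you can use row source statistics - a sketch of the usual approach (any equivalent method will do):

-- Run the original query with the GATHER_PLAN_STATISTICS hint added to the
-- outermost SELECT, then display the actuals of the last execution:
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

-- The A-Rows column of the HASH JOIN operation then shows the actual
-- amount of intermediate data generated.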

When I first saw this problem I was pretty sure that it was a typical example of the impact of the WITH clause on query transformations in versions prior to 11.2.0.3, where that problem was finally addressed (to a very large extent, although not completely eliminated).

And indeed, when rewriting the original query to the corresponding one using inline views instead of Subquery Factoring / Common Table Expressions:

select "attr1" as "memberuniquename" , min("min_attr2") as "member" from ( select attr1 as "attr1" , min(attr2) as "min_attr2" from ( select * from t1 where attr3 in ( select distinct attr3 from t2 where attr1 = 1 ) ) a group by attr1 ) b where 1 = 1 group by "attr1" ;


the following execution plan will be generated:

Plan hash value: 1699732834

--------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |   100 |  3900 |  4858   (3)| 00:00:59 |
|   1 |  HASH GROUP BY            |          |   100 |  3900 |  4858   (3)| 00:00:59 |
|   2 |   VIEW                    |          |   100 |  3900 |  4858   (3)| 00:00:59 |
|   3 |    HASH GROUP BY          |          |   100 |  4500 |  4858   (3)| 00:00:59 |
|*  4 |     HASH JOIN             |          |   708 | 31860 |  4857   (3)| 00:00:59 |
|   5 |      VIEW                 | VW_GBF_8 |    10 |    30 |    50   (4)| 00:00:01 |
|   6 |       HASH GROUP BY       |          |    10 |    60 |    50   (4)| 00:00:01 |
|*  7 |        TABLE ACCESS FULL  | T2       |  5000 | 30000 |    48   (0)| 00:00:01 |
|   8 |      VIEW                 | VW_GBC_7 |   708 | 29736 |  4807   (3)| 00:00:58 |
|   9 |       HASH GROUP BY       |          |   708 |  9204 |  4807   (3)| 00:00:58 |
|  10 |        TABLE ACCESS FULL  | T1       |  1000K|    12M|  4707   (1)| 00:00:57 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("ITEM_1"="ITEM_1")
   7 - filter("ATTR1"=1)


which is almost identical to the execution plan of the simplified query. Notice also the huge difference in cost compared to the execution plan for the original query. A test on 11.2.0.3 showed the same execution plan without the re-write to inline views, so the changes introduced in 11.2.0.3 address this issue.

I therefore advised disabling the usage of the WITH clause in the report generator, which was a matter of a simple click in the tool.

More Query Variations

Up to that point I thought this could be filed as just another example of the already known Subquery Factoring / Query Transformation issues, but it wasn't over yet - the report generator managed to produce more variations of the query. I liked in particular this one, which used the new setting disabling the WITH clause:

select "attr1" as "memberuniquename" , min("min_attr2") as "member" from ( select attr1 as "attr1" , min(attr2) as "min_attr2" from ( select * from t1 where attr3 in ( select distinct attr3 from t2 where attr1 = 1 ) ) a group by attr1 having 1 = 1 ) b group by "attr1" ;


Whereas in the previous example an innocent "WHERE 1 = 1" predicate was added and simply ignored by the optimizer, this time a seemingly innocent "HAVING 1 = 1" was added - and this identified another "weak spot" in the optimizer code, because now the execution plan generated was again the HASH JOIN with the huge intermediate result.

In fact, when changing the simplified query by adding the redundant HAVING clause like this:

select
        attr1 as "memberuniquename"
      , min(attr2) as "member"
from
        t1
where
        t1.attr3 in (select t2.attr3 from t2 where attr1 = 1)
group by
        attr1
having
        1 = 1
;


the bad execution plan can be reproduced, too. Even 11.2.0.3 doesn't fix this, so in this case one possible tactical workaround was to use the NO_UNNEST hint for the subquery (see the sketch below): we knew the data, and the number of distinct keys used as input/output to the subquery was always small, so we could benefit a lot from filter subquery caching. Notice that changing the MIN() aggregate in the above query to a COUNT(), for example, will change the join to a semi join, avoiding the problem of the huge set of intermediate duplicated data.
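
Applied to the simplified query, the workaround looked something like this sketch - with only a few distinct values driving the subquery, the resulting FILTER operation benefits a lot from subquery caching:

select
        attr1 as "memberuniquename"
      , min(attr2) as "member"
from
        t1
where
        t1.attr3 in (select /*+ no_unnest */ t2.attr3 from t2 where attr1 = 1)
group by
        attr1
having
        1 = 1
;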

The strategic solution in this case of course was to use a correct data model that prevented the duplicates in the "security" table in the first place - see the sketch below. This prevented the huge intermediate result set even when the "good" query transformations were not applied.
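
Just to sketch the idea (T2_PROFILE is a hypothetical name, not the actual data model used): a deduplicated "security" table with a suitable unique constraint guarantees at most one match per T1 row, so no join order or join method can inflate the intermediate result:

-- My sketch: one row per (ATTR1, ATTR3) combination, enforced by a
-- primary key, so a join to this table can never duplicate T1 rows
create table t2_profile
as
select distinct
        attr1
      , attr3
from
        t2
;

alter table t2_profile add constraint t2_profile_pk primary key (attr1, attr3);

exec dbms_stats.gather_table_stats(null, 't2_profile')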

Footnote

Watch out if you upgrade to 11.2.0.3. Potentially a lot of execution plans using the WITH clause might change, and as you know, not always for the better.
