Prompted by a (not really that) recent
discussion on the OTN forum I've decided to publish this note.
Sometimes you have the task of comparing column values and handling the NULL value cases correctly makes this rather cumbersome for columns that are allowed to be NULL.
The "official" SQL way of comparing two column values and to find out whether they are equal or not - under the assumption that having NULL in both columns should be treated as equal (a point that can be argued) would read in SQL something like the following for the "unequal" case:
column1 != column2 or (column1 is null and column2 is not null) or (column1 is not null and column2 is null)
and
column1 = column2 or (column1 is null and column2 is null)
for the "equal" case.
Imagine this to be done for a lot of columns and it becomes obvious that people might be looking for more concise alternatives how to express this.
Sometimes you might have come across code that uses NVL or a similar function with some "special" value that is not supposed to be part of the column data to be compared - like -1 for (positive) integer values for example, but this has the inherent risk of producing wrong results if the "special" value is (or might become in future) not that special after all.
The not really documented SYS_OP_MAP_NONNULL function could come handy if it was official, because it covers the NULL cases automatically.
Note: I still would call it undocumented even if it shows up in some "Materialized View" related part of the official 12c documentation, because it is not part of the official SQL language manual
SYS_OP_MAP_NONNULL(column1) != SYS_OP_MAP_NONNULL(column2)
is all that is needed. More concise as above, but undocumented.
And: There is one obvious drawback to SYS_OP_MAP_NONNULL, since it adds one byte to the input value to ensure the mapped NULL value differs from any other value: It doesn't work properly with maxed out columns, for example this:
SQL> create table t
2 as
3 select rpad('x', 4000) as col1, rpad('x', 4000) as col2 from dual
4 union all
5 select rpad('x', 4000) as col1, rpad('y', 4000) as col2 from dual
6 union all
7 select null as col1, rpad('y', 4000) as col2 from dual
8 union all
9 select rpad('x', 4000) as col1, null from dual
10 union all
11 select null as col1, null as col2 from dual
12 ;
Table created.
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't')
PL/SQL procedure successfully completed.
SQL>
SQL> select count(*) from t where sys_op_map_nonnull(col1) = sys_op_map_nonnull(col2);
select count(*) from t where sys_op_map_nonnull(col1) = sys_op_map_nonnull(col2)
*
ERROR at line 1:
ORA-01706: user function result value was too large
errors out with "ORA-01706: user function result value was too large".
DECODE
Decode is an exception to the rule how Oracle treats NULLs in SQL - it returns the "equal" value if both operands to compare are NULL, so it allows the following construct to achieve above comparison (see above OTN forum discussion for credits where this expression appeared first, more recently
here on Stew Ashton's blog):
For checking for inequality:
decode(column1, column2, 0, 1) = 1
For equality:
decode(column1, column2, 0, 1) = 0
Very concise, and officially documented.
Interestingly, starting with 11.2.0.2 there is an implicit re-write by default enabled that rewrites above expression (and only this variation, not the one testing for inequality):
decode(column1, column2, 0, 1) = 0
into
sys_op_map_nonnull(column1) = sys_op_map_nonnull(column2)
This is controlled via
fix control 8551880 ("Rewrite decode predicate to join").
Guess what happens when running the following query in 11.2.0.2 or later against above data set (the fix control is enabled by default in these versions):
SQL> select /*+ opt_param('_fix_control', '8551880:0') */ count(*) from t where decode(col1, col2, 0, 1) = 0;
COUNT(*)
----------
2
SQL>
SQL> select /*+ opt_param('_fix_control', '8551880:1') */ count(*) from t where decode(col1, col2, 0, 1) = 0;
select /*+ opt_param('_fix_control', '8551880:1') */ count(*) from t where decode(col1, col2, 0, 1) = 0
*
ERROR at line 1:
ORA-01706: user function result value was too large
Finally, what about performance? For that purpose I created the following compressed 50M rows table, covering different cases:
create table t1 compress
as
select case mod(rownum, 5) + 1
when 1 then cast('C1' as varchar2(20))
when 2 then cast(null as varchar2(20))
when 3 then cast('C3' as varchar2(20))
when 4 then cast(null as varchar2(20))
when 5 then cast('C5' as varchar2(20))
end as col1,
case mod(rownum, 5) + 1
when 1 then cast('C2' as varchar2(20))
when 2 then cast(null as varchar2(20))
when 3 then cast('C3' as varchar2(20))
when 4 then cast('C4' as varchar2(20))
when 5 then cast(null as varchar2(20))
end as col2
from
(select /*+ cardinality(1e4) */ null from dual connect by level <= 1e4)
, (select /*+ cardinality(5e3) */ null from dual connect by level <= 5e3)
;
and got the following results from 11.2.0.4 (ignoring the different cardinality estimates for the moment, since I'm only interested in the elapsed time here):
SQL> select count(*) from t1;
Elapsed: 00:00:00.94
Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18743 (2)| 00:03:45 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| T1 | 50M| 18743 (2)| 00:03:45 |
-------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> select count(*) from t1 where decode(col1, col2, 0, 1) = 1;
Elapsed: 00:00:02.55
Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 19176 (4)| 00:03:51 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 500K| 1953K| 19176 (4)| 00:03:51 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(DECODE("COL1","COL2",0,1)=1)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> select count(*) from t1 where SYS_OP_MAP_NONNULL("COL1")!=SYS_OP_MAP_NONNULL("COL2");
Elapsed: 00:00:03.04
Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 19515 (6)| 00:03:55 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 15M| 60M| 19515 (6)| 00:03:55 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYS_OP_MAP_NONNULL("COL1")<>SYS_OP_MAP_NONNULL("COL2"))
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> select count(*) from t1 where (col1 != col2 or (col1 is null and col2 is not null) or (col1 is not null and col2 is null));
Elapsed: 00:00:03.51
Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 18876 (2)| 00:03:47 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 30M| 115M| 18876 (2)| 00:03:47 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL1" IS NULL AND "COL2" IS NOT NULL OR "COL2" IS NULL
AND "COL1" IS NOT NULL OR "COL1"<>"COL2")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> select /*+ opt_param('_fix_control', '8551880:0') */ count(*) from t1 where decode(col1, col2, 0, 1) = 0;
Elapsed: 00:00:02.49
Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 19176 (4)| 00:03:51 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 500K| 1953K| 19176 (4)| 00:03:51 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(DECODE("COL1","COL2",0,1)=0)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> select /*+ opt_param('_fix_control', '8551880:1') */ count(*) from t1 where decode(col1, col2, 0, 1) = 0;
Elapsed: 00:00:03.04
Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 19515 (6)| 00:03:55 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 6000K| 22M| 19515 (6)| 00:03:55 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYS_OP_MAP_NONNULL("COL1")=SYS_OP_MAP_NONNULL("COL2"))
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> select count(*) from t1 where col1 = col2 or (col1 is null and col2 is null);
Elapsed: 00:00:02.30
Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 18865 (2)| 00:03:47 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T1 | 13M| 49M| 18865 (2)| 00:03:47 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL1" IS NULL AND "COL2" IS NULL OR "COL1"="COL2")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
68456 consistent gets
0 physical reads
0 redo size
346 bytes sent via SQL*Net to client
364 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Surprisingly, in my tests SYS_OP_MAP_NONNULL always performed worst, and the DECODE expression pretty close to the more verbose SQL expression - depending on the data set sometimes even faster.
So using SYS_OP_MAP_NONNULL, and in particular the implicit rewrite might not be such a good idea after all. And now you know how the automatic rewrite could be disabled if desired.
Wouldn't the "official" verbose way be better in the presence of bitmap indexes on the columns involved?
ReplyDeleteNever mind, after some more thought I don't think there is a good way to use an index for col1=col2 (or col1!=col2).
ReplyDelete