Here is an odd bug that can lead to some nasty side effects when using the
EXCHANGE PARTITION technique. It is probably there for a very long time, simply because it depends on the usage of
virtual columns, and the basic technique of virtual columns was introduced way back in the
Oracle 8i times with the introduction of
Function Based Indexes.
The problem isn't the exchange partition operation itself, but the accompanying swap of
object statistics information, in particular the
column statistics.
Look the following sequence of DDL and DML commands and pay then special attention to the output for the column statistics
before and
after the EXCHANGE PARTITION operation:
-- EXCHANGE_PARTITION_VIRTUAL_COLS.SQL
--
-- Simple example that demonstrates that differences in Virtual Columns
-- between two tables do not prevent an EXCHANGE PARTITION operation
--
-- But Column Statistics are screwed up after the EXCHANGE PARTITION operation
-- if you have inconsistencies in the INTERNAL_COLUMN_IDs caused by the Virtual Columns
--
-- Either EXCHANGE PARTITION should fail or handle this correctly
--
-- Reproduced: 10.2.0.4, 10.2.0.5, 11.2.0.1, 11.2.0.2, 11.2.0.3
set echo on linesize 400
column column_name format a30
drop table t_x;
purge table t_x;
drop table t;
purge table t;
-- The partitioned table
create table t
partition by range (id)
(
partition p_default values less than (maxvalue)
)
as
select
rownum as id
, 'DESC' || rownum as attr000
, rpad('x', 100) as some_char
from
dual
connect by
level <= 10000
;
-- Add a virtual column here
alter table t add (virt_col as (attr000 || 'BLA'));
-- Note that the same problem will show up with other methods that generate a virtual column
--
-- So one of the below statements instead of above explicit virtual column definition will show a similar problem in column statistics
--
-- exec dbms_stats.gather_table_stats(null, 'T', method_opt => 'for columns (attr000 || ''BLA'') size 1')
-- create index t_idx on t (attr000 || 'BLA');
-- Add two non-virtual columns afterwards
alter table t add (non_virt_col1 varchar2(100), non_virt_col2 varchar2(100));
-- Some data
update t set non_virt_col1 = 'y', non_virt_col2 = 'z';
commit;
-- The non-partitioned table
create table t_x
as
select
rownum as id
, 'DESC' || rownum as attr000
, rpad('x', 100) as some_char
, cast('y' as varchar2(100)) as non_virt_col1
-- Data is different from T for this column
, cast('a' as varchar2(100)) as non_virt_col2
from
dual
connect by
level <= 10000
;
exec dbms_stats.gather_table_stats(null, 't')
exec dbms_stats.gather_table_stats(null, 't_x')
-- Correct Column Statistics here for both objects
select
b.internal_column_id
, a.column_name
, a.num_distinct
, a.low_value
, a.high_value
, a.density
from
user_part_col_statistics a
, user_tab_cols b
where
a.table_name = 'T'
and b.table_name = 'T'
and a.partition_name = 'P_DEFAULT'
and a.table_name = b.table_name
and a.column_name = b.column_name
order by
b.internal_column_id;
select
b.internal_column_id
, a.column_name
, a.num_distinct
, a.low_value
, a.high_value
, a.density
from
user_tab_col_statistics a
, user_tab_cols b
where
a.table_name = 'T_X'
and b.table_name = 'T_X'
and a.table_name = b.table_name
and a.column_name = b.column_name
order by
b.internal_column_id;
-- Exchange doesn't complain since on physical level the tables are identical
-- But logically the INTERNAL COLUMN_IDs are different
-- Hence the Statistics on partition level of T and global level of T_X are now screwed up for all columns that have different INTERNAL COLUMN_IDs in the two tables
alter table t exchange partition p_default with table t_x including indexes without validation;
-- Bad Column Statistics here, see for example T.VIRT_COL or T_X.NON_VIRT_COL1
select
b.internal_column_id
, a.column_name
, a.num_distinct
, a.low_value
, a.high_value
, a.density
from
user_part_col_statistics a
, user_tab_cols b
where
a.table_name = 'T'
and b.table_name = 'T'
and a.partition_name = 'P_DEFAULT'
and a.table_name = b.table_name
and a.column_name = b.column_name
order by
b.internal_column_id;
select
b.internal_column_id
, a.column_name
, a.num_distinct
, a.low_value
, a.high_value
, a.density
from
user_tab_col_statistics a
, user_tab_cols b
where
a.table_name = 'T_X'
and b.table_name = 'T_X'
and a.table_name = b.table_name
and a.column_name = b.column_name
order by
b.internal_column_id;
column column_name clear
The relevant
Column Statistics for T before and after the exchange partition operation:
INTERNAL_COLUMN_ID COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY
------------------ ------------------------------ ------------ ---------------------------------------------------------------- ---------------------------------------------------------------- ----------
1 ID 10000 C102 C302 .0001
2 ATTR000 10000 4445534331 4445534339393939 .0001
3 SOME_CHAR 1 7820202020202020202020202020202020202020202020202020202020202020 7820202020202020202020202020202020202020202020202020202020202020 1
4 VIRT_COL 10000 444553433130303030424C41 4445534339424C41 .0001
5 NON_VIRT_COL1 1 79 79 1
6 NON_VIRT_COL2 1 7A 7A 1
INTERNAL_COLUMN_ID COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY
------------------ ------------------------------ ------------ ---------------------------------------------------------------- ---------------------------------------------------------------- ----------
1 ID 10000 C102 C302 .0001
2 ATTR000 10000 4445534331 4445534339393939 .0001
3 SOME_CHAR 1 7820202020202020202020202020202020202020202020202020202020202020 7820202020202020202020202020202020202020202020202020202020202020 1
4 VIRT_COL 1 79 79 1
5 NON_VIRT_COL1 1 61 61 1
6 NON_VIRT_COL2
and for T_X:
INTERNAL_COLUMN_ID COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY
------------------ ------------------------------ ------------ ---------------------------------------------------------------- ---------------------------------------------------------------- ----------
1 ID 10000 C102 C302 .0001
2 ATTR000 10000 4445534331 4445534339393939 .0001
3 SOME_CHAR 1 7820202020202020202020202020202020202020202020202020202020202020 7820202020202020202020202020202020202020202020202020202020202020 1
4 NON_VIRT_COL1 1 79 79 1
5 NON_VIRT_COL2 1 61 61 1
INTERNAL_COLUMN_ID COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY
------------------ ------------------------------ ------------ ---------------------------------------------------------------- ---------------------------------------------------------------- ----------
1 ID 10000 C102 C302 .0001
2 ATTR000 10000 4445534331 4445534339393939 .0001
3 SOME_CHAR 1 7820202020202020202020202020202020202020202020202020202020202020 7820202020202020202020202020202020202020202020202020202020202020 1
4 NON_VIRT_COL1 10000 444553433130303030424C41 4445534339424C41 .0001
5 NON_VIRT_COL2 1 79 79 1
Not only this can lead to situations where columns simply end up with
wrong (shifted) or no column statistics at all, but it is even possible to have columns with statistics that are simply
illegal because the internal representation of the
low / high values for example isn't compatible with the actual
column data type.
If you think about how that information is stored in the data dictionary (along the so called
INTERNAL column id, there is also a plain COLUMN_ID for display purposes and a SEGMENT column id that defines how the columns are stored in the blocks) and how Oracle very likely performs the swap of object statistic when exchanging partitions the potential problem becomes obvious.
It is however very
unlikely to hit that bug for several reasons:
- There are several reasons why you usually want to gather the statistics on the exchanged partition
after the exchange took place. If you want to make use of the
SIZE AUTO functionality that automatically determines the generation of
column histograms based on column usage, or want to ensure that
higher level statistics are maintained via
aggregation, or if you want to benefit from more recent features like
Incremental Partition Statistics, for example.
If you follow this recommended technique you won't have any problems on the partitioned object since the statistics gathered after the exchange operation
won't be affected by the bug
However, depending on what you do with the
swapped table segment, you might still encounter problems since, without refreshing the statistics on that table you'll potentially end up with
screwed up column statistics.
- Until the arrival of more recent features like
Virtual Columns and
Extended Statistics that under the covers are based on the same technique it is very likely that both objects involved in the exchange partition operation are carefully maintained
synchronously with regard to structural changes like column or index additions, since otherwise the EXCHANGE PARTITION operation might
fail with corresponding errors
From 11g on however you can far simpler fall into the trap of getting out-of-sync internal column ids by simply
gathering Extended Statistics on or by adding a Virtual Column to only one of the two objects. The EXCHANGE PARTITION operation won't fail, but if the Virtual Columns were added in a different order or real columns were added afterwards (to both objects) the column statistics of the affected columns will be screwed up.
Note that the
official documentation is not exactly helpful, since it explicitly says
here that the Virtual Columns
don't need to be in sync between the two objects because only the real columns are relevant for the EXCHANGE PARTITION operation itself.
Furthermore from a pure technical point of view it is a
legal approach to gather statistics on the table segment to be exchanged into the partitioned object and rely on EXCHANGE PARTITION to swap properly the statistics between the two segments, which in this case only affects the partition level statistics of the particular partition being exchanged (unless you play with the undocumented parameter "_minimal_stats_aggregation") and the table level statistics of the unpartitioned object.
If you follow this practice and do some of the following:
- Add Function Based Indexes to only one of the objects or both objects in different order, or mixed with additions of other, real columns
- Add Extended Statistics or Virtual Columns only to one of the objects and afterwards add real columns
then the resulting column statistics will be screwed up for all columns that don't have a matching INTERNAL column id.
Oracle obviously simply just swaps the two
object_ids in the underlying statistics dictionary tables, which is the most obvious and efficient operation, but unfortunately produces wrong column statistics results if the
INTERNAL column ids don't match.
The operation would probably be much more complex if Oracle tried to handle this situation correctly.
The possible
workarounds are simple:
- Either
prevent such a situation upfront, by ensuring that all operations including those that generate Virtual Columns are always done in sync to the two objects involved
-
Gather statistics on
both objects after the exchange partition operation took place
The client has opened an SR; it will be interesting to hear what Oracle says.