Monday, August 4, 2008

Exchange partition and the SIZE AUTO option of DBMS_STATS column statistics in 10g

Oracle's exchange partition feature is a handy method for loading mass data. Data is usually being loaded into an "exchange" table that mirrors the attributes of a single partition or subpartition of the actual table.

Before 10g it probably didn't matter if you gathered statistics on the exchange table before performing the exchange or afterwards on the actual table, because the "SIZE AUTO" option of the method_opt parameter that was already in 9i was rarely used.

But starting with Oracle 10g the "SIZE AUTO" option is used as default when gathering table resp. column statistics. "SIZE AUTO" works in a two-level approach: In the first step it uses "column usage monitoring", another feature introduced in Oracle 9i to gather the information which columns potentially could benefit from histograms as they are used in WHERE clauses with certain kind of predicates, e.g. equal comparisons. In the next step a histogram is generated to detect if the column values are skewed or have significant gaps (which raised an interesting question about what are "gaps" in VARCHAR based data), and if they do, the histogram is stored in the statistics (this part corresponds to the "SIZE SKEWONLY" option behaviour).

So if you want to take advantage of new default "SIZE AUTO" feature, it is crucial that the statistics are gathered on the actual table because otherwise the "column usage monitoring" won't work as expected. It needs the workload of the actual table, not the one of the exchange table.

As already mentioned, from 10g on the SIZE AUTO option is the default option if you haven't changed it using DBMS_STATS.SET_PARAM resp. DBMS_STATS.SET_*_PREFS in 11g.

This small script shall demonstrate the issue:


SQL>
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for 32-bit Windows: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL>
SQL> drop table size_auto_test purge;

Table dropped.

SQL>
SQL> create table size_auto_test (
2 x_pkey number not null,
3 x_slice varchar2(20) not null,
4 data1 number
5 )
6 partition by range (x_pkey)
7 subpartition by list (x_slice)
8 (
9 partition pkey_1 values less than (2)
10 (
11 subpartition pkey_1_001 values ('001'),
12 subpartition pkey_1_101 values ('101')
13 )
14 );

Table created.

SQL>
SQL> -- "data1" is skewed
SQL> insert into size_auto_test (x_pkey, x_slice, data1)
2 select 1, '001', trunc(sqrt(seq-1)) as data1
3 from (
4 select level as seq from dual connect by level <= 10
5 );

10 rows created.

SQL>
SQL> -- "data1" is skewed
SQL> insert into size_auto_test (x_pkey, x_slice, data1)
2 select 1, '101', trunc(sqrt(seq-1)) as data1
3 from (
4 select level as seq from dual connect by level <= 1000
5 );

1000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> -- Gather initial statistics
SQL> begin dbms_stats.gather_table_stats(
2 ownname=>USER,
3 tabname=>'SIZE_AUTO_TEST',
4 granularity=>'ALL',
5 method_opt=>'FOR ALL COLUMNS SIZE AUTO'); end;
6 /

PL/SQL procedure successfully completed.

SQL>
SQL> -- Check the column usage monitoring
SQL> select * from sys.col_usage$ where obj# in (
2 select object_id from dba_objects
3 where object_name = 'SIZE_AUTO_TEST'
4 --and subobject_name = 'PKEY_1_101'
5 );

no rows selected

SQL>
SQL> drop table exchange_test purge;

Table dropped.

SQL>
SQL> create table exchange_test (
2 x_pkey number not null,
3 x_slice varchar2(20) not null,
4 data1 number
5 );

Table created.

SQL>
SQL> -- mirror one subpartition in exchange table
SQL> insert into exchange_test
2 select * from size_auto_test SUBPARTITION ("PKEY_1_101");

1000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> begin dbms_stats.gather_table_stats(
2 ownname=>USER,
3 tabname=>'EXCHANGE_TEST',
4 method_opt=>'FOR ALL COLUMNS SIZE AUTO'); end;
5 /

PL/SQL procedure successfully completed.

SQL>
SQL> -- So this is our current situation
SQL> select table_name, column_name, num_buckets, histogram
2 from user_tab_col_statistics
3 where table_name = 'EXCHANGE_TEST';

TABLE_NAME COLUMN_NAME NUM_BUCKETS HISTOGRAM
--------------- --------------- ----------- ---------------
EXCHANGE_TEST X_PKEY 1 NONE
EXCHANGE_TEST X_SLICE 1 NONE
EXCHANGE_TEST DATA1 1 NONE

SQL>
SQL> select table_name, subpartition_name, column_name, num_buckets, histogram
2 from user_subpart_col_statistics
3 where table_name = 'SIZE_AUTO_TEST';

TABLE_NAME SUBPARTITION_NA COLUMN_NAME NUM_BUCKETS HISTOGRAM
--------------- --------------- --------------- ----------- ---------------
SIZE_AUTO_TEST PKEY_1_001 DATA1 1 NONE
SIZE_AUTO_TEST PKEY_1_001 X_SLICE 1 NONE
SIZE_AUTO_TEST PKEY_1_001 X_PKEY 1 NONE
SIZE_AUTO_TEST PKEY_1_101 DATA1 1 NONE
SIZE_AUTO_TEST PKEY_1_101 X_SLICE 1 NONE
SIZE_AUTO_TEST PKEY_1_101 X_PKEY 1 NONE

6 rows selected.

SQL>
SQL> -- Now perform an exchange
SQL> alter table size_auto_test exchange subpartition pkey_1_101 with table exchange_test;

Table altered.

SQL>
SQL> -- Check the statistics
SQL> select subpartition_name, column_name, num_buckets, histogram
2 from user_subpart_col_statistics
3 where table_name = 'SIZE_AUTO_TEST';

SUBPARTITION_NA COLUMN_NAME NUM_BUCKETS HISTOGRAM
--------------- --------------- ----------- ---------------
PKEY_1_001 DATA1 1 NONE
PKEY_1_001 X_SLICE 1 NONE
PKEY_1_001 X_PKEY 1 NONE
PKEY_1_101 DATA1 1 NONE
PKEY_1_101 X_SLICE 1 NONE
PKEY_1_101 X_PKEY 1 NONE

6 rows selected.

SQL>
SQL> -- Undo the exchange
SQL> alter table size_auto_test exchange subpartition pkey_1_101 with table exchange_test;

Table altered.

SQL>
SQL> -- Perform some queries that could benefit from histograms
SQL> select count(*) from size_auto_test subpartition (pkey_1_101)
2 where data1 = 0;

COUNT(*)
----------
1

SQL>
SQL> select count(*) from size_auto_test subpartition (pkey_1_101)
2 where data1 = 11;

COUNT(*)
----------
23

SQL>
SQL> select count(*) from size_auto_test subpartition (pkey_1_101)
2 where data1 = 13;

COUNT(*)
----------
27

SQL>
SQL> select count(*) from size_auto_test subpartition (pkey_1_101)
2 where data1 = 20;

COUNT(*)
----------
41

SQL>
SQL> select count(*) from size_auto_test subpartition (pkey_1_101)
2 where data1 = 30;

COUNT(*)
----------
61

SQL>
SQL> select count(*) from size_auto_test subpartition (pkey_1_101)
2 where data1 = 31;

COUNT(*)
----------
39

SQL>
SQL> -- flush the monitoring info, so we can see it in col_usage$
SQL> -- otherwise it may take up to 15-20 minutes to see the latest changes
SQL> -- DBMS_STATS.GATHER_*_STATS will flush this as well
SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

PL/SQL procedure successfully completed.

SQL>
SQL> -- Check the column usage monitoring
SQL> select * from sys.col_usage$ where obj# in (
2 select object_id from dba_objects
3 where object_name = 'EXCHANGE_TEST'
4 );

no rows selected

SQL>
SQL> -- Check the column usage monitoring
SQL> select * from sys.col_usage$ where obj# in (
2 select object_id from dba_objects
3 where object_name = 'SIZE_AUTO_TEST'
4 --and subobject_name = 'PKEY_1_101'
5 );

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAM
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- --------
59457 3 1 0 0 0 0 0 04.08.08

SQL>
SQL> -- Obviously the exchange table does not have
SQL> -- any usage recorded
SQL> -- So gathering stats again does not make any difference
SQL> begin dbms_stats.gather_table_stats(
2 ownname=>USER,
3 tabname=>'EXCHANGE_TEST',
4 method_opt=>'FOR ALL COLUMNS SIZE AUTO'); end;
5 /

PL/SQL procedure successfully completed.

SQL>
SQL> -- Still the same
SQL> select table_name, column_name, num_buckets, histogram
2 from user_tab_col_statistics
3 where table_name = 'EXCHANGE_TEST';

TABLE_NAME COLUMN_NAME NUM_BUCKETS HISTOGRAM
--------------- --------------- ----------- ---------------
EXCHANGE_TEST X_PKEY 1 NONE
EXCHANGE_TEST X_SLICE 1 NONE
EXCHANGE_TEST DATA1 1 NONE

SQL>
SQL> -- So if you exchange now you won't benefit
SQL> -- from any histograms potentially being generated
SQL> -- due to column usage and skewness
SQL> alter table size_auto_test exchange subpartition pkey_1_101 with table exchange_test;

Table altered.

SQL>
SQL> -- But now gathering on the actual table
SQL> begin dbms_stats.gather_table_stats(
2 ownname=>USER,
3 tabname=>'SIZE_AUTO_TEST',
4 granularity=>'ALL',
5 method_opt=>'FOR ALL COLUMNS SIZE AUTO'); end;
6 /

PL/SQL procedure successfully completed.

SQL>
SQL> -- Histograms being generated on the skewed column
SQL> select table_name, subpartition_name, column_name, num_buckets, histogram
2 from user_subpart_col_statistics
3 where table_name = 'SIZE_AUTO_TEST';

TABLE_NAME SUBPARTITION_NA COLUMN_NAME NUM_BUCKETS HISTOGRAM
--------------- --------------- --------------- ----------- ---------------
SIZE_AUTO_TEST PKEY_1_001 DATA1 4 FREQUENCY
SIZE_AUTO_TEST PKEY_1_001 X_SLICE 1 NONE
SIZE_AUTO_TEST PKEY_1_001 X_PKEY 1 NONE
SIZE_AUTO_TEST PKEY_1_101 DATA1 32 FREQUENCY
SIZE_AUTO_TEST PKEY_1_101 X_SLICE 1 NONE
SIZE_AUTO_TEST PKEY_1_101 X_PKEY 1 NONE

6 rows selected.

SQL>
SQL> spool off


Note that the test case shows the same result on 11.1.0.6.

So it can be seen from this sample that gathering statistics on the exchange table uses the workload of the exchange table and therefore the SIZE AUTO option doesn't work as expected. Gathering statistics on the actual partitioned table makes use the column workload information and generates histograms based on both workload and skewness.

Another interesting observation is that the column usage in sys.col_usage$ is maintained on table level rather on partition or subpartition level. Although I've explicitly queried on subpartition level the monitoring info refers to the table object and not to the (sub-)partition subobjects. This means that if your data in the partitions is used differently by queries depending on the way you gather statistics you might end up with histograms on particular partitions that are unnecessary. This can have an impact on both the time it takes to gather the statistics as for each histogram a separate gather step resp. query is required and the execution plans and cardinality estimates can be influenced by the existing histograms.