Wednesday, January 19, 2011

Pending Statistics

This is just a quick heads-up to those that plan to use the Pending Statistics feature that has been introduced in Oracle 11.1.

It looks like that in all currently available versions that support this feature Pending Statistics have not been implemented consequently for all possible DBMS_STATS calls, so you have to be very careful which calls you use. Having enabled the pending statistics for a particular table you might start to manipulate the statistics under the impression that the modifications performed are not reflected in the actual dictionary statistics (by "dictionary statistics" in this case I don't mean the statistics of the data dictionary objects themselves but the actual statistics of database objects stored in the data dictionary) but only in the pending statistics area allowing you to test statistics modifications in an isolated environment using the OPTIMIZER_USE_PENDING_STATISTICS parameter on session level.

You therefore might be in for a surprise to find out that this holds true only for a limited set of DBMS_STATS calls, but not for all.

This effectively means that particular changes to the statistics will be effective immediately although pending statistics have been enabled.

In particular manipulations of the statistics using the SET_*_STATS procedures of DBMS_STATS seem to ignore the pending statistics settings and still update the dictionary statistics immediately without further notice.

This is rather unfortunate since this means that Pending Statistics can not be used in a straightforward way to test user-defined statistics which can be very helpful under certain circumstances but require extensive testing before using them on a live system.

But also other calls, like gathering statistics only for a particular set of columns show an unexpected behaviour: It looks like that both statistics get modified, the pending statistics area, but also the dictionary statistics.

Note that setting the GLOBAL preferences (DBMS_STATS.SET_GLOBAL_PREFS) for PUBLISH to FALSE seems to fix this particular issue - in that case only the pending statistics get updated, but the dictionary statistics are left unchanged. This fix does not apply to the SET_*_STATS procedures unfortunately, those seem to always update the dictionary statistics.

The worst thing however is that the statistics history that is automatically maintained since Oracle 10g does not reflect these (unintended) changes properly, so you can not easily recover from the potentially unwanted modifications by calling DBMS_STATS.RESTORE_TABLE_STATS.

Finally I was obviously able to activate the pending statistics using DBMS_STATS.RESTORE_TABLE_STATS - you can rather clearly see this behaviour when using the SET_GLOBAL_PREFS('PUBLISH', 'FALSE') variant of the following script.

The following is a small demonstration of the issues encountered - please note that it modifies the GLOBAL preferences for the PUBLISH setting if you intend to run this test by yourself.


set echo on timing on linesize 130 tab off trimspool on

alter session set nls_date_format = 'DD.MM.YYYY HH24:MI:SS';

drop table t purge;

create table t
as
select * from all_objects
where rownum <= 1000;

exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')

select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

-- Our baseline, no histograms, basic column statistics for all columns
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 1')

-- Verify the result
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Enable pending statistics for table T
-- You can try these different calls
--
-- The GATHER_*_STATS procedures seem to behave correctly
-- only when setting the GLOBAL PREFS to FALSE
--
-- "Correctly" means that the results are reflected in the
-- pending area only but not in the dictionary statistics
--
-- Note that the SET_*_STATS procedures seem to ignore the setting
-- always and publish directly to the dictionary
-- no matter what the PUBLISH setting is on any level (TABLE, GLOBAL)
--
-- exec dbms_stats.set_global_prefs('PUBLISH', 'FALSE')
exec dbms_stats.set_table_prefs(null, 'T', 'PUBLISH', 'FALSE')
-- exec dbms_stats.set_schema_prefs(user, 'PUBLISH', 'FALSE')

-- Verify the current setting, statistics will not be published
select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- This is supposed to go to the pending statistics area
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 254')

-- Yes, it worked, the dictionary statistics are not modified
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- The pending statistics area contains now the new statistics including histograms
select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- Let's gather statistics only for the OBJECT_NAME column
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 1')

-- Oops, why do my dictionary statistics reflect that change (Note the LAST_ANALYZED column)
-- Except for you set the GLOBAL preferences for PUBLISH to FALSE
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- I do have now the statistics updated in both, pending statistics and dictionary statistics
-- Except for you set the GLOBAL preferences for PUBLISH to FALSE
select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- Let's recreate the histogram only on the OBJECT_NAME column
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 254')

-- Oops, I did it again...
-- Except for you set the GLOBAL preferences for PUBLISH to FALSE
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- Let's define a manually crafted NDV and DENSITY value
-- Again I expect this to go to the pending statistics area
declare
srec dbms_stats.statrec;
novals dbms_stats.numarray;
distcnt number;
avgclen number;
nullcnt number;
density number;
begin
dbms_stats.get_column_stats(null, 't', 'object_name', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
dbms_stats.set_column_stats(
ownname=>null,
tabname=>'t',
colname=>'object_name',
distcnt=>distcnt*100,
nullcnt=>nullcnt,
srec=>srec,
avgclen=>avgclen,
density=>density/100
);
end;
/

-- Nope, no change here
select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- But I just changed it in the dictionary statistics
-- Even in case of setting the GLOBAL preference to FALSE
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- And what is even worse: The statistics history does not reflect all these changes to the dictionary statistics
select table_name, stats_update_time from USER_TAB_STATS_HISTORY where table_name = 'T';

exec dbms_stats.restore_table_stats(null, 'T', systimestamp)

-- But which statistics have been restored now?
-- It looks like this actually restored the PENDING statistics
-- according to the LAST_ANALYZED information??
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')


And this is what I get from running this on 11.1.0.7, 11.2.0.1 or 11.2.0.2:


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> alter session set nls_date_format = 'DD.MM.YYYY HH24:MI:SS';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> drop table t purge;

Table dropped.

Elapsed: 00:00:00.20
SQL>
SQL> create table t
2 as
3 select * from all_objects
4 where rownum <= 1000;

Table created.

Elapsed: 00:00:00.35
SQL>
SQL> exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

DBMS_STATS.GET_PREFS('PUBLISH',NULL,'T')
----------------------------------------------------------------------------------------------------------------------------------
TRUE

Elapsed: 00:00:00.00
SQL>
SQL> -- Our baseline, no histograms, basic column statistics for all columns
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> -- Verify the result
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .001005025 18.01.2011 18:58:24 1 NO

Elapsed: 00:00:00.01
SQL>
SQL> -- Enable pending statistics for table T
SQL> -- You can try these different calls
SQL> --
SQL> -- The GATHER_*_STATS procedures seem to behave correctly
SQL> -- only when setting the GLOBAL PREFS to FALSE
SQL> --
SQL> -- "Correctly" means that the results are reflected in the
SQL> -- pending area only but not in the dictionary statistics
SQL> --
SQL> -- Note that the SET_*_STATS procedures seem to ignore the setting
SQL> -- always and publish directly to the dictionary
SQL> -- no matter what the PUBLISH setting is on any level (TABLE, GLOBAL)
SQL> --
SQL> -- exec dbms_stats.set_global_prefs('PUBLISH', 'FALSE')
SQL> exec dbms_stats.set_table_prefs(null, 'T', 'PUBLISH', 'FALSE')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL> -- exec dbms_stats.set_schema_prefs(user, 'PUBLISH', 'FALSE')
SQL>
SQL> -- Verify the current setting, statistics will not be published
SQL> select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

DBMS_STATS.GET_PREFS('PUBLISH',NULL,'T')
----------------------------------------------------------------------------------------------------------------------------------
FALSE

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- This is supposed to go to the pending statistics area
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 254')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.99
SQL>
SQL> -- Yes, it worked, the dictionary statistics are not modified
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .001005025 18.01.2011 18:58:24 1 NO

Elapsed: 00:00:00.01
SQL>
SQL> -- The pending statistics area contains now the new statistics including histograms
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .00101 18.01.2011 18:58:26

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
2

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- Let's gather statistics only for the OBJECT_NAME column
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.14
SQL>
SQL> -- Oops, why do my dictionary statistics reflect that change (Note the LAST_ANALYZED column)
SQL> -- Except for you set the GLOBAL preferences for PUBLISH to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .001005025 18.01.2011 18:58:29 1 NO

Elapsed: 00:00:00.01
SQL>
SQL> -- I do have now the statistics updated in both, pending statistics and dictionary statistics
SQL> -- Except for you set the GLOBAL preferences for PUBLISH to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .001005025 18.01.2011 18:58:29

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
0

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
2

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- Let's recreate the histogram only on the OBJECT_NAME column
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 254')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.34
SQL>
SQL> -- Oops, I did it again...
SQL> -- Except for you set the GLOBAL preferences for PUBLISH to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .00101 18.01.2011 18:58:32 254 NO

Elapsed: 00:00:00.01
SQL>
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .00101 18.01.2011 18:58:32

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- Let's define a manually crafted NDV and DENSITY value
SQL> -- Again I expect this to go to the pending statistics area
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 begin
9 dbms_stats.get_column_stats(null, 't', 'object_name', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
10 dbms_stats.set_column_stats(
11 ownname=>null,
12 tabname=>'t',
13 colname=>'object_name',
14 distcnt=>distcnt*100,
15 nullcnt=>nullcnt,
16 srec=>srec,
17 avgclen=>avgclen,
18 density=>density/100
19 );
20 end;
21 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.17
SQL>
SQL> -- Nope, no change here
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .00101 18.01.2011 18:58:32

Elapsed: 00:00:00.01
SQL>
SQL> -- But I just changed it in the dictionary statistics
SQL> -- Even in case of setting the GLOBAL preference to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
99500 .0000101 18.01.2011 18:58:34 254 YES

Elapsed: 00:00:00.01
SQL>
SQL> -- And what is even worse: The statistics history does not reflect all these changes to the dictionary statistics
SQL> select table_name, stats_update_time from USER_TAB_STATS_HISTORY where table_name = 'T';

TABLE_NAME STATS_UPDATE_TIME
------------------------------ ---------------------------------------------------------------------------
T 18-JAN-11 06.58.24.391000 PM +01:00

Elapsed: 00:00:00.01
SQL>
SQL> exec dbms_stats.restore_table_stats(null, 'T', systimestamp)

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.29
SQL>
SQL> -- But which statistics have been restored now?
SQL> -- It looks like this actually restored the PENDING statistics
SQL> -- according to the LAST_ANALYZED information??
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .00101 18.01.2011 18:58:32 254 NO

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>


It is also interesting to note that, although Oracle has added an array of new dictionary views that allow to access the pending statistics area, these views are inconsistent with the existing statistics-related views in terms of naming conventions, columns and behaviour.

For example the *_col_pending_stats view does not have a NUM_BUCKETS column, and the corresponding *_tab_histgrm_pending_stats view for histogram details shows 0 rows if basic column statistics have been defined but no histogram whereas the original *_tab_histograms returns two rows if basic column statistics have been collected representing the low and high value of the column.

Summary

The Pending Statistics feature clearly shows some unexpected behaviour. In particular you better don't rely on it preventing the dictionary statistics from being updated by DBMS_STATS calls with the PUBLISH attribute set to FALSE.

Some of this clearly looks like a bug but I couldn't find a corresponding bug entry yet in My Oracle Support.

Note that this post does not cover the actual usage of Pending Statistics by the optimizer - I haven't done any extensive testing in this regard, but some quick checks showed that it seems to work as expected, which means that the optimizer picks statistics for those tables that have Pending Statistics defined when setting OPTIMIZER_USE_PENDING_STATISTICS = TRUE, but still uses the statistics from the dictionary for those that don't have any pending statistics defined.

1 comment:

  1. the usual poor quality from Oracle Corporation's development: incomplete, inconsistent, poorly documented

    ReplyDelete