Wednesday, August 3, 2016

DBMS_STATS - Gather statistics on tables with many columns - 12c update

This is just a short 12c update on my post about gathering statistics on tables with many columns from some time ago.

I'm currently investigating the "Incremental Statistics" feature in 12.1.0.2 for a client, which probably will be worth one or more other posts, but since we're still in the process of evaluating and installing various patches it's too early to write about that.

As part of the investigation I've noticed a significant change in behaviour in 12.1.0.2 compared to previous versions when it comes to gathering statistics on tables with many columns, hence this post here.

The key message of the original post was that DBMS_STATS needs potentially several passes when gathering statistics on tables with many columns, which meant a significant increase in overall work and resource consumption, exaggerated by the fact that tables with that many columns consist of multiple row pieces.

Along with other significant improvements in 12c (among others new types of histograms and the capability to derive some types of these histograms from a single pass with no need to run separate passes for each histogram) obviously now no longer multiple passes are required for such tables - Oracle can obviously now cope with up to and including 1000 columns in a single pass.

Repeating the test case from the original post for a table with 1.000 columns / 10.000 blocks and using a default METHOD_OPT setting of "FOR ALL COLUMNS SIZE AUTO" results in the following single query executed by the DBMS_STATS call:
SQL ID: 98828wcrfyn0c Plan Hash: 1962185829

select /*+  full(t)    no_parallel(t) no_parallel_index(t) dbms_stats 
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
  xmlindex_sel_idx_tbl no_substrb_pad  */to_char(count("COL968")),
  substrb(dump(min("COL968"),16,0,64),1,240),substrb(dump(max("COL968"),16,0,
  64),1,240),to_char(count("COL969")),substrb(dump(min("COL969"),16,0,64),1,
  240),substrb(dump(max("COL969"),16,0,64),1,240),to_char(count("COL970")),
  substrb(dump(min("COL970"),16,0,64),1,240),substrb(dump(max("COL970"),16,0,
  64),1,240),to_char(count("COL971")),substrb(dump(min("COL971"),16,0,64),1,
  240),substrb(dump(max("COL971"),16,0,64),1,240),to_char(count("COL972")),
.
.
.
from
 "CBO_TEST"."MANY_X_COLS" t  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,
  NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,
  NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,
  NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,
  NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,
.
.
.
  NDV,NIL,NIL,NDV,NIL,NIL*/


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      1.95       1.97          0      40011          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      1.95       1.98          0      40011          0           1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 111     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=40011 pr=0 pw=0 time=1969578 us)
      9995       9995       9995   OPTIMIZER STATISTICS GATHERING  (cr=40011 pr=0 pw=0 time=751494 us cost=2717 size=240000 card=10000)
     10000      10000      10000    TABLE ACCESS FULL MANY_X_COLS (cr=40011 pr=0 pw=0 time=413062 us cost=2717 size=240000 card=10000)
As it can be seen the "APPROXIMATE NDV AGGREGATE" operation introduced in 11g for the new statistics gathering mode has been renamed in 12c to "OPTIMIZER STATISTICS GATHERING".

Apart from that this is good news as it minimizes the work required to gather statistics for such tables - it takes the minimum of logical / physical I/Os to process the data. And as already mentioned 12c is also capable of more, like generating synopses for incremental statistics and histograms from such a single pass.

No comments: