Tuesday, February 25, 2020

New Parallel Distribution Method For Direct Path Loads

Starting with version 12c Oracle obviously has introduced another parallel distribution method for direct path loads (applicable to INSERT APPEND and CTAS operations) when dealing with partitioned objects.

As you might already know, starting with version 11.2 Oracle supported a new variation of the PQ_DISTRIBUTE hint allowing more control how data gets distributed for the actual DML load step. In addition to the already documented methods (NONE, RANDOM / RANDOM_LOCAL, PARTITION) there is a new one EQUIPART which obviously only applies to scenarios where both, source and target table are equi partitioned.

In principle it looks like a "full-partition wise load", where the PX partition granule gets used as chunking method and each PX slave reads from the partition to process from source and writes into the corresponding partition of target. Therefore it doesn't require a redistribution of data and uses only a single PX slave set. Depending on the skew (partitions of different data volume) this might not be the best choice, but for massive data loads with evenly sized partitions it might give some advantage over the other distribution methods - the NONE distribution method being the closest, because it doesn't require additional redistribution either - but here all PX slaves read and write from any partition, so potentially there could be more contention.

Of course this new distributed method works only for the special case of equi partitioned source and target tables - and according to my tests only for the simple case of loading from the source table with no further operations like joins etc. involved.

A simple demonstration:

drop table t_part1 purge;
drop table t_part2 purge;

create table t_part1 (id, filler) partition by range (id) (
  partition n10000 values less than (10001),
  partition n20000 values less than (20001),
  partition n30000 values less than (30001),
  partition n40000 values less than (40001),
  partition n50000 values less than (50001),
  partition n60000 values less than (60001),
  partition n70000 values less than (70001),
  partition n80000 values less than (80001),
  partition n90000 values less than (90001),
  partition n100000 values less than (100001)
 )
as
select rownum as id, rpad('x', 200) as filler
from dual
connect by level <= 100000
;

create table t_part2 (id, filler) partition by range (id) (
  partition n10000 values less than (10001),
  partition n20000 values less than (20001),
  partition n30000 values less than (30001),
  partition n40000 values less than (40001),
  partition n50000 values less than (50001),
  partition n60000 values less than (60001),
  partition n70000 values less than (70001),
  partition n80000 values less than (80001),
  partition n90000 values less than (90001),
  partition n100000 values less than (100001)
 )
as
select rownum as id, rpad('x', 200) as filler
from dual
where 1 = 2
;

alter session enable parallel dml;

-- alter session set tracefile_identifier = equipart;

-- alter session set events 'trace [RDBMS.SQL_Optimizer.*] disk=highest';

--explain plan for
insert /*+ append parallel(2) pq_distribute(t_part2 equipart) */ into t_part2 select * from t_part1;

From 12.1.0.2 on the execution plan for the INSERT APPEND operation looks like this:

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                   |          |   100K|    19M|   446   (1)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR                    |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10000 |   100K|    19M|   446   (1)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (EQUI-PARTITION) | T_PART2  |       |       |            |          |       |       |  Q1,00 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |          |   100K|    19M|   446   (1)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|   5 |      PX PARTITION RANGE ALL        |          |   100K|    19M|   446   (1)| 00:00:01 |     1 |    10 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL            | T_PART1  |   100K|    19M|   446   (1)| 00:00:01 |     1 |    10 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
 
   1 - SEL$1
   6 - SEL$1 / T_PART1@SEL$1
 
Outline Data
-------------
 
  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"SEL$1" "T_PART1"@"SEL$1")
      FULL(@"INS$1" "T_PART2"@"INS$1")
      PQ_DISTRIBUTE(@"INS$1" "T_PART2"@"INS$1" EQUIPART)
      OUTLINE_LEAF(@"INS$1")
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('18.1.0')
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Oracle doesn't always automatically choose this distribution method. If you want to enforce it (and it is legal) you can use the PQ_DISTRIBUTE(EQUIPART) hint as outlined.

Monday, January 13, 2020

12.2.0.1 And Later Support (Limited) Extended Stats On Virtual columns / Column Groups of Expressions

I do have a demo as part of my optimizer related workshops that shows the restriction / limitation of DBMS_STATS not supporting extended statistics on virtual columns / group of expressions, so for example the combination of both expressions and column groups, like ((TRUNC(COL1)), (TRUNC(COL2))).

Surprisingly, when following a certain sequence of operation, this starts working (to some degree) from 12.2.0.1 on.

The official documentation up to and including 19c still mentions this as a restriction, and since it doesn't work when explicitly referencing virtual columns (see the test case what I exactly mean by this) I assume this is more like a side effect / unintended feature.

Nevertheless, the optimizer happily picks up this additional information and comes up with improved estimates when having a combination of skew and correlation on expressions, for example.

The following test case shows the change in behaviour from 12.2.0.1 on:

set echo on linesize 200 trimspool on trimout on tab off pagesize 999 timing on

alter session set nls_language = american;

drop table t1;

purge table t1;

-- Initialize the random generator for "reproducible" pseudo-randomness
exec dbms_random.seed(0)

-- ATTR1 and ATTR2 are both skewed and correlated
create table t1
as
select
        rownum as id
      , trunc(dbms_random.value(1, 1000000000000)) as fk
      , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1
      , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1000000
;

-- Histograms on ATTR1 and ATTR2 for representing skew properly
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2')

-- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)')

-- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))')

-- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions
-- Officially this is isn't allowed and errors out up to and including 12.1.0.2
-- ORA-20001: Error when processing extension -  missing right parenthesis
-- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c
-- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column
-- Remove the previous step and it will error out:
-- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')

-- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly
-- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate
-- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct
explain plan for
select
        count(*)
from
        t1 a
where
        trunc(attr1) = 1
and     trunc(attr2) = 1;

select * from table(dbms_xplan.display(format => 'TYPICAL'));

-- But: Explicitly referencing a virtual column doesn't work
-- This will error out:
-- ORA-20001: Error when processing extension -  virtual column is referenced in a column expression

exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))')

exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))')

-- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call)
exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')


alter table t1 add (trunc_attr1 as (trunc(attr1)));

alter table t1 add (trunc_attr2 as (trunc(attr2)));

-- This works and is supported in all versions supporting virtual columns
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2')

-- This errors out: ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
-- even in 12.2.0.1 and later
-- But: Works in 12.2.0.1 and later if the call to
--   dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
-- above is removed, because the extension then already exists (!)
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)')

-- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')

And here is the output I get when using 12.1.0.2:

SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 10 12:39:00 2020

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options


USERNAME             INST_NAME    HOST_NAME                 SID   SERIAL#  VERSION    STARTED  SPID            OPID  CPID            SADDR            PADDR
-------------------- ------------ ------------------------- ----- -------- ---------- -------- --------------- ----- --------------- ---------------- ----------------
CBO_TEST             orcl121      DELLXPS13                 368   46472    12.1.0.2.0 20200110 6908            59    15536:4996      00007FFA110E9B88 00007FFA12B6F1E8



SQL>
SQL> alter session set nls_language = american;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> drop table t1;

Table dropped.

Elapsed: 00:00:00.02
SQL>
SQL> purge table t1;

Table purged.

Elapsed: 00:00:00.06
SQL>
SQL> -- Initialize the random generator for "reproducible" pseudo-randomness
SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> -- ATTR1 and ATTR2 are both skewed and correlated
SQL> create table t1
  2  as
  3  select
  4          rownum as id
  5        , trunc(dbms_random.value(1, 1000000000000)) as fk
  6        , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1
  7        , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2
  8        , rpad('x', 100) as filler
  9  from
 10          dual
 11  connect by
 12          level <= 1000000
 13  ;

Table created.

Elapsed: 00:00:13.07
SQL>
SQL> -- Histograms on ATTR1 and ATTR2 for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2')

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.43
SQL>
SQL> -- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)')

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.08
SQL>
SQL> -- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.51
SQL>
SQL> -- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions
SQL> -- Officially this is isn't allowed and errors out up to and including 12.1.0.2
SQL> -- ORA-20001: Error when processing extension -  missing right parenthesis
SQL> -- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c
SQL> -- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column
SQL> -- Remove the previous step and it will error out:
SQL> -- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension -  missing right parenthesis


Elapsed: 00:00:00.04
SQL>
SQL> -- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly
SQL> -- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate
SQL> -- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct
SQL> explain plan for
  2  select
  3          count(*)
  4  from
  5          t1 a
  6  where
  7          trunc(attr1) = 1
  8  and     trunc(attr2) = 1;

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display(format => 'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |  4777   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   807K|  6305K|  4777   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC("ATTR1")=1 AND TRUNC("ATTR2")=1)

14 rows selected.

Elapsed: 00:00:00.11
SQL>
SQL> -- But: Explicitly referencing a virtual column doesn't work
SQL> -- This will error out:
SQL> -- ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.13
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> -- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call)
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
BEGIN dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension -  missing right parenthesis


Elapsed: 00:00:00.02
SQL>
SQL>
SQL> alter table t1 add (trunc_attr1 as (trunc(attr1)));

Table altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter table t1 add (trunc_attr2 as (trunc(attr2)));

Table altered.

Elapsed: 00:00:00.01
SQL>
SQL> -- This works and is supported in all versions supporting virtual columns
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2')

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.57
SQL>
SQL> -- This errors out: ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
SQL> -- even in 12.2.0.1 and later
SQL> -- But: Works in 12.2.0.1 and later if the call to
SQL> --   dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
SQL> -- above is removed, because the extension then already exists (!)
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
ORA-06512: at "SYS.DBMS_STATS", line 34634
ORA-06512: at line 1


Elapsed: 00:00:00.06
SQL>
SQL> -- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension -  missing right parenthesis


Elapsed: 00:00:00.04
SQL>

And that is what I get from 12.2.0.1 on, here using 19.3:

SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 10 12:35:18 2020

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Verbunden mit:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production


USERNAME             INST_NAME    HOST_NAME                 SID   SERIAL#  VERSION    STARTED  SPID            OPID  CPID            SADDR            PADDR
-------------------- ------------ ------------------------- ----- -------- ---------- -------- --------------- ----- --------------- ---------------- ----------------
CBO_TEST             orcl19c      DELLXPS13                 146   48961    19.0.0.0.0 20200110 5648            53    7260:13644      00007FF91687B3D8 00007FF91656B858



SQL>
SQL> alter session set nls_language = american;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> drop table t1;

Table dropped.

Elapsed: 00:00:00.02
SQL>
SQL> purge table t1;

Table purged.

Elapsed: 00:00:00.10
SQL>
SQL> -- Initialize the random generator for "reproducible" pseudo-randomness
SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> -- ATTR1 and ATTR2 are both skewed and correlated
SQL> create table t1
  2  as
  3  select
  4          rownum as id
  5        , trunc(dbms_random.value(1, 1000000000000)) as fk
  6        , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1
  7        , case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2
  8        , rpad('x', 100) as filler
  9  from
 10          dual
 11  connect by
 12          level <= 1000000
 13  ;

Table created.

Elapsed: 00:00:11.89
SQL>
SQL> -- Histograms on ATTR1 and ATTR2 for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2')

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.68
SQL>
SQL> -- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)')

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.75
SQL>
SQL> -- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.43
SQL>
SQL> -- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions
SQL> -- Officially this is isn't allowed and errors out up to and including 12.1.0.2
SQL> -- ORA-20001: Error when processing extension -  missing right parenthesis
SQL> -- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c
SQL> -- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column
SQL> -- Remove the previous step and it will error out:
SQL> -- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:05.24
SQL>
SQL> -- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly
SQL> -- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate
SQL> -- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct
SQL> explain plan for
  2  select
  3          count(*)
  4  from
  5          t1 a
  6  where
  7          trunc(attr1) = 1
  8  and     trunc(attr2) = 1;

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display(format => 'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |  4797   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   893K|  6977K|  4797   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC("ATTR1")=1 AND TRUNC("ATTR2")=1)

14 rows selected.

Elapsed: 00:00:00.10
SQL>
SQL> -- But: Explicitly referencing a virtual column doesn't work
SQL> -- This will error out:
SQL> -- ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.33
SQL>
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.33
SQL>
SQL> -- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call)
SQL> exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.11
SQL>
SQL>
SQL> alter table t1 add (trunc_attr1 as (trunc(attr1)));

Table altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter table t1 add (trunc_attr2 as (trunc(attr2)));

Table altered.

Elapsed: 00:00:00.01
SQL>
SQL> -- This works and is supported in all versions supporting virtual columns
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2')

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.49
SQL>
SQL> -- This errors out: ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
SQL> -- even in 12.2.0.1 and later
SQL> -- But: Works in 12.2.0.1 and later if the call to
SQL> --   dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
SQL> -- above is removed, because the extension then already exists (!)
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)')
BEGIN dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)'); END;

*
ERROR at line 1:
ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
ORA-06512: at "SYS.DBMS_STATS", line 40751
ORA-06512: at "SYS.DBMS_STATS", line 40035
ORA-06512: at "SYS.DBMS_STATS", line 38912
ORA-06512: at "SYS.DBMS_STATS", line 15726
ORA-06512: at "SYS.DBMS_STATS", line 22064
ORA-06512: at "SYS.DBMS_STATS", line 22162
ORA-06512: at "SYS.DBMS_STATS", line 22232
ORA-06512: at "SYS.DBMS_STATS", line 22864
ORA-06512: at "SYS.DBMS_STATS", line 38313
ORA-06512: at "SYS.DBMS_STATS", line 39738
ORA-06512: at "SYS.DBMS_STATS", line 40183
ORA-06512: at "SYS.DBMS_STATS", line 40732
ORA-06512: at line 1


Elapsed: 00:00:00.09
SQL>
SQL> -- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')

PL/SQL procedure successfully completed.

Elapsed: 00:00:05.32
SQL>

So, from 12.2.0.1 it looks like extended statistics on virtual columns are supported to some degree, when following a certain sequence of operation - first creating extended statistics on each of the expressions used, which creates corresponding virtual columns under the cover, and afterwards creating extended statistics using those expressions as part of the column group expression.

Strange enough, when explicitly creating virtual columns for those expressions creating a column group explicitly referencing those virtual columns doesn't work - but using the expressions covered by the virtual columns instead works even with those virtual columns created explicitly.

Thursday, January 9, 2020

Single Value Column Frequency Histogram Oracle 12c and later

It is hopefully in the meantime well known that Oracle has introduced in version 11g a new algorithm to gather statistics on a table that no longer requires sorting for determining the critical Number Of Distinct Values (NDV) figure - it instead uses a clever "approximate NDV" algorithm which always reads 100% of the table data and therefore in principle generates very accurate statistics. This new algorithm gets used only when the ESTIMATE_PERCENT parameter to the DBMS_STATS.GATHER*STATS calls is left at default or explicitly passed as "DBMS_STATS.AUTO_SAMPLE_SIZE". This new algorithm is also required in case other new features like "Incremental Statistics" should be used.

In 12c Oracle improved this algorithm allowing the generation of Frequency and the new Top Frequency histogram types in a single pass. The new Hybrid histogram type still requires a separate pass.

11g always required a separate pass per histogram to be created - no matter what type of histogram (in 11g there was only Frequency and Height-Balanced) - which resulted in a quite aggressive sampling used for that purpose to minimize the time and resource usage for those separate passes, typically just using 5,500 rows and only sized up in case there were many NULL values (which is a very small sample size for larger tables). Note that this aggressive sampling only applies to the new "approximate NDV" code path - if you specify any explicit ESTIMATE_PERCENT Oracle uses the old code (which requires sorting for determining the NDV figure) and therefore the separate passes required to generate histograms are based on the same sampling percentage as used for the basic table and column statistics - actually Oracle can create a Global Temporary Table in a separate pass covering the required data from several columns in this case to avoid repeatedly scanning the table again and again.

I've recently came across an edge case at a client that showed that the new code has a flaw in the special case of columns that only have a single value on table or (sub)partition level.

First of all in my opinion in this special case of a column having only a single value a (Frequency) histogram doesn't add any value - everything required can be determined from the basic column statistics anyway - low and high value are the same, NDV is 1 and the number of NULLs should be all that is needed for proper cardinality estimates on such columns.

Now the new code path seems to be quite happy to generate histograms on all kinds of columns with low number of distinct values, be it useful or not. Since starting with 12c these histograms should all be covered by the main pass - since they can be represented by Frequency histograms when using the default bucket size of 254 - arguably there isn't much overhead in creating them, so why bother.

However, there is a flaw in the code: When the column has just a single value, then the code for some (unknown) reason determines that it requires a separate pass to generate a histogram and doesn't make use of the information already gathered as part of the main pass - which should hold everything needed.

So Oracle runs a separate pass to gather information for this histogram. Usually this doesn't make much difference, but this separate pass is no longer using the 100% data but resorts to the aggressive sampling as outlined above - if applicable. So usually it might just take those 5,500 rows to create a Frequency histogram on this single value column.

But in the edge case of such a single valued column that is NULL for the majority rows, the code recognizes this and no longer uses the aggressive sampling. Instead - probably depending on the number of NULLs - it needs to read a larger proportion of the table to find some non-NULL data.

In the case of my client this was a very large table, had numerous of such special edge case columns (single valued, NULL for most of the rows) which resulted in dozens of non-sampled full table scans of this very large table taking several days to complete.

When enabling the DBMS_STATS specific tracing the behaviour can be reproduced on the latest available versions (19.3 in my case here) - I've used the following test case to test four different scenarios and how the code behaved:

set echo on serveroutput on size unlimited lines 800 long 800

select * from v$version;

exec dbms_stats.set_global_prefs('TRACE', 1048575)

-- Testcase 1: Single distinct value, almost all values are NULL except three rows
-- This scenario triggers a separate query to generate a histogram using no (!!) sampling
drop table test_stats purge;

create table test_stats
as
select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
rownum as id,
case when rownum <= 3 then 1 else null end as almost_null
from
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
;

exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')

-- Testcase 2: Single distinct value, but only three rows are NULL
-- This scenario triggers a separate query to generate a histogram but using the usual aggressive sampling (5500 rows)
drop table test_stats purge;

create table test_stats
as
select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
rownum as id,
case when rownum > 3 then 1 else null end as almost_null
from
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
;

exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')

-- Testcase 3: 11 distinct values, but only three rows are NULL
-- This scenario extracts the values directly from the ROWIDs returned from the main query, no separate query
drop table test_stats purge;

create table test_stats
as
select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
rownum as id,
case when rownum > 3 then trunc(rownum / 100000) else null end as almost_null
from
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
;

exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')

-- Testcase 4: Three distinct values in three rows and all other rows have NULL
-- This scenario extracts the values directly from the ROWIDs returned from the main query, no separate query
-- Applies to two distinct values, too
-- So a single distinct value looks like a special case that triggers a separate query
-- If this is with combination of almost all rows having NULLs this query doesn't use sampling
-- ! Big threat if the table is big !
drop table test_stats purge;

create table test_stats
as
select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
rownum as id,
case when rownum <= 2 then rownum else null end as almost_null
from
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
(select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
;

exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')

And the output from 19.3 looks like this - the critical parts highlighted in red and bold:

SQL*Plus: Release 11.2.0.1.0 Production on Mon Aug 19 17:07:42 2019

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Verbunden mit:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production


USERNAME             INST_NAME    HOST_NAME                 SID   SERIAL#  VERSION    STARTED  SPID            OPID  CPID            SADDR            PADDR
-------------------- ------------ ------------------------- ----- -------- ---------- -------- --------------- ----- --------------- ---------------- ----------------
CBO_TEST             orcl19c      DELLXPS13                 394   51553    19.0.0.0.0 20190819 4192            59    6232:10348      00007FFDE6AE6478 00007FFDE6573A48



SQL>
SQL> select * from v$version;

BANNER                                                                           BANNER_FULL                                                                                                                                                      BANNER_LEGACY                                                                        CON_ID
-------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------- ----------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production           Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production                                                                                           Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production                    0
                                                                                 Version 19.3.0.0.0


SQL>
SQL> exec dbms_stats.set_global_prefs('TRACE', 1048575)

PL/SQL-Prozedur erfolgreich abgeschlossen.

SQL>
SQL> -- Testcase 1: Single distinct value, almost all values are NULL except three rows
SQL> -- This scenario triggers a separate query to generate a histogram using no (!!) sampling
SQL> drop table test_stats purge;
drop table test_stats purge
           *
FEHLER in Zeile 1:
ORA-00942: Tabelle oder View nicht vorhanden


SQL>
SQL> create table test_stats
  2  as
  3  select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
  4  rownum as id,
  5  case when rownum <= 3 then 1 else null end as almost_null
  6  from
  7  (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
  8  (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
  9  ;

Tabelle wurde erstellt.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')
DBMS_STATS: Record gather table stats operation on table : test_stats
DBMS_STATS:   job name:
DBMS_STATS:    |--> Operation id: 223
DBMS_STATS: gather_table_stats: 
DBMS_STATS: Start gather table stats -- tabname: test_stats
DBMS_STATS:   job name:
DBMS_STATS: Started table CBO_TEST.TEST_STATS. at 19.08.19 17:07:43,293000000 +02:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting Mode: FALSE
DBMS_STATS: Specified granularity = AUTO, Fixed granularity = GLOBAL AND PARTITION
DBMS_STATS: parameters ** pfix:  (parent pfix: ) ownname:  tabname: test_stats partname:  estimate_percent: 101 block_sample: FALSE method_opt: for columns size 1 id for columns size 254 almost_null degree: 32766 granularity: Z gIdxGran:  cascade: NULL stattab:  statid:  statown:  no_invalidate: NULL flush_colu: TRUE fxt:  stattype: DATA start_time: 08-19-2019 17:07:43 gathering_group_stats: FALSE cms_only: FALSE force: FALSE options: Z executionPhase: SERIAL (NON-CONCURRENT) incremental: FALSE concurrent: FALSE loc_degree:  loc_method_opt: for columns size 1 id for columns size 254 almost_null fxt_typ: 0 tobjn: 72916
DBMS_STATS: Preferences for table CBO_TEST.TEST_STATS
DBMS_STATS: ================================================================================
DBMS_STATS: SKIP_TIME                                         -
DBMS_STATS: STATS_RETENTION                                   -
DBMS_STATS: MON_MODS_ALL_UPD_TIME                             -
DBMS_STATS: SNAPSHOT_UPD_TIME                                 -
DBMS_STATS: TRACE                                             - 0
DBMS_STATS: DEBUG                                             - 0
DBMS_STATS: SYS_FLAGS                                         - 0
DBMS_STATS: SPD_RETENTION_WEEKS                               - 53
DBMS_STATS: CASCADE                                           - DBMS_STATS.AUTO_CASCADE
DBMS_STATS: ESTIMATE_PERCENT                                  - DBMS_STATS.AUTO_SAMPLE_SIZE
DBMS_STATS: DEGREE                                            - NULL
DBMS_STATS: METHOD_OPT                                        - FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: NO_INVALIDATE                                     - DBMS_STATS.AUTO_INVALIDATE
DBMS_STATS: GRANULARITY                                       - AUTO
DBMS_STATS: PUBLISH                                           - TRUE
DBMS_STATS: STALE_PERCENT                                     - 10
DBMS_STATS: APPROXIMATE_NDV                                   - TRUE
DBMS_STATS: APPROXIMATE_NDV_ALGORITHM                         - REPEAT OR HYPERLOGLOG
DBMS_STATS: ANDV_ALGO_INTERNAL_OBSERVE                        - FALSE
DBMS_STATS: INCREMENTAL                                       - FALSE
DBMS_STATS: INCREMENTAL_INTERNAL_CONTROL                      - TRUE
DBMS_STATS: AUTOSTATS_TARGET                                  - AUTO
DBMS_STATS: CONCURRENT                                        - OFF
DBMS_STATS: JOB_OVERHEAD_PERC                                 - 1
DBMS_STATS: JOB_OVERHEAD                                      - -1
DBMS_STATS: GLOBAL_TEMP_TABLE_STATS                           - SESSION
DBMS_STATS: ENABLE_TOP_FREQ_HISTOGRAMS                        - 3
DBMS_STATS: ENABLE_HYBRID_HISTOGRAMS                          - 3
DBMS_STATS: TABLE_CACHED_BLOCKS                               - 1
DBMS_STATS: INCREMENTAL_LEVEL                                 - PARTITION
DBMS_STATS: INCREMENTAL_STALENESS                             - ALLOW_MIXED_FORMAT
DBMS_STATS: OPTIONS                                           - GATHER
DBMS_STATS: GATHER_AUTO                                       - AFTER_LOAD
DBMS_STATS: STAT_CATEGORY                                     - OBJECT_STATS, REALTIME_STATS
DBMS_STATS: SCAN_RATE                                         - 0
DBMS_STATS: GATHER_SCAN_RATE                                  - HADOOP_ONLY
DBMS_STATS: PREFERENCE_OVERRIDES_PARAMETER                    - FALSE
DBMS_STATS: AUTO_STAT_EXTENSIONS                              - OFF
DBMS_STATS: WAIT_TIME_TO_UPDATE_STATS                         - 15
DBMS_STATS: ROOT_TRIGGER_PDB                                  - FALSE
DBMS_STATS: COORDINATOR_TRIGGER_SHARD                         - FALSE
DBMS_STATS: MAINTAIN_STATISTICS_STATUS                        - FALSE
DBMS_STATS: AUTO_TASK_STATUS                                  - OFF
DBMS_STATS: AUTO_TASK_MAX_RUN_TIME                            - 3600
DBMS_STATS: AUTO_TASK_INTERVAL                                - 900
DBMS_STATS: reporting_man_log_task: target: "CBO_TEST"."TEST_STATS" objn: 72916 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: GLOBAL AND PARTITION global_requested: NULL pfix:
DBMS_STATS: (Baseline)
DBMS_STATS: STATISTIC                      LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized                    0            0
DBMS_STATS: DML Parallelized                        0            0
DBMS_STATS: DDL Parallelized                        0            0
DBMS_STATS: DFO Trees                               0            0
DBMS_STATS: Server Threads                          0            0
DBMS_STATS: Allocation Height                       0            0
DBMS_STATS: Allocation Width                        0            0
DBMS_STATS: Local Msgs Sent                         0            0
DBMS_STATS: Distr Msgs Sent                         0            0
DBMS_STATS: Local Msgs Recv'd                       0            0
DBMS_STATS: Distr Msgs Recv'd                       0            0
DBMS_STATS: DOP                                     0            0
DBMS_STATS: Slave Sets                              0            0
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: GLOBAL AND PARTITION execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix:  ownname: CBO_TEST tabname: TEST_STATS pname:  spname:  execution phase: 1
Specified DOP=1 blocks=1520 DOP used=1
Specified DOP=1 blocks=1520 DOP used=1
DBMS_STATS: Using approximate NDV pct=0
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:   Y    Y    Y    Y                                       Y    ID
DBMS_STATS:   Y    Y    Y    Y    Y    Y    Y                        Y    ALMOST_NULL
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Approximate NDV Options
DBMS_STATS: NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U
DBMS_STATS: Starting query at 19.08.19 17:07:43,386000000 +02:00
DBMS_STATS: select /*+  full(t)    no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  */to_char(count("ID")),substrb(dump(min("ID"),16,0,64),1,240),substrb(dump(max("ID"),16,0,64),1,240),to_char(count("ALMOST_NULL")),substrb(dump(min("ALMOST_NULL"),16,0,64),1,240),substrb(dump(max("ALMOST_NULL"),16,0,64),1,240),count(rowidtochar(rowid)) from "CBO_TEST"."TEST_STATS" t  /* NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U*/
DBMS_STATS: Ending query at 19.08.19 17:07:43,589000000 +02:00
DBMS_STATS: STATISTIC                      LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized                    0            0
DBMS_STATS: DML Parallelized                        0            0
DBMS_STATS: DDL Parallelized                        0            0
DBMS_STATS: DFO Trees                               0            0
DBMS_STATS: Server Threads                          0            0
DBMS_STATS: Allocation Height                       0            0
DBMS_STATS: Allocation Width                        0            0
DBMS_STATS: Local Msgs Sent                         0            0
DBMS_STATS: Distr Msgs Sent                         0            0
DBMS_STATS: Local Msgs Recv'd                       0            0
DBMS_STATS: Distr Msgs Recv'd                       0            0
DBMS_STATS: DOP                                     0            0
DBMS_STATS: Slave Sets                              0            0
DBMS_STATS: Approximate NDV Result
DBMS_STATS: 
  
    0
    1000000
    1000000
    6
    1001984
    1000000
    3979802
  
  
    1
    Typ=2 Len=2: c1,2
  
  
    2
    Typ=2 Len=2: c4,2
  
  
    3
    3
    0
    1
    3
    6
    1
    AAARzUAAPAAAACBAAA,3,
  
  
    4
    Typ=2 Len=2: c1,2
  


    5
    Typ=2 Len=2: c1,2
  


DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min:
DBMS_STATS:          max:
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    NULL      NULL      NULL      NULL      NULL      NULL      NULL      7         0         0
DBMS_STATS:          min:
DBMS_STATS:          max: Typ=2 Len=2: c1,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    NULL      NULL      NULL      NULL      NULL      NULL      NULL      7         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c1,2
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max:
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c4,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    6         3         NULL      NULL      NULL      1         NULL      7         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c1,2
DBMS_STATS: start processing top n values for column "ALMOST_NULL"
DBMS_STATS:   >> frequency histograms is not feasible
                       (dbms_lob.getlength(topn_values) <=
                        length()), skip!
DBMS_STATS:  no histogram: setting density to 1/ndv (,000001)
DBMS_STATS:  no histogram: setting density to 1/ndv (1)
DBMS_STATS: Iteration: 1 numhist: 1
DBMS_STATS: Iteration 1, percentage 100 nblks: 1520
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:                                                          Y    ID
DBMS_STATS:                       Y    Y    Y                        Y    ALMOST_NULL
Specified DOP=1 blocks=1520 DOP used=1
DBMS_STATS: Building Histogram for ALMOST_NULL
DBMS_STATS:  bktnum=254, nnv=3, snnv=3, sndv=1, est_ndv=1, mnb=254
DBMS_STATS:  Trying frequency histogram
DBMS_STATS: Starting query at 19.08.19 17:07:43,652000000 +02:00
DBMS_STATS: select substrb(dump(val,16,0,64),1,240) ep, cnt from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  */"ALMOST_NULL" val,count(*) cnt  from "CBO_TEST"."TEST_STATS" t  where "ALMOST_NULL" is not null  group by "ALMOST_NULL") order by val
DBMS_STATS: Evaluating frequency histogram for col: "ALMOST_NULL"
DBMS_STATS:  number of values = 1, max # of buckects = 254, pct = 100, ssize = 3
DBMS_STATS:   csr.hreq: 1  Histogram gathering flags: 1031
DBMS_STATS: Start fill_cstats - hybrid_enabled: TRUE
DBMS_STATS: ====================================================================================================
DBMS_STATS: Statistics from clist for CBO_TEST.TEST_STATS:
DBMS_STATS: ====================================================================================================
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             4,979802  1000000   0         1000000   1000000   1000000   ,000001   0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c4,2
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    1,000006  3         999997    3         1         1         ,16666666 1030      1         1
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c1,2
DBMS_STATS:          Histograms:
DBMS_STATS:          ---------------------------------------------------------------------------------------------------
DBMS_STATS:          BVAL   RPCNT    EAVAL                                                            ENVAL    EDVAL
DBMS_STATS:          ---------------------------------------------------------------------------------------------------
DBMS_STATS:          3               C102                                                             1        Typ=2 Len=2: c1,2
DBMS_STATS:          Need Actual Values (DSC_EAVS)
DBMS_STATS:          Histogram Type: FREQUENCY Data Type: 2
DBMS_STATS:          Histogram Flags: 4100 Histogram Gathering Flags: 1030
DBMS_STATS:          Incremental: FALSE Fix Control 13583722: 1
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: Number of rows in the table = 1000000, blocks = , average row length = 6, chain count = , scan rate = 0, sample size = 1000000, cstats.count = 3, cind = 3
DBMS_STATS: prepare reporting structures...
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ID part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ALMOST_NULL part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: End of construct analyze using sql.
DBMS_STATS: Finished table CBO_TEST.TEST_STATS. at 19.08.19 17:07:43,761000000 +02:00
DBMS_STATS: reporting_man_update_task: objn: 72916 auto_stats: FALSE status: COMPLETED ctx.batching_coeff: 0

PL/SQL-Prozedur erfolgreich abgeschlossen.

SQL>
SQL> -- Testcase 2: Single distinct value, but only three rows are NULL
SQL> -- This scenario triggers a separate query to generate a histogram but using the usual aggressive sampling (5500 rows)
SQL> drop table test_stats purge;

Tabelle wurde geloscht.

SQL>
SQL> create table test_stats
  2  as
  3  select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
  4  rownum as id,
  5  case when rownum > 3 then 1 else null end as almost_null
  6  from
  7  (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
  8  (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
  9  ;

Tabelle wurde erstellt.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')
DBMS_STATS: Record gather table stats operation on table : test_stats
DBMS_STATS:   job name:
DBMS_STATS:    |--> Operation id: 224
DBMS_STATS: gather_table_stats: 
DBMS_STATS: Start gather table stats -- tabname: test_stats
DBMS_STATS:   job name:
DBMS_STATS: Started table CBO_TEST.TEST_STATS. at 19.08.19 17:07:44,511000000 +02:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting Mode: FALSE
DBMS_STATS: Specified granularity = AUTO, Fixed granularity = GLOBAL AND PARTITION
DBMS_STATS: parameters ** pfix:  (parent pfix: ) ownname:  tabname: test_stats partname:  estimate_percent: 101 block_sample: FALSE method_opt: for columns size 1 id for columns size 254 almost_null degree: 32766 granularity: Z gIdxGran:  cascade: NULL stattab:  statid:  statown:  no_invalidate: NULL flush_colu: TRUE fxt:  stattype: DATA start_time: 08-19-2019 17:07:44 gathering_group_stats: FALSE cms_only: FALSE force: FALSE options: Z executionPhase: SERIAL (NON-CONCURRENT) incremental: FALSE concurrent: FALSE loc_degree:  loc_method_opt: for columns size 1 id for columns size 254 almost_null fxt_typ: 0 tobjn: 72920
DBMS_STATS: Preferences for table CBO_TEST.TEST_STATS
DBMS_STATS: ================================================================================
DBMS_STATS: SKIP_TIME                                         -
DBMS_STATS: STATS_RETENTION                                   -
DBMS_STATS: MON_MODS_ALL_UPD_TIME                             -
DBMS_STATS: SNAPSHOT_UPD_TIME                                 -
DBMS_STATS: TRACE                                             - 0
DBMS_STATS: DEBUG                                             - 0
DBMS_STATS: SYS_FLAGS                                         - 0
DBMS_STATS: SPD_RETENTION_WEEKS                               - 53
DBMS_STATS: CASCADE                                           - DBMS_STATS.AUTO_CASCADE
DBMS_STATS: ESTIMATE_PERCENT                                  - DBMS_STATS.AUTO_SAMPLE_SIZE
DBMS_STATS: DEGREE                                            - NULL
DBMS_STATS: METHOD_OPT                                        - FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: NO_INVALIDATE                                     - DBMS_STATS.AUTO_INVALIDATE
DBMS_STATS: GRANULARITY                                       - AUTO
DBMS_STATS: PUBLISH                                           - TRUE
DBMS_STATS: STALE_PERCENT                                     - 10
DBMS_STATS: APPROXIMATE_NDV                                   - TRUE
DBMS_STATS: APPROXIMATE_NDV_ALGORITHM                         - REPEAT OR HYPERLOGLOG
DBMS_STATS: ANDV_ALGO_INTERNAL_OBSERVE                        - FALSE
DBMS_STATS: INCREMENTAL                                       - FALSE
DBMS_STATS: INCREMENTAL_INTERNAL_CONTROL                      - TRUE
DBMS_STATS: AUTOSTATS_TARGET                                  - AUTO
DBMS_STATS: CONCURRENT                                        - OFF
DBMS_STATS: JOB_OVERHEAD_PERC                                 - 1
DBMS_STATS: JOB_OVERHEAD                                      - -1
DBMS_STATS: GLOBAL_TEMP_TABLE_STATS                           - SESSION
DBMS_STATS: ENABLE_TOP_FREQ_HISTOGRAMS                        - 3
DBMS_STATS: ENABLE_HYBRID_HISTOGRAMS                          - 3
DBMS_STATS: TABLE_CACHED_BLOCKS                               - 1
DBMS_STATS: INCREMENTAL_LEVEL                                 - PARTITION
DBMS_STATS: INCREMENTAL_STALENESS                             - ALLOW_MIXED_FORMAT
DBMS_STATS: OPTIONS                                           - GATHER
DBMS_STATS: GATHER_AUTO                                       - AFTER_LOAD
DBMS_STATS: STAT_CATEGORY                                     - OBJECT_STATS, REALTIME_STATS
DBMS_STATS: SCAN_RATE                                         - 0
DBMS_STATS: GATHER_SCAN_RATE                                  - HADOOP_ONLY
DBMS_STATS: PREFERENCE_OVERRIDES_PARAMETER                    - FALSE
DBMS_STATS: AUTO_STAT_EXTENSIONS                              - OFF
DBMS_STATS: WAIT_TIME_TO_UPDATE_STATS                         - 15
DBMS_STATS: ROOT_TRIGGER_PDB                                  - FALSE
DBMS_STATS: COORDINATOR_TRIGGER_SHARD                         - FALSE
DBMS_STATS: MAINTAIN_STATISTICS_STATUS                        - FALSE
DBMS_STATS: AUTO_TASK_STATUS                                  - OFF
DBMS_STATS: AUTO_TASK_MAX_RUN_TIME                            - 3600
DBMS_STATS: AUTO_TASK_INTERVAL                                - 900
DBMS_STATS: reporting_man_log_task: target: "CBO_TEST"."TEST_STATS" objn: 72920 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: GLOBAL AND PARTITION global_requested: NULL pfix:
DBMS_STATS: (Baseline)
DBMS_STATS: STATISTIC                      LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized                    0            0
DBMS_STATS: DML Parallelized                        0            0
DBMS_STATS: DDL Parallelized                        0            0
DBMS_STATS: DFO Trees                               0            0
DBMS_STATS: Server Threads                          0            0
DBMS_STATS: Allocation Height                       0            0
DBMS_STATS: Allocation Width                        0            0
DBMS_STATS: Local Msgs Sent                         0            0
DBMS_STATS: Distr Msgs Sent                         0            0
DBMS_STATS: Local Msgs Recv'd                       0            0
DBMS_STATS: Distr Msgs Recv'd                       0            0
DBMS_STATS: DOP                                     0            0
DBMS_STATS: Slave Sets                              0            0
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: GLOBAL AND PARTITION execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix:  ownname: CBO_TEST tabname: TEST_STATS pname:  spname:  execution phase: 1
Specified DOP=1 blocks=1794 DOP used=1
Specified DOP=1 blocks=1794 DOP used=1
DBMS_STATS: Using approximate NDV pct=0
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:   Y    Y    Y    Y                                       Y    ID
DBMS_STATS:   Y    Y    Y    Y    Y    Y    Y                        Y    ALMOST_NULL
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Approximate NDV Options
DBMS_STATS: NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U
DBMS_STATS: Starting query at 19.08.19 17:07:44,527000000 +02:00
DBMS_STATS: select /*+  full(t)    no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  */to_char(count("ID")),substrb(dump(min("ID"),16,0,64),1,240),substrb(dump(max("ID"),16,0,64),1,240),to_char(count("ALMOST_NULL")),substrb(dump(min("ALMOST_NULL"),16,0,64),1,240),substrb(dump(max("ALMOST_NULL"),16,0,64),1,240),count(rowidtochar(rowid)) from "CBO_TEST"."TEST_STATS" t  /* NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U*/
DBMS_STATS: Ending query at 19.08.19 17:07:44,730000000 +02:00
DBMS_STATS: STATISTIC                      LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized                    0            0
DBMS_STATS: DML Parallelized                        0            0
DBMS_STATS: DDL Parallelized                        0            0
DBMS_STATS: DFO Trees                               0            0
DBMS_STATS: Server Threads                          0            0
DBMS_STATS: Allocation Height                       0            0
DBMS_STATS: Allocation Width                        0            0
DBMS_STATS: Local Msgs Sent                         0            0
DBMS_STATS: Distr Msgs Sent                         0            0
DBMS_STATS: Local Msgs Recv'd                       0            0
DBMS_STATS: Distr Msgs Recv'd                       0            0
DBMS_STATS: DOP                                     0            0
DBMS_STATS: Slave Sets                              0            0
DBMS_STATS: Approximate NDV Result
DBMS_STATS: 
  
    0
    1000000
    1000000
    6
    1001984
    1000000
    3979802
  
  
    1
    Typ=2 Len=2: c1,2
  
  
    2
    Typ=2 Len=2: c4,2
  
  
    3
    999997
    0
    1
    999997
    1999994
    1
    AAARzYAAPAAAACBAAD,999997,
  
  
    4
    Typ=2 Len=2: c1,2


  
    5
    Typ=2 Len=2: c1,2
  


DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min:
DBMS_STATS:          max:
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    NULL      NULL      NULL      NULL      NULL      NULL      NULL      7         0         0
DBMS_STATS:          min:
DBMS_STATS:          max: Typ=2 Len=2: c1,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    NULL      NULL      NULL      NULL      NULL      NULL      NULL      7         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c1,2
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max:
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c4,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    1999994   999997    NULL      NULL      NULL      1         NULL      7         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c1,2
DBMS_STATS: start processing top n values for column "ALMOST_NULL"
DBMS_STATS:   >> frequency histograms is not feasible
                       (dbms_lob.getlength(topn_values) <=
                        length()), skip!
DBMS_STATS:  no histogram: setting density to 1/ndv (,000001)
DBMS_STATS:  no histogram: setting density to 1/ndv (1)
DBMS_STATS: Iteration: 1 numhist: 1
DBMS_STATS: Iteration 1, percentage ,5500016500049500148500445501336504009512 nblks: 1794
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:                                                          Y    ID
DBMS_STATS:                       Y    Y    Y                        Y    ALMOST_NULL
Specified DOP=1 blocks=1794 DOP used=1
DBMS_STATS: Building Histogram for ALMOST_NULL
DBMS_STATS:  bktnum=254, nnv=999997, snnv=5500, sndv=1, est_ndv=1, mnb=254
DBMS_STATS:  Trying frequency histogram
DBMS_STATS: Starting query at 19.08.19 17:07:44,730000000 +02:00
DBMS_STATS: select substrb(dump(val,16,0,64),1,240) ep, cnt from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  */"ALMOST_NULL" val,count(*) cnt  from "CBO_TEST"."TEST_STATS" sample (   .5500016500)  t  where "ALMOST_NULL" is not null  group by "ALMOST_NULL") order by val
DBMS_STATS: Evaluating frequency histogram for col: "ALMOST_NULL"
DBMS_STATS:  number of values = 1, max # of buckects = 254, pct = ,5500016500049500148500445501336504009512, ssize = 5499
DBMS_STATS:   csr.hreq: 1  Histogram gathering flags: 1031
DBMS_STATS: Start fill_cstats - hybrid_enabled: TRUE
DBMS_STATS: ====================================================================================================
DBMS_STATS: Statistics from clist for CBO_TEST.TEST_STATS:
DBMS_STATS: ====================================================================================================
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             4,979802  1000000   0         1000000   1000000   1000000   ,000001   0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c4,2
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    2,999994  999997    3         5499      1         1         ,00000050 1030      1         1
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c1,2
DBMS_STATS:          Histograms:
DBMS_STATS:          ---------------------------------------------------------------------------------------------------
DBMS_STATS:          BVAL   RPCNT    EAVAL                                                            ENVAL    EDVAL
DBMS_STATS:          ---------------------------------------------------------------------------------------------------
DBMS_STATS:          5499            C102                                                             1        Typ=2 Len=2: c1,2
DBMS_STATS:          Need Actual Values (DSC_EAVS)
DBMS_STATS:          Histogram Type: FREQUENCY Data Type: 2
DBMS_STATS:          Histogram Flags: 4100 Histogram Gathering Flags: 1030
DBMS_STATS:          Incremental: FALSE Fix Control 13583722: 1
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: Number of rows in the table = 1000000, blocks = , average row length = 8, chain count = , scan rate = 0, sample size = 1000000, cstats.count = 3, cind = 3
DBMS_STATS: prepare reporting structures...
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ID part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ALMOST_NULL part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: End of construct analyze using sql.
DBMS_STATS: Finished table CBO_TEST.TEST_STATS. at 19.08.19 17:07:44,761000000 +02:00
DBMS_STATS: reporting_man_update_task: objn: 72920 auto_stats: FALSE status: COMPLETED ctx.batching_coeff: 0

PL/SQL-Prozedur erfolgreich abgeschlossen.

SQL>
SQL> -- Testcase 3: 11 distinct values, but only three rows are NULL
SQL> -- This scenario extracts the values directly from the ROWIDs returned from the main query, no separate query
SQL> drop table test_stats purge;

Tabelle wurde geloscht.

SQL>
SQL> create table test_stats
  2  as
  3  select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
  4  rownum as id,
  5  case when rownum > 3 then trunc(rownum / 100000) else null end as almost_null
  6  from
  7  (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
  8  (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
  9  ;

Tabelle wurde erstellt.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')
DBMS_STATS: Record gather table stats operation on table : test_stats
DBMS_STATS:   job name:
DBMS_STATS:    |--> Operation id: 225
DBMS_STATS: gather_table_stats: 
DBMS_STATS: Start gather table stats -- tabname: test_stats
DBMS_STATS:   job name:
DBMS_STATS: Started table CBO_TEST.TEST_STATS. at 19.08.19 17:07:45,448000000 +02:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting Mode: FALSE
DBMS_STATS: Specified granularity = AUTO, Fixed granularity = GLOBAL AND PARTITION
DBMS_STATS: parameters ** pfix:  (parent pfix: ) ownname:  tabname: test_stats partname:  estimate_percent: 101 block_sample: FALSE method_opt: for columns size 1 id for columns size 254 almost_null degree: 32766 granularity: Z gIdxGran:  cascade: NULL stattab:  statid:  statown:  no_invalidate: NULL flush_colu: TRUE fxt:  stattype: DATA start_time: 08-19-2019 17:07:45 gathering_group_stats: FALSE cms_only: FALSE force: FALSE options: Z executionPhase: SERIAL (NON-CONCURRENT) incremental: FALSE concurrent: FALSE loc_degree:  loc_method_opt: for columns size 1 id for columns size 254 almost_null fxt_typ: 0 tobjn: 72921
DBMS_STATS: Preferences for table CBO_TEST.TEST_STATS
DBMS_STATS: ================================================================================
DBMS_STATS: SKIP_TIME                                         -
DBMS_STATS: STATS_RETENTION                                   -
DBMS_STATS: MON_MODS_ALL_UPD_TIME                             -
DBMS_STATS: SNAPSHOT_UPD_TIME                                 -
DBMS_STATS: TRACE                                             - 0
DBMS_STATS: DEBUG                                             - 0
DBMS_STATS: SYS_FLAGS                                         - 0
DBMS_STATS: SPD_RETENTION_WEEKS                               - 53
DBMS_STATS: CASCADE                                           - DBMS_STATS.AUTO_CASCADE
DBMS_STATS: ESTIMATE_PERCENT                                  - DBMS_STATS.AUTO_SAMPLE_SIZE
DBMS_STATS: DEGREE                                            - NULL
DBMS_STATS: METHOD_OPT                                        - FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: NO_INVALIDATE                                     - DBMS_STATS.AUTO_INVALIDATE
DBMS_STATS: GRANULARITY                                       - AUTO
DBMS_STATS: PUBLISH                                           - TRUE
DBMS_STATS: STALE_PERCENT                                     - 10
DBMS_STATS: APPROXIMATE_NDV                                   - TRUE
DBMS_STATS: APPROXIMATE_NDV_ALGORITHM                         - REPEAT OR HYPERLOGLOG
DBMS_STATS: ANDV_ALGO_INTERNAL_OBSERVE                        - FALSE
DBMS_STATS: INCREMENTAL                                       - FALSE
DBMS_STATS: INCREMENTAL_INTERNAL_CONTROL                      - TRUE
DBMS_STATS: AUTOSTATS_TARGET                                  - AUTO
DBMS_STATS: CONCURRENT                                        - OFF
DBMS_STATS: JOB_OVERHEAD_PERC                                 - 1
DBMS_STATS: JOB_OVERHEAD                                      - -1
DBMS_STATS: GLOBAL_TEMP_TABLE_STATS                           - SESSION
DBMS_STATS: ENABLE_TOP_FREQ_HISTOGRAMS                        - 3
DBMS_STATS: ENABLE_HYBRID_HISTOGRAMS                          - 3
DBMS_STATS: TABLE_CACHED_BLOCKS                               - 1
DBMS_STATS: INCREMENTAL_LEVEL                                 - PARTITION
DBMS_STATS: INCREMENTAL_STALENESS                             - ALLOW_MIXED_FORMAT
DBMS_STATS: OPTIONS                                           - GATHER
DBMS_STATS: GATHER_AUTO                                       - AFTER_LOAD
DBMS_STATS: STAT_CATEGORY                                     - OBJECT_STATS, REALTIME_STATS
DBMS_STATS: SCAN_RATE                                         - 0
DBMS_STATS: GATHER_SCAN_RATE                                  - HADOOP_ONLY
DBMS_STATS: PREFERENCE_OVERRIDES_PARAMETER                    - FALSE
DBMS_STATS: AUTO_STAT_EXTENSIONS                              - OFF
DBMS_STATS: WAIT_TIME_TO_UPDATE_STATS                         - 15
DBMS_STATS: ROOT_TRIGGER_PDB                                  - FALSE
DBMS_STATS: COORDINATOR_TRIGGER_SHARD                         - FALSE
DBMS_STATS: MAINTAIN_STATISTICS_STATUS                        - FALSE
DBMS_STATS: AUTO_TASK_STATUS                                  - OFF
DBMS_STATS: AUTO_TASK_MAX_RUN_TIME                            - 3600
DBMS_STATS: AUTO_TASK_INTERVAL                                - 900
DBMS_STATS: reporting_man_log_task: target: "CBO_TEST"."TEST_STATS" objn: 72921 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: GLOBAL AND PARTITION global_requested: NULL pfix:
DBMS_STATS: (Baseline)
DBMS_STATS: STATISTIC                      LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized                    0            0
DBMS_STATS: DML Parallelized                        0            0
DBMS_STATS: DDL Parallelized                        0            0
DBMS_STATS: DFO Trees                               0            0
DBMS_STATS: Server Threads                          0            0
DBMS_STATS: Allocation Height                       0            0
DBMS_STATS: Allocation Width                        0            0
DBMS_STATS: Local Msgs Sent                         0            0
DBMS_STATS: Distr Msgs Sent                         0            0
DBMS_STATS: Local Msgs Recv'd                       0            0
DBMS_STATS: Distr Msgs Recv'd                       0            0
DBMS_STATS: DOP                                     0            0
DBMS_STATS: Slave Sets                              0            0
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: GLOBAL AND PARTITION execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix:  ownname: CBO_TEST tabname: TEST_STATS pname:  spname:  execution phase: 1
Specified DOP=1 blocks=1781 DOP used=1
Specified DOP=1 blocks=1781 DOP used=1
DBMS_STATS: Using approximate NDV pct=0
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:   Y    Y    Y    Y                                       Y    ID
DBMS_STATS:   Y    Y    Y    Y    Y    Y    Y                        Y    ALMOST_NULL
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Approximate NDV Options
DBMS_STATS: NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U
DBMS_STATS: Starting query at 19.08.19 17:07:45,448000000 +02:00
DBMS_STATS: select /*+  full(t)    no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  */to_char(count("ID")),substrb(dump(min("ID"),16,0,64),1,240),substrb(dump(max("ID"),16,0,64),1,240),to_char(count("ALMOST_NULL")),substrb(dump(min("ALMOST_NULL"),16,0,64),1,240),substrb(dump(max("ALMOST_NULL"),16,0,64),1,240),count(rowidtochar(rowid)) from "CBO_TEST"."TEST_STATS" t  /* NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U*/
DBMS_STATS: Ending query at 19.08.19 17:07:45,651000000 +02:00
DBMS_STATS: STATISTIC                      LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized                    0            0
DBMS_STATS: DML Parallelized                        0            0
DBMS_STATS: DDL Parallelized                        0            0
DBMS_STATS: DFO Trees                               0            0
DBMS_STATS: Server Threads                          0            0
DBMS_STATS: Allocation Height                       0            0
DBMS_STATS: Allocation Width                        0            0
DBMS_STATS: Local Msgs Sent                         0            0
DBMS_STATS: Distr Msgs Sent                         0            0
DBMS_STATS: Local Msgs Recv'd                       0            0
DBMS_STATS: Distr Msgs Recv'd                       0            0
DBMS_STATS: DOP                                     0            0
DBMS_STATS: Slave Sets                              0            0
DBMS_STATS: Approximate NDV Result
DBMS_STATS: 
  
    0
    1000000
    1000000
    6
    1001984
    1000000
    3979802
  
  
    1
    Typ=2 Len=2: c1,2
  
  
    2
    Typ=2 Len=2: c4,2
  
  
    3
    999997
    0
    11
    999997
    1899998
    11

AAARzZAAPAAAAVaAFp,100000,AAARzZAAPAAAANAAAb,100000,AAARzZAAPAAAAKMAEf,100000,AAARzZAAPAAAAHYAIj,100000,AAARzZAAPAAAASnABA,100000,AAARzZAAPAAAAbBAGO,100000,AAARzZAAPAAAAYOABl,100000,AAARzZAAPAAAAElAEP,100000,AAARzZAAPAAAAPzAFE,100000,AAARzZAAPAAAACBAAD,99996,AAARzZAAPAAAAd1ACK,1,
  
  
    4
    Typ=2 Len=1: 80
  
  
    5
    Typ=2 Len=2: c1,b
  


DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min:
DBMS_STATS:          max:
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    NULL      NULL      NULL      NULL      NULL      NULL      NULL      7         0         0
DBMS_STATS:          min:
DBMS_STATS:          max: Typ=2 Len=2: c1,b
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    NULL      NULL      NULL      NULL      NULL      NULL      NULL      7         0         0
DBMS_STATS:          min: Typ=2 Len=1: 80
DBMS_STATS:          max: Typ=2 Len=2: c1,b
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max:
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c4,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    1899998   999997    NULL      NULL      NULL      11        NULL      7         0         0
DBMS_STATS:          min: Typ=2 Len=1: 80
DBMS_STATS:          max: Typ=2 Len=2: c1,b
DBMS_STATS: start processing top n values for column "ALMOST_NULL"
DBMS_STATS: Parsing topn values..
DBMS_STATS:    Extracted 11 rowid-freq pairs.
DBMS_STATS: topn sql (len: 744):
DBMS_STATS: +++ select /*+  no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  */ substrb(dump("ALMOST_NULL",16,0,64),1,240) d_1773304655, rowidtochar(rowid) rwid from "CBO_TEST"."TEST_STATS" t where rowid in (chartorowid('AAARzZAAPAAAACBAAD'),chartorowid('AAARzZAAPAAAAElAEP'),chartorowid('AAARzZAAPAAAAHYAIj'),chartorowid('AAARzZAAPAAAAKMAEf'),chartorowid('AAARzZAAPAAAANAAAb'),chartorowid('AAARzZAAPAAAAPzAFE'),chartorowid('AAARzZAAPAAAASnABA'),chartorowid('AAARzZAAPAAAAVaAFp'),chartorowid('AAARzZAAPAAAAYOABl'),chartorowid('AAARzZAAPAAAAbBAGO'),chartorowid('AAARzZAAPAAAAd1ACK')) order by "ALMOST_NULL"
DBMS_STATS: start executing top n value query..
DBMS_STATS: Starting query at 19.08.19 17:07:45,667000000 +02:00
DBMS_STATS: select /*+  no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  */ substrb(dump("ALMOST_NULL",16,0,64),1,240) d_1773304655, rowidtochar(rowid) rwid from "CBO_TEST"."TEST_STATS" t where rowid in (chartorowid('AAARzZAAPAAAACBAAD'),chartorowid('AAARzZAAPAAAAElAEP'),chartorowid('AAARzZAAPAAAAHYAIj'),chartorowid('AAARzZAAPAAAAKMAEf'),chartorowid('AAARzZAAPAAAANAAAb'),chartorowid('AAARzZAAPAAAAPzAFE'),chartorowid('AAARzZAAPAAAASnABA'),chartorowid('AAARzZAAPAAAAVaAFp'),chartorowid('AAARzZAAPAAAAYOABl'),chartorowid('AAARzZAAPAAAAbBAGO'),chartorowid('AAARzZAAPAAAAd1ACK')) order by "ALMOST_NULL"
DBMS_STATS: removal_count: -243 total_nonnull_rows: 999997 mnb:  254
DBMS_STATS: hist_type in exec_get_topn: 1024 ndv:11 mnb:254
DBMS_STATS: Evaluating frequency histogram for col: "ALMOST_NULL"
DBMS_STATS:  number of values = 11, max # of buckects = 254, pct = 100, ssize = 999997
DBMS_STATS:   csr.hreq: 1  Histogram gathering flags: 1031
DBMS_STATS: done_hist in process_topn: TRUE  csr.ccnt: 1
DBMS_STATS: Mark column "ALMOST_NULL" as top N computed
DBMS_STATS:  no histogram: setting density to 1/ndv (,000001)
DBMS_STATS: Skip topn computed column "ALMOST_NULL" numhist: 0
DBMS_STATS: Start fill_cstats - hybrid_enabled: TRUE
DBMS_STATS: ====================================================================================================
DBMS_STATS: Statistics from clist for CBO_TEST.TEST_STATS:
DBMS_STATS: ====================================================================================================
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             4,979802  1000000   0         1000000   1000000   1000000   ,000001   0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c4,2
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    2,899998  999997    3         999997    11        11        ,00000050 1286      1         1
DBMS_STATS:          min: Typ=2 Len=1: 80
DBMS_STATS:          max: Typ=2 Len=2: c1,b
DBMS_STATS:          Histograms:
DBMS_STATS:          ---------------------------------------------------------------------------------------------------
DBMS_STATS:          BVAL   RPCNT    EAVAL                                                            ENVAL    EDVAL
DBMS_STATS:          ---------------------------------------------------------------------------------------------------
DBMS_STATS:          99996           80                                                               0        Typ=2 Len=1: 80
DBMS_STATS:          199996          C102                                                             1        Typ=2 Len=2: c1,2
DBMS_STATS:          299996          C103                                                             2        Typ=2 Len=2: c1,3
DBMS_STATS:          399996          C104                                                             3        Typ=2 Len=2: c1,4
DBMS_STATS:          499996          C105                                                             4        Typ=2 Len=2: c1,5
DBMS_STATS:          599996          C106                                                             5        Typ=2 Len=2: c1,6
DBMS_STATS:          699996          C107                                                             6        Typ=2 Len=2: c1,7
DBMS_STATS:          799996          C108                                                             7        Typ=2 Len=2: c1,8
DBMS_STATS:          899996          C109                                                             8        Typ=2 Len=2: c1,9
DBMS_STATS:          999996          C10A                                                             9        Typ=2 Len=2: c1,a
DBMS_STATS:          999997          C10B                                                             10       Typ=2 Len=2: c1,b
DBMS_STATS:          Need Actual Values (DSC_EAVS)
DBMS_STATS:          Histogram Type: FREQUENCY Data Type: 2
DBMS_STATS:          Histogram Flags: 4100 Histogram Gathering Flags: 1286
DBMS_STATS:          Incremental: FALSE Fix Control 13583722: 1
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: Number of rows in the table = 1000000, blocks = , average row length = 8, chain count = , scan rate = 0, sample size = 1000000, cstats.count = 13, cind = 13
DBMS_STATS: prepare reporting structures...
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ID part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ALMOST_NULL part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: End of construct analyze using sql.
DBMS_STATS: Finished table CBO_TEST.TEST_STATS. at 19.08.19 17:07:45,683000000 +02:00
DBMS_STATS: reporting_man_update_task: objn: 72921 auto_stats: FALSE status: COMPLETED ctx.batching_coeff: 0

PL/SQL-Prozedur erfolgreich abgeschlossen.

SQL>
SQL> -- Testcase 4: Three distinct values in three rows and all other rows have NULL
SQL> -- This scenario extracts the values directly from the ROWIDs returned from the main query, no separate query
SQL> -- Applies to two distinct values, too
SQL> -- So a single distinct value looks like a special case that triggers a separate query
SQL> -- If this is with combination of almost all rows having NULLs this query doesn't use sampling
SQL> -- ! Big threat if the table is big !
SQL> drop table test_stats purge;

Tabelle wurde geloscht.

SQL>
SQL> create table test_stats
  2  as
  3  select /*+ NO_GATHER_OPTIMIZER_STATISTICS */
  4  rownum as id,
  5  case when rownum <= 2 then rownum else null end as almost_null
  6  from
  7  (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) a,
  8  (select /*+ cardinality(1000) */ null from dual connect by level <= 1000) b
  9  ;

Tabelle wurde erstellt.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'test_stats', method_opt => 'for columns size 1 id for columns size 254 almost_null')
DBMS_STATS: Record gather table stats operation on table : test_stats
DBMS_STATS:   job name:
DBMS_STATS:    |--> Operation id: 226
DBMS_STATS: gather_table_stats: 
DBMS_STATS: Start gather table stats -- tabname: test_stats
DBMS_STATS:   job name:
DBMS_STATS: Started table CBO_TEST.TEST_STATS. at 19.08.19 17:07:46,276000000 +02:00. Execution phase: SERIAL (NON-CONCURRENT) stattype: DATA Reporting Mode: FALSE
DBMS_STATS: Specified granularity = AUTO, Fixed granularity = GLOBAL AND PARTITION
DBMS_STATS: parameters ** pfix:  (parent pfix: ) ownname:  tabname: test_stats partname:  estimate_percent: 101 block_sample: FALSE method_opt: for columns size 1 id for columns size 254 almost_null degree: 32766 granularity: Z gIdxGran:  cascade: NULL stattab:  statid:  statown:  no_invalidate: NULL flush_colu: TRUE fxt:  stattype: DATA start_time: 08-19-2019 17:07:46 gathering_group_stats: FALSE cms_only: FALSE force: FALSE options: Z executionPhase: SERIAL (NON-CONCURRENT) incremental: FALSE concurrent: FALSE loc_degree:  loc_method_opt: for columns size 1 id for columns size 254 almost_null fxt_typ: 0 tobjn: 72922
DBMS_STATS: Preferences for table CBO_TEST.TEST_STATS
DBMS_STATS: ================================================================================
DBMS_STATS: SKIP_TIME                                         -
DBMS_STATS: STATS_RETENTION                                   -
DBMS_STATS: MON_MODS_ALL_UPD_TIME                             -
DBMS_STATS: SNAPSHOT_UPD_TIME                                 -
DBMS_STATS: TRACE                                             - 0
DBMS_STATS: DEBUG                                             - 0
DBMS_STATS: SYS_FLAGS                                         - 0
DBMS_STATS: SPD_RETENTION_WEEKS                               - 53
DBMS_STATS: CASCADE                                           - DBMS_STATS.AUTO_CASCADE
DBMS_STATS: ESTIMATE_PERCENT                                  - DBMS_STATS.AUTO_SAMPLE_SIZE
DBMS_STATS: DEGREE                                            - NULL
DBMS_STATS: METHOD_OPT                                        - FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: NO_INVALIDATE                                     - DBMS_STATS.AUTO_INVALIDATE
DBMS_STATS: GRANULARITY                                       - AUTO
DBMS_STATS: PUBLISH                                           - TRUE
DBMS_STATS: STALE_PERCENT                                     - 10
DBMS_STATS: APPROXIMATE_NDV                                   - TRUE
DBMS_STATS: APPROXIMATE_NDV_ALGORITHM                         - REPEAT OR HYPERLOGLOG
DBMS_STATS: ANDV_ALGO_INTERNAL_OBSERVE                        - FALSE
DBMS_STATS: INCREMENTAL                                       - FALSE
DBMS_STATS: INCREMENTAL_INTERNAL_CONTROL                      - TRUE
DBMS_STATS: AUTOSTATS_TARGET                                  - AUTO
DBMS_STATS: CONCURRENT                                        - OFF
DBMS_STATS: JOB_OVERHEAD_PERC                                 - 1
DBMS_STATS: JOB_OVERHEAD                                      - -1
DBMS_STATS: GLOBAL_TEMP_TABLE_STATS                           - SESSION
DBMS_STATS: ENABLE_TOP_FREQ_HISTOGRAMS                        - 3
DBMS_STATS: ENABLE_HYBRID_HISTOGRAMS                          - 3
DBMS_STATS: TABLE_CACHED_BLOCKS                               - 1
DBMS_STATS: INCREMENTAL_LEVEL                                 - PARTITION
DBMS_STATS: INCREMENTAL_STALENESS                             - ALLOW_MIXED_FORMAT
DBMS_STATS: OPTIONS                                           - GATHER
DBMS_STATS: GATHER_AUTO                                       - AFTER_LOAD
DBMS_STATS: STAT_CATEGORY                                     - OBJECT_STATS, REALTIME_STATS
DBMS_STATS: SCAN_RATE                                         - 0
DBMS_STATS: GATHER_SCAN_RATE                                  - HADOOP_ONLY
DBMS_STATS: PREFERENCE_OVERRIDES_PARAMETER                    - FALSE
DBMS_STATS: AUTO_STAT_EXTENSIONS                              - OFF
DBMS_STATS: WAIT_TIME_TO_UPDATE_STATS                         - 15
DBMS_STATS: ROOT_TRIGGER_PDB                                  - FALSE
DBMS_STATS: COORDINATOR_TRIGGER_SHARD                         - FALSE
DBMS_STATS: MAINTAIN_STATISTICS_STATUS                        - FALSE
DBMS_STATS: AUTO_TASK_STATUS                                  - OFF
DBMS_STATS: AUTO_TASK_MAX_RUN_TIME                            - 3600
DBMS_STATS: AUTO_TASK_INTERVAL                                - 900
DBMS_STATS: reporting_man_log_task: target: "CBO_TEST"."TEST_STATS" objn: 72922 auto_stats: FALSE status: IN PROGRESS ctx.batching_coeff: 0
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: SERIAL (NON-CONCURRENT) granularity: GLOBAL AND PARTITION global_requested: NULL pfix:
DBMS_STATS: (Baseline)
DBMS_STATS: STATISTIC                      LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized                    0            0
DBMS_STATS: DML Parallelized                        0            0
DBMS_STATS: DDL Parallelized                        0            0
DBMS_STATS: DFO Trees                               0            0
DBMS_STATS: Server Threads                          0            0
DBMS_STATS: Allocation Height                       0            0
DBMS_STATS: Allocation Width                        0            0
DBMS_STATS: Local Msgs Sent                         0            0
DBMS_STATS: Distr Msgs Sent                         0            0
DBMS_STATS: Local Msgs Recv'd                       0            0
DBMS_STATS: Distr Msgs Recv'd                       0            0
DBMS_STATS: DOP                                     0            0
DBMS_STATS: Slave Sets                              0            0
DBMS_STATS: Start gather part -- ctx.conc_ctx.global_requested: NULL gran: GLOBAL AND PARTITION execution phase: 1
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Start gather_stats.. pfix:  ownname: CBO_TEST tabname: TEST_STATS pname:  spname:  execution phase: 1
Specified DOP=1 blocks=1520 DOP used=1
Specified DOP=1 blocks=1520 DOP used=1
DBMS_STATS: Using approximate NDV pct=0
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:   Y    Y    Y    Y                                       Y    ID
DBMS_STATS:   Y    Y    Y    Y    Y    Y    Y                        Y    ALMOST_NULL
DBMS_STATS: APPROX_NDV_ALGORITHM chosen: AS in non-incremental
DBMS_STATS: Approximate NDV Options
DBMS_STATS: NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U
DBMS_STATS: Starting query at 19.08.19 17:07:46,292000000 +02:00
DBMS_STATS: select /*+  full(t)    no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  */to_char(count("ID")),substrb(dump(min("ID"),16,0,64),1,240),substrb(dump(max("ID"),16,0,64),1,240),to_char(count("ALMOST_NULL")),substrb(dump(min("ALMOST_NULL"),16,0,64),1,240),substrb(dump(max("ALMOST_NULL"),16,0,64),1,240),count(rowidtochar(rowid)) from "CBO_TEST"."TEST_STATS" t  /* NDV,NIL,NIL,TOPN,NIL,NIL,RWID,U1,U254U*/
DBMS_STATS: Ending query at 19.08.19 17:07:46,448000000 +02:00
DBMS_STATS: STATISTIC                      LAST_QUERY SESSION_TOTAL
DBMS_STATS: ------------------------------ ---------- -------------
DBMS_STATS: Queries Parallelized                    0            0
DBMS_STATS: DML Parallelized                        0            0
DBMS_STATS: DDL Parallelized                        0            0
DBMS_STATS: DFO Trees                               0            0
DBMS_STATS: Server Threads                          0            0
DBMS_STATS: Allocation Height                       0            0
DBMS_STATS: Allocation Width                        0            0
DBMS_STATS: Local Msgs Sent                         0            0
DBMS_STATS: Distr Msgs Sent                         0            0
DBMS_STATS: Local Msgs Recv'd                       0            0
DBMS_STATS: Distr Msgs Recv'd                       0            0
DBMS_STATS: DOP                                     0            0
DBMS_STATS: Slave Sets                              0            0
DBMS_STATS: Approximate NDV Result
DBMS_STATS: 
  
    0
    1000000
    1000000
    6
    1001984
    1000000
    3979802
  
  
    1
    Typ=2 Len=2: c1,2
  
  
    2
    Typ=2 Len=2: c4,2
  
  
    3
    2
    0
    2
    2
    4
    2
    AAARzaAAPAAAACBAAB,1,AAARzaAAPAAAACBAAA,1,
  
  
    4
    Typ=2 Len=2: c1,2


  
    5
    Typ=2 Len=2: c1,3
  


DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min:
DBMS_STATS:          max:
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    NULL      NULL      NULL      NULL      NULL      NULL      NULL      7         0         0
DBMS_STATS:          min:
DBMS_STATS:          max: Typ=2 Len=2: c1,3
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    NULL      NULL      NULL      NULL      NULL      NULL      NULL      7         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c1,3
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max:
DBMS_STATS: process_raw_crec() for 1
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             3979802   1000000   NULL      NULL      NULL      1001984   NULL      0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c4,2
DBMS_STATS: process_raw_crec() for 2
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    4         2         NULL      NULL      NULL      2         NULL      7         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c1,3
DBMS_STATS: start processing top n values for column "ALMOST_NULL"
DBMS_STATS: Parsing topn values..
DBMS_STATS:    Extracted 2 rowid-freq pairs.
DBMS_STATS: topn sql (len: 438):
DBMS_STATS: +++ select /*+  no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  */ substrb(dump("ALMOST_NULL",16,0,64),1,240) d_1773304655, rowidtochar(rowid) rwid from "CBO_TEST"."TEST_STATS" t where rowid in (chartorowid('AAARzaAAPAAAACBAAA'),chartorowid('AAARzaAAPAAAACBAAB')) order by "ALMOST_NULL"
DBMS_STATS: start executing top n value query..
DBMS_STATS: Starting query at 19.08.19 17:07:46,464000000 +02:00
DBMS_STATS: select /*+  no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  */ substrb(dump("ALMOST_NULL",16,0,64),1,240) d_1773304655, rowidtochar(rowid) rwid from "CBO_TEST"."TEST_STATS" t where rowid in (chartorowid('AAARzaAAPAAAACBAAA'),chartorowid('AAARzaAAPAAAACBAAB')) order by "ALMOST_NULL"
DBMS_STATS: removal_count: -252 total_nonnull_rows: 2 mnb:  254
DBMS_STATS: hist_type in exec_get_topn: 1024 ndv:2 mnb:254
DBMS_STATS: Evaluating frequency histogram for col: "ALMOST_NULL"
DBMS_STATS:  number of values = 2, max # of buckects = 254, pct = 100, ssize = 2
DBMS_STATS:   csr.hreq: 1  Histogram gathering flags: 1031
DBMS_STATS: done_hist in process_topn: TRUE  csr.ccnt: 1
DBMS_STATS: Mark column "ALMOST_NULL" as top N computed
DBMS_STATS:  no histogram: setting density to 1/ndv (,000001)
DBMS_STATS: Skip topn computed column "ALMOST_NULL" numhist: 0
DBMS_STATS: Start fill_cstats - hybrid_enabled: TRUE
DBMS_STATS: ====================================================================================================
DBMS_STATS: Statistics from clist for CBO_TEST.TEST_STATS:
DBMS_STATS: ====================================================================================================
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID                             4,979802  1000000   0         1000000   1000000   1000000   ,000001   0         0         0
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c4,2
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME                       ACL       SNNV      NV        SNNVDV    NDV       SNDV      DENS      HGATH     HIND      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ALMOST_NULL                    1,000004  2         999998    2         2         2         ,25       1286      1         1
DBMS_STATS:          min: Typ=2 Len=2: c1,2
DBMS_STATS:          max: Typ=2 Len=2: c1,3
DBMS_STATS:          Histograms:
DBMS_STATS:          ---------------------------------------------------------------------------------------------------
DBMS_STATS:          BVAL   RPCNT    EAVAL                                                            ENVAL    EDVAL
DBMS_STATS:          ---------------------------------------------------------------------------------------------------
DBMS_STATS:          1               C102                                                             1        Typ=2 Len=2: c1,2
DBMS_STATS:          2               C103                                                             2        Typ=2 Len=2: c1,3
DBMS_STATS:          Need Actual Values (DSC_EAVS)
DBMS_STATS:          Histogram Type: FREQUENCY Data Type: 2
DBMS_STATS:          Histogram Flags: 4100 Histogram Gathering Flags: 1286
DBMS_STATS:          Incremental: FALSE Fix Control 13583722: 1
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: Number of rows in the table = 1000000, blocks = , average row length = 6, chain count = , scan rate = 0, sample size = 1000000, cstats.count = 4, cind = 4
DBMS_STATS: prepare reporting structures...
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ID part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: gather_ext_col_stats 1: tbl_owner=CBO_TEST tbl_name=TEST_STATS col_name=ALMOST_NULL part_name= granularity=GLOBAL AND PARTITION
DBMS_STATS: End of construct analyze using sql.
DBMS_STATS: Finished table CBO_TEST.TEST_STATS. at 19.08.19 17:07:46,464000000 +02:00
DBMS_STATS: reporting_man_update_task: objn: 72922 auto_stats: FALSE status: COMPLETED ctx.batching_coeff: 0

PL/SQL-Prozedur erfolgreich abgeschlossen.

SQL>


So this test case shows that in case of a column having more than a single value the code happily extracts the required information using the ROWIDs collected during the main pass and doesn't require a separate pass (Test cases 3 and 4), but in the special case of a single valued column it bails out and runs a separate query typically using sampling, except for the edge case of a single valued column having NULLs in most of the rows (Test cases 1 and 2).

I've discussed this issue with Nigel Bayliss, the optimizer product manager at Oracle, and he agreed that it looks like a bug - and created a corresponding bug (Bug 30205756 - FREQUENCY HISTOGRAM NOT CREATED USING FULL ROW SAMPLE, which is probably not marked as public), just in case you come across this issue and want to provide this reference to Oracle Support.