Friday, January 9, 2009

Limitations of histograms - rounding issues

As outlined in e.g. Jonathan Lewis' "Cost-Based Oracle Fundamentals" on page 117 et seqq., histograms are based only on a subset of column data, e.g. for character columns only the first 32 bytes (not characters in case of multi-byte character sets) are examined and while converting the resulting substring to a number, further truncation takes place: Only the first 15 bytes are used, the remaining ones are padded with zeros, the resulting hex number is converted to decimal and again rounded to 15 significant figures.

Since this means that for single-byte character strings only the first six to seven characters are relevant for the finally resulting number representation (in case of multi-byte character sets it's potentially even less relevant characters) Oracle populates the ENDPOINT_ACTUAL_VALUE using the first 32 bytes to be able to distinguish strings that are the same for the first six to seven bytes.

Recently there was a interesting thread on the OTN forum where it was pointed out that similar issues can arise if you attempt to generate a histogram on numeric data with a very high precision. It turned out that Oracle actually performs a similar rounding with number data although from a technical point of view it's not obvious why a rounding would be required since no conversion is necessary for a column of NUMBER data type; the ENDPOINT_VALUE seems to be a NUMBER type, too, in the data dictionary.

The behaviour although is a bit different between the handling of strings and numbers: Whereas strings that do not differ for the first 32 bytes simply generate a one bucket histogram (so they group to a single value), a number that is the same for the first 15 digits still generates an appropriate number of histogram buckets, but the ENDPOINT_VALUE is the same for all entries. Note that it doesn't matter if the digits are left or right to the decimal point, any number having a precision larger than 15 will get rounded.

The following test case (run on Windows (32bit) 10.2.0.4, but the results on 9.2.0.8 and 11.1.0.7 were the same) shall demonstrate the issues described. It populates a table with suitable data to demonstrate the rounding issues.


SQL>
SQL> alter session set nls_numeric_characters = ',.';

Session altered.

SQL>
SQL> drop table test_histogram_rounding purge;

Table dropped.

SQL>
SQL> create table test_histogram_rounding as
2 select rownum as id,
3 1.00000000000001 + (trunc(sqrt(rownum)) / 100000000000000) as the_number1,
4 1.000000000000001 + (trunc(sqrt(rownum)) / 1000000000000000) as the_number2,
5 10.00000000000001 + (trunc(sqrt(rownum)) / 100000000000000) as the_number3,
6 100.00000000000001 + (trunc(sqrt(rownum)) / 100000000000000) as the_number4,
7 to_char(100.00000000000001 + (trunc(sqrt(rownum)) / 100000000000000), 'TM') as the_number4_as_string,
8 10000000000000001 + trunc(sqrt(rownum)) as the_number5,
9 to_char(1000000000000000000000000000000001 + trunc(sqrt(rownum)), 'TM') as the_number6_as_string,
10 'AAAAA' || trunc(sqrt(rownum)) as the_string1,
11 'AAAAAAAAAA' || trunc(sqrt(rownum)) as the_string2,
12 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' || trunc(sqrt(rownum)) as the_string3
13 -- Use this in AL32UTF8 character set to see the effects of multi-byte character sets
14 --'ÄÄÄÄÄ' || trunc(sqrt(rownum)) as the_string1,
15 --'ÄÄÄÄÄÄÄÄÄÄ' || trunc(sqrt(rownum)) as the_string2,
16 --'ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ' || trunc(sqrt(rownum)) as the_string3
17 from dual
18 connect by level <= 1000;

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'test_histogram_rounding', method_opt=>'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 THE_NUMBER1,THE_NUMBER2,THE_NUMBER3,THE_NUMBER4,THE_NUMBER4_AS_STRING,THE_NUMBER5,THE_NUMBER6_AS_STRING,THE_STRING1,THE_STRING2,THE_STRING3')

PL/SQL procedure successfully completed.

SQL>
SQL> column column_name format a22
SQL> column endpoint_number format 9999
SQL> column endpoint_actual_value format a32
SQL>
SQL> select column_name, endpoint_number, endpoint_value, endpoint_actual_value
2 from user_tab_histograms where table_name = 'TEST_HISTOGRAM_ROUNDING';

COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
---------------------- --------------- -------------------------------------- --------------------------------
THE_NUMBER1 3 1,00000000000002
THE_NUMBER1 8 1,00000000000003
THE_NUMBER1 15 1,00000000000004
THE_NUMBER1 24 1,00000000000005
THE_NUMBER1 35 1,00000000000006
THE_NUMBER1 48 1,00000000000007
THE_NUMBER1 63 1,00000000000008
THE_NUMBER1 80 1,00000000000009
THE_NUMBER1 99 1,0000000000001
THE_NUMBER1 120 1,00000000000011
THE_NUMBER1 143 1,00000000000012
THE_NUMBER1 168 1,00000000000013
THE_NUMBER1 195 1,00000000000014
THE_NUMBER1 224 1,00000000000015
THE_NUMBER1 255 1,00000000000016
THE_NUMBER1 288 1,00000000000017
THE_NUMBER1 323 1,00000000000018
THE_NUMBER1 360 1,00000000000019
THE_NUMBER1 399 1,0000000000002
THE_NUMBER1 440 1,00000000000021
THE_NUMBER1 483 1,00000000000022
THE_NUMBER1 528 1,00000000000023
THE_NUMBER1 575 1,00000000000024
THE_NUMBER1 624 1,00000000000025
THE_NUMBER1 675 1,00000000000026
THE_NUMBER1 728 1,00000000000027
THE_NUMBER1 783 1,00000000000028
THE_NUMBER1 840 1,00000000000029
THE_NUMBER1 899 1,0000000000003
THE_NUMBER1 960 1,00000000000031
THE_NUMBER1 1000 1,00000000000032
THE_NUMBER2 3 1
THE_NUMBER2 8 1
THE_NUMBER2 15 1
THE_NUMBER2 24 1,00000000000001
THE_NUMBER2 35 1,00000000000001
THE_NUMBER2 48 1,00000000000001
THE_NUMBER2 63 1,00000000000001
THE_NUMBER2 80 1,00000000000001
THE_NUMBER2 99 1,00000000000001
THE_NUMBER2 120 1,00000000000001
THE_NUMBER2 143 1,00000000000001
THE_NUMBER2 168 1,00000000000001
THE_NUMBER2 195 1,00000000000001
THE_NUMBER2 224 1,00000000000002
THE_NUMBER2 255 1,00000000000002
THE_NUMBER2 288 1,00000000000002
THE_NUMBER2 323 1,00000000000002
THE_NUMBER2 360 1,00000000000002
THE_NUMBER2 399 1,00000000000002
THE_NUMBER2 440 1,00000000000002
THE_NUMBER2 483 1,00000000000002
THE_NUMBER2 528 1,00000000000002
THE_NUMBER2 575 1,00000000000002
THE_NUMBER2 624 1,00000000000003
THE_NUMBER2 675 1,00000000000003
THE_NUMBER2 728 1,00000000000003
THE_NUMBER2 783 1,00000000000003
THE_NUMBER2 840 1,00000000000003
THE_NUMBER2 899 1,00000000000003
THE_NUMBER2 960 1,00000000000003
THE_NUMBER2 1000 1,00000000000003
THE_NUMBER3 3 10
THE_NUMBER3 8 10
THE_NUMBER3 15 10
THE_NUMBER3 24 10,0000000000001
THE_NUMBER3 35 10,0000000000001
THE_NUMBER3 48 10,0000000000001
THE_NUMBER3 63 10,0000000000001
THE_NUMBER3 80 10,0000000000001
THE_NUMBER3 99 10,0000000000001
THE_NUMBER3 120 10,0000000000001
THE_NUMBER3 143 10,0000000000001
THE_NUMBER3 168 10,0000000000001
THE_NUMBER3 195 10,0000000000001
THE_NUMBER3 224 10,0000000000002
THE_NUMBER3 255 10,0000000000002
THE_NUMBER3 288 10,0000000000002
THE_NUMBER3 323 10,0000000000002
THE_NUMBER3 360 10,0000000000002
THE_NUMBER3 399 10,0000000000002
THE_NUMBER3 440 10,0000000000002
THE_NUMBER3 483 10,0000000000002
THE_NUMBER3 528 10,0000000000002
THE_NUMBER3 575 10,0000000000002
THE_NUMBER3 624 10,0000000000003
THE_NUMBER3 675 10,0000000000003
THE_NUMBER3 728 10,0000000000003
THE_NUMBER3 783 10,0000000000003
THE_NUMBER3 840 10,0000000000003
THE_NUMBER3 899 10,0000000000003
THE_NUMBER3 960 10,0000000000003
THE_NUMBER3 1000 10,0000000000003
THE_NUMBER4 3 100
THE_NUMBER4 8 100
THE_NUMBER4 15 100
THE_NUMBER4 24 100
THE_NUMBER4 35 100
THE_NUMBER4 48 100
THE_NUMBER4 63 100
THE_NUMBER4 80 100
THE_NUMBER4 99 100
THE_NUMBER4 120 100
THE_NUMBER4 143 100
THE_NUMBER4 168 100
THE_NUMBER4 195 100
THE_NUMBER4 224 100
THE_NUMBER4 255 100
THE_NUMBER4 288 100
THE_NUMBER4 323 100
THE_NUMBER4 360 100
THE_NUMBER4 399 100
THE_NUMBER4 440 100
THE_NUMBER4 483 100
THE_NUMBER4 528 100
THE_NUMBER4 575 100
THE_NUMBER4 624 100
THE_NUMBER4 675 100
THE_NUMBER4 728 100
THE_NUMBER4 783 100
THE_NUMBER4 840 100
THE_NUMBER4 899 100
THE_NUMBER4 960 100
THE_NUMBER4 1000 100
THE_NUMBER4_AS_STRING 3 255399918356579000000000000000000000 100,00000000000002
THE_NUMBER4_AS_STRING 8 255399918356579000000000000000000000 100,00000000000003
THE_NUMBER4_AS_STRING 15 255399918356579000000000000000000000 100,00000000000004
THE_NUMBER4_AS_STRING 24 255399918356579000000000000000000000 100,00000000000005
THE_NUMBER4_AS_STRING 35 255399918356579000000000000000000000 100,00000000000006
THE_NUMBER4_AS_STRING 48 255399918356579000000000000000000000 100,00000000000007
THE_NUMBER4_AS_STRING 63 255399918356579000000000000000000000 100,00000000000008
THE_NUMBER4_AS_STRING 80 255399918356579000000000000000000000 100,00000000000009
THE_NUMBER4_AS_STRING 99 255399918356579000000000000000000000 100,0000000000001
THE_NUMBER4_AS_STRING 120 255399918356579000000000000000000000 100,00000000000011
THE_NUMBER4_AS_STRING 143 255399918356579000000000000000000000 100,00000000000012
THE_NUMBER4_AS_STRING 168 255399918356579000000000000000000000 100,00000000000013
THE_NUMBER4_AS_STRING 195 255399918356579000000000000000000000 100,00000000000014
THE_NUMBER4_AS_STRING 224 255399918356579000000000000000000000 100,00000000000015
THE_NUMBER4_AS_STRING 255 255399918356579000000000000000000000 100,00000000000016
THE_NUMBER4_AS_STRING 288 255399918356579000000000000000000000 100,00000000000017
THE_NUMBER4_AS_STRING 323 255399918356579000000000000000000000 100,00000000000018
THE_NUMBER4_AS_STRING 360 255399918356579000000000000000000000 100,00000000000019
THE_NUMBER4_AS_STRING 399 255399918356579000000000000000000000 100,0000000000002
THE_NUMBER4_AS_STRING 440 255399918356579000000000000000000000 100,00000000000021
THE_NUMBER4_AS_STRING 483 255399918356579000000000000000000000 100,00000000000022
THE_NUMBER4_AS_STRING 528 255399918356579000000000000000000000 100,00000000000023
THE_NUMBER4_AS_STRING 575 255399918356579000000000000000000000 100,00000000000024
THE_NUMBER4_AS_STRING 624 255399918356579000000000000000000000 100,00000000000025
THE_NUMBER4_AS_STRING 675 255399918356579000000000000000000000 100,00000000000026
THE_NUMBER4_AS_STRING 728 255399918356579000000000000000000000 100,00000000000027
THE_NUMBER4_AS_STRING 783 255399918356579000000000000000000000 100,00000000000028
THE_NUMBER4_AS_STRING 840 255399918356579000000000000000000000 100,00000000000029
THE_NUMBER4_AS_STRING 899 255399918356579000000000000000000000 100,0000000000003
THE_NUMBER4_AS_STRING 960 255399918356579000000000000000000000 100,00000000000031
THE_NUMBER4_AS_STRING 1000 255399918356579000000000000000000000 100,00000000000032
THE_NUMBER5 3 10000000000000000
THE_NUMBER5 8 10000000000000000
THE_NUMBER5 15 10000000000000000
THE_NUMBER5 24 10000000000000000
THE_NUMBER5 35 10000000000000000
THE_NUMBER5 48 10000000000000000
THE_NUMBER5 63 10000000000000000
THE_NUMBER5 80 10000000000000000
THE_NUMBER5 99 10000000000000000
THE_NUMBER5 120 10000000000000000
THE_NUMBER5 143 10000000000000000
THE_NUMBER5 168 10000000000000000
THE_NUMBER5 195 10000000000000000
THE_NUMBER5 224 10000000000000000
THE_NUMBER5 255 10000000000000000
THE_NUMBER5 288 10000000000000000
THE_NUMBER5 323 10000000000000000
THE_NUMBER5 360 10000000000000000
THE_NUMBER5 399 10000000000000000
THE_NUMBER5 440 10000000000000000
THE_NUMBER5 483 10000000000000000
THE_NUMBER5 528 10000000000000000
THE_NUMBER5 575 10000000000000000
THE_NUMBER5 624 10000000000000000
THE_NUMBER5 675 10000000000000000
THE_NUMBER5 728 10000000000000000
THE_NUMBER5 783 10000000000000000
THE_NUMBER5 840 10000000000000000
THE_NUMBER5 899 10000000000000000
THE_NUMBER5 960 10000000000000000
THE_NUMBER5 1000 10000000000000000
THE_NUMBER6_AS_STRING 1000 255399919594519000000000000000000000 10000000000000000000000000000000
THE_STRING1 3 338822822454902000000000000000000000 AAAAA1
THE_STRING1 24 338822822454903000000000000000000000 AAAAA10
THE_STRING1 47 338822822454903000000000000000000000 AAAAA11
THE_STRING1 72 338822822454903000000000000000000000 AAAAA12
THE_STRING1 99 338822822454903000000000000000000000 AAAAA13
THE_STRING1 128 338822822454903000000000000000000000 AAAAA14
THE_STRING1 159 338822822454903000000000000000000000 AAAAA15
THE_STRING1 192 338822822454903000000000000000000000 AAAAA16
THE_STRING1 227 338822822454903000000000000000000000 AAAAA17
THE_STRING1 264 338822822454903000000000000000000000 AAAAA18
THE_STRING1 303 338822822454903000000000000000000000 AAAAA19
THE_STRING1 308 338822822454907000000000000000000000 AAAAA2
THE_STRING1 349 338822822454907000000000000000000000 AAAAA20
THE_STRING1 392 338822822454907000000000000000000000 AAAAA21
THE_STRING1 437 338822822454907000000000000000000000 AAAAA22
THE_STRING1 484 338822822454907000000000000000000000 AAAAA23
THE_STRING1 533 338822822454907000000000000000000000 AAAAA24
THE_STRING1 584 338822822454907000000000000000000000 AAAAA25
THE_STRING1 637 338822822454908000000000000000000000 AAAAA26
THE_STRING1 692 338822822454908000000000000000000000 AAAAA27
THE_STRING1 749 338822822454908000000000000000000000 AAAAA28
THE_STRING1 808 338822822454908000000000000000000000 AAAAA29
THE_STRING1 815 338822822454911000000000000000000000 AAAAA3
THE_STRING1 876 338822822454912000000000000000000000 AAAAA30
THE_STRING1 916 338822822454912000000000000000000000 AAAAA31
THE_STRING1 925 338822822454916000000000000000000000 AAAAA4
THE_STRING1 936 338822822454921000000000000000000000 AAAAA5
THE_STRING1 949 338822822454925000000000000000000000 AAAAA6
THE_STRING1 964 338822822454930000000000000000000000 AAAAA7
THE_STRING1 981 338822822454935000000000000000000000 AAAAA8
THE_STRING1 1000 338822822454940000000000000000000000 AAAAA9
THE_STRING2 3 338822822454979000000000000000000000 AAAAAAAAAA1
THE_STRING2 24 338822822454979000000000000000000000 AAAAAAAAAA10
THE_STRING2 47 338822822454979000000000000000000000 AAAAAAAAAA11
THE_STRING2 72 338822822454979000000000000000000000 AAAAAAAAAA12
THE_STRING2 99 338822822454979000000000000000000000 AAAAAAAAAA13
THE_STRING2 128 338822822454979000000000000000000000 AAAAAAAAAA14
THE_STRING2 159 338822822454979000000000000000000000 AAAAAAAAAA15
THE_STRING2 192 338822822454979000000000000000000000 AAAAAAAAAA16
THE_STRING2 227 338822822454979000000000000000000000 AAAAAAAAAA17
THE_STRING2 264 338822822454979000000000000000000000 AAAAAAAAAA18
THE_STRING2 303 338822822454979000000000000000000000 AAAAAAAAAA19
THE_STRING2 308 338822822454979000000000000000000000 AAAAAAAAAA2
THE_STRING2 349 338822822454979000000000000000000000 AAAAAAAAAA20
THE_STRING2 392 338822822454979000000000000000000000 AAAAAAAAAA21
THE_STRING2 437 338822822454979000000000000000000000 AAAAAAAAAA22
THE_STRING2 484 338822822454979000000000000000000000 AAAAAAAAAA23
THE_STRING2 533 338822822454979000000000000000000000 AAAAAAAAAA24
THE_STRING2 584 338822822454979000000000000000000000 AAAAAAAAAA25
THE_STRING2 637 338822822454979000000000000000000000 AAAAAAAAAA26
THE_STRING2 692 338822822454979000000000000000000000 AAAAAAAAAA27
THE_STRING2 749 338822822454979000000000000000000000 AAAAAAAAAA28
THE_STRING2 808 338822822454979000000000000000000000 AAAAAAAAAA29
THE_STRING2 815 338822822454979000000000000000000000 AAAAAAAAAA3
THE_STRING2 876 338822822454979000000000000000000000 AAAAAAAAAA30
THE_STRING2 916 338822822454979000000000000000000000 AAAAAAAAAA31
THE_STRING2 925 338822822454979000000000000000000000 AAAAAAAAAA4
THE_STRING2 936 338822822454979000000000000000000000 AAAAAAAAAA5
THE_STRING2 949 338822822454979000000000000000000000 AAAAAAAAAA6
THE_STRING2 964 338822822454979000000000000000000000 AAAAAAAAAA7
THE_STRING2 981 338822822454979000000000000000000000 AAAAAAAAAA8
THE_STRING2 1000 338822822454979000000000000000000000 AAAAAAAAAA9
THE_STRING3 1000 338822822454979000000000000000000000
ID 0 1
ID 1 1000

252 rows selected.

SQL>
SQL> set autotrace on explain
SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number1 = 1.00000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 3 | 30 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER1"=1.00000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number1 = 1.00000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 61 | 610 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER1"=1.00000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number2 = 1.000000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 5 | 55 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER2"=1.000000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number2 = 1.000000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 55 | 605 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER2"=1.000000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number3 = 10.00000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 5 | 50 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER3"=10.00000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number3 = 10.00000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 55 | 550 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER3"=10.00000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number4 = 100.00000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 33 | 363 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER4"=100.00000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number4 = 100.00000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 33 | 363 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER4"=100.00000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number4_as_string = 100.00000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 19 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 3 | 57 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_NUMBER("THE_NUMBER4_AS_STRING")=100.00000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number4_as_string = 100.00000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 19 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 61 | 1159 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_NUMBER("THE_NUMBER4_AS_STRING")=100.00000000000031)

SQL>
SQL> alter session set nls_numeric_characters = '.,';

Session altered.

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number4_as_string = 100.00000000000002;
where the_number4_as_string = 100.00000000000002
*
ERROR at line 3:
ORA-01722: invalid number


SQL>
SQL> alter session set nls_numeric_characters = ',.';

Session altered.

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number5 = 10000000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 33 | 363 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER5"=10000000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number5 = 10000000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 33 | 363 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_NUMBER5"=10000000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number6_as_string = 1000000000000000000000000000000002;

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 35 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 1000 | 35000 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_NUMBER("THE_NUMBER6_AS_STRING")=1000000000000000000000000000000002)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_number6_as_string = 1000000000000000000000000000000031;

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 35 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 1000 | 35000 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_NUMBER("THE_NUMBER6_AS_STRING")=1000000000000000000000000000000031)

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string1 = 'AAAAA1';

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 3 | 24 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING1"='AAAAA1')

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string1 = 'AAAAA30';

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 61 | 488 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING1"='AAAAA30')

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string2 = 'AAAAAAAAAA1';

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 3 | 39 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING2"='AAAAAAAAAA1')

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string2 = 'AAAAAAAAAA30';

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 61 | 793 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING2"='AAAAAAAAAA30')

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string3 = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1';

COUNT(*)
--------------------------------------
3


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 35 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 1000 | 35000 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING3"='AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1')

SQL>
SQL> select count(*)
2 from test_histogram_rounding
3 where the_string3 = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA30';

COUNT(*)
--------------------------------------
61


Execution Plan
----------------------------------------------------------
Plan hash value: 3690517312

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 13 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 35 | | |
|* 2 | TABLE ACCESS FULL| TEST_HISTOGRAM_ROUNDING | 1000 | 35000 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("THE_STRING3"='AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA30')

SQL>
SQL> spool off


As you can see from the histogram dictionary information, the numeric data gets rounded but the correct number of histogram buckets get generated. The string behaves differently, and you can see the different handling of the ENDPOINT_VALUE and the ENDPOINT_ACTUAL_VALUE.

When the rounding issues kick in for the numbers you start to get (slightly) incorrect estimates based on the histogram information, and once the precision gets greater than 15, the estimates will be significantly incorrect.

As a side note, as you can see from the test case, it's a bad idea to generate a histogram on a string that is the same for at least the 32 first bytes, since you'll get a single bucket and the estimates will be way off.

A potential workaround of the numeric issue is to store the number as string (as long as the precision is less than 32 due to the limitation of the string handling). According to "Cost-Based Oracle Fundamentals" by Jonathan Lewis from 9.2.0.6 and 10.1.0.4 on Oracle will use the histogram information properly, even with the implicit number conversion in place, but I don't have the older versions currently available for testing, so I can't tell the exact outcome when using the older releases.

Of course this workaround introduces various issues, the most obvious threat being the implicit conversion relying on client NLS settings, as can be seen in the script.

1 comment:

  1. Over on Oracle related stuff, Randolf Geist has published a very through look into the limitations of histograms - rounding issues.

    -- Log Buffer #131

    ReplyDelete