Wednesday, May 22, 2019

Indexing Null Values - Part 2

In the previous post I've demonstrated that Oracle has some problems to make efficient use of B*Tree indexes if an IS NULL condition is followed by IN / OR predicates also covered by the same index - the predicates following are not used to navigate the index structure efficiently but are applied as filters on all index entries identified by the IS NULL.

In this part I'll show what results I got when repeating the same exercise using Bitmap indexes - after all they include NULL values anyway, so no special tricks are required to use them for an IS NULL search. Let's start again with the same data set (actually not exactly the same but very similar) and an index on the single expression that gets searched for via IS NULL - results are again from 18.3.0:

SQL> create table null_index as select * from dba_tables;

Table created.

SQL> insert /*+ append */ into null_index select a.* from null_index a, (select /*+ no_merge cardinality(100) */ rownum as dup from dual connect by level <= 100);

214500 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(null, 'NULL_INDEX', method_opt => 'for all columns size auto for columns size 254 owner')

PL/SQL procedure successfully completed.

SQL> create bitmap index null_index_idx on null_index (pct_free);

Index created.

SQL> set serveroutput off pagesize 5000 arraysize 500
SQL>
SQL> set autotrace traceonly
SQL>
SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1297049223

------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |    19 |  5852 |  2342   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX     |    19 |  5852 |  2342   (1)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |                |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE        | NULL_INDEX_IDX |       |       |            |          |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')
   3 - access("PCT_FREE" IS NULL)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2192  consistent gets
         30  physical reads
          0  redo size
       7199  bytes sent via SQL*Net to client
        372  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        101  rows processed

So indeed the Bitmap index was successfully used to identify the PCT_FREE IS NULL rows but the efficiency suffers from the same problem and to the same degree as the corresponding B*Tree index plan - too many rows have to be filtered on table level:

Plan hash value: 1297049223

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |      1 |        |  2342 (100)|    101 |00:00:00.01 |    2192 |     30 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX     |      1 |     19 |  2342   (1)|    101 |00:00:00.01 |    2192 |     30 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |                |      1 |        |            |  13433 |00:00:00.01 |       3 |     30 |
|*  3 |    BITMAP INDEX SINGLE VALUE        | NULL_INDEX_IDX |      1 |        |            |      3 |00:00:00.01 |       3 |     30 |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))
   3 - access("PCT_FREE" IS NULL)

Some interesting points to mention: The 13,000+ rows are identified in the Bitmap index using just three index row entries / bitmap fragments, so that's the special efficiency of Bitmap indexes where a single index row entry can cover many, many table rows, and it's also interesting to see that the costing is pretty different from the B*Tree index costing (2342 vs. 1028, in this case closer to reality of 2,200 consistent gets but we'll see in a moment how this can change) - and no cardinality estimate gets mentioned on Bitmap index level  - the B*Tree index plan showed the spot on 13,433 estimated rows.

So reproducing the B*Tree test case, let's add the OWNER column to the Bitmap index in an attempt to increase the efficiency. Note that I drop the previous index to prevent Oracle from a "proper" usage of Bitmap indexes, as we'll see in a moment:

SQL> drop index null_index_idx;

Index dropped.

SQL> create bitmap index null_index_idx2 on null_index (pct_free, owner);

Index created.

SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1751956722

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |    19 |  5852 |  2343   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX      |    19 |  5852 |  2343   (1)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |                 |       |       |            |          |
|*  3 |    BITMAP INDEX RANGE SCAN          | NULL_INDEX_IDX2 |       |       |            |          |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')
   3 - access("PCT_FREE" IS NULL)
       filter("PCT_FREE" IS NULL AND ("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        105  consistent gets
         30  physical reads
          0  redo size
       7199  bytes sent via SQL*Net to client
        372  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        101  rows processed

Plan hash value: 1751956722

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |  2343 (100)|    101 |00:00:00.01 |     105 |     30 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX      |      1 |     19 |  2343   (1)|    101 |00:00:00.01 |     105 |     30 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |                 |      1 |        |            |    101 |00:00:00.01 |       4 |     30 |
|*  3 |    BITMAP INDEX RANGE SCAN          | NULL_INDEX_IDX2 |      1 |        |            |      1 |00:00:00.01 |       4 |     30 |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))
   3 - access("PCT_FREE" IS NULL)
       filter(("PCT_FREE" IS NULL AND ("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')))

So now we end up with an "Bitmap index range scan" operation, which in reality looks pretty efficient - just 105 consistent gets, so assuming 101 consistent gets for accessing the 101 table rows it just required 4 consistent gets on index level. But then look at the cost estimate: 2343, which is even greater than the cost estimate of the previous plan, and also check the "Predicate Information" section, which looks pretty weird, too - an access only for PCT_FREE IS NULL, a filter on index level repeating the whole predicates including the PCT_FREE IS NULL and most significantly the predicates on OWNER repeated on table level.

Clearly what the optimizer assumes in terms of costing and predicates required doesn't correspond to what happens at runtime, which looks pretty efficient, but at least according the predicates on index level again doesn't look like the optimal strategy we would like to see again: Why the additional filter instead of just access? We can also see that echoed in the Rowsource statistics: Only a single Bitmap index fragment gets produced by the "Bitmap index range scan" but it requires 4 consistent gets on index level, so three of them get "filtered" after access.

The costing seems to assume that only the PCT_FREE IS NULL rows are identified on index level, which clearly isn't the case at runtime...

Of course this is not proper usage of Bitmap indexes - typically you don't create a multi column Bitmap index but instead make use of the real power of Bitmap indexes, which is how Oracle can combine multiple of them for efficient usage and access.

Before doing so, let's just for the sake of completeness repeat the combined Bitmap index of the B*Tree variant that turned out to be most efficient for the B*Tree case:

SQL> drop index null_index_idx2;

Index dropped.

SQL> create bitmap index null_index_idx3 on null_index (owner, pct_free);

Index created.

SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1022155563

--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                 |    19 |  5852 |    83   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                 |       |       |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX      |    19 |  5852 |    83   (0)| 00:00:01 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |                 |       |       |            |          |
|*  4 |     BITMAP INDEX RANGE SCAN          | NULL_INDEX_IDX3 |       |       |            |          |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("PCT_FREE" IS NULL)
   4 - access("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')
       filter("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        207  consistent gets
         30  physical reads
          0  redo size
       7199  bytes sent via SQL*Net to client
        372  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        101  rows processed

Plan hash value: 1022155563

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                 |      1 |        |    83 (100)|    101 |00:00:00.01 |     207 |     30 |
|   1 |  INLIST ITERATOR                     |                 |      1 |        |            |    101 |00:00:00.01 |     207 |     30 |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX      |      2 |     19 |    83   (0)|    101 |00:00:00.01 |     207 |     30 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |                 |      2 |        |            |    303 |00:00:00.01 |       5 |     30 |
|*  4 |     BITMAP INDEX RANGE SCAN          | NULL_INDEX_IDX3 |      2 |        |            |      2 |00:00:00.01 |       5 |     30 |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("PCT_FREE" IS NULL)
   4 - access(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))
       filter(("OWNER"='AUDSYS' OR "OWNER"='CBO_TEST'))

While we see now again the desired "INLIST ITERATOR" this one looks weird for several reasons, in particular because we now have a much lower cost estimate (83) but in reality it is less efficient than the previous one (cost estimate 2343 but 105 consistent gets) due to the 207 consistent gets required. Why is this so? The "Predicate Information" section shows why: Only the predicate on OWNER is evaluated on index level (303 rows identified on index level) and therefore rows need to be filtered on table level, which looks again like an implementation limitation and pretty unnecessary - after all the PCT_FREE IS NULL should be somehow treated on index level instead.

So finally let's see how things turn out when using Bitmap indexes the way they are designed - by creating multiple ones and let Oracle combine them:

SQL> create bitmap index null_index_idx on null_index (pct_free);

Index created.

SQL> create bitmap index null_index_idx4 on null_index (owner);

Index created.

SQL> select * from null_index where pct_free is null and owner in ('AUDSYS', 'CBO_TEST');

101 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 704944303

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |    19 |  5852 |     8   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX      |    19 |  5852 |     8   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |                 |       |       |            |          |
|   3 |    BITMAP AND                       |                 |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE       | NULL_INDEX_IDX  |       |       |            |          |
|   5 |     BITMAP OR                       |                 |       |       |            |          |
|*  6 |      BITMAP INDEX SINGLE VALUE      | NULL_INDEX_IDX4 |       |       |            |          |
|*  7 |      BITMAP INDEX SINGLE VALUE      | NULL_INDEX_IDX4 |       |       |            |          |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("PCT_FREE" IS NULL)
   6 - access("OWNER"='AUDSYS')
   7 - access("OWNER"='CBO_TEST')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        108  consistent gets
         60  physical reads
          0  redo size
      33646  bytes sent via SQL*Net to client
        372  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        101  rows processed

Plan hash value: 704944303

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |     8 (100)|    101 |00:00:00.01 |     108 |     60 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| NULL_INDEX      |      1 |     19 |     8   (0)|    101 |00:00:00.01 |     108 |     60 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |                 |      1 |        |            |    101 |00:00:00.01 |       7 |     60 |
|   3 |    BITMAP AND                       |                 |      1 |        |            |      1 |00:00:00.01 |       7 |     60 |
|*  4 |     BITMAP INDEX SINGLE VALUE       | NULL_INDEX_IDX  |      1 |        |            |      3 |00:00:00.01 |       3 |     30 |
|   5 |     BITMAP OR                       |                 |      1 |        |            |      1 |00:00:00.01 |       4 |     30 |
|*  6 |      BITMAP INDEX SINGLE VALUE      | NULL_INDEX_IDX4 |      1 |        |            |      1 |00:00:00.01 |       2 |     30 |
|*  7 |      BITMAP INDEX SINGLE VALUE      | NULL_INDEX_IDX4 |      1 |        |            |      1 |00:00:00.01 |       2 |      0 |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("PCT_FREE" IS NULL)
   6 - access("OWNER"='AUDSYS')
   7 - access("OWNER"='CBO_TEST')

So now we see access predicates only and Oracle making efficient use by combining multiple Bitmap indexes. Nevertheless I find the range of costing amazing: This plan is assigned a cost of 8 but it's actually less efficient at runtime (108 consistent gets) than the plan above having a cost of 2343 assigned but requiring just 105 consistent gets at runtime. Clearly the costing of Bitmap indexes is still - even in version 18.3 - full of surprises.

Summary

Repeating the same exercise as previously using Bitmap indexes shows several things:

- Oracle isn't necessarily good at costing and using multi column Bitmap indexes properly
- The costing of Bitmap indexes is still questionable (the most important figure "Clustering Factor" is still meaningless for Bitmap indexes)
- For proper handling use Bitmap indexes the way they are supposed to be used: By creating separate ones and let Oracle combine them

No comments:

Post a Comment