Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part I (Neighb...

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part I (Neighborhood Threat) September 18, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Explain Plan For Index, Extended Statistics, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle General, Oracle Indexes.
trackback

Following on from my previous few posts on “data skew”, I’m now going to look at it from a slightly different perspective, where there is an inherent relationship between columns. The CBO has difficulties in recognising (by default) that some combinations of column values are far more common than other combinations, resulting in incorrect cardinality estimates and resultant poor execution plans.

As we’ll see, this skew in returned data can lead to poor execution plans due to the inappropriate use of newly created Automatic Indexes…

I’ll start by creating a simple table that has two columns of interest, CODE1 and CODE2:

SQL> create table iggy_pop (id number, code1 number, code2 number, name varchar2(42));

Table created.

SQL> insert into iggy_pop select rownum, mod(rownum, 100)+1, mod(rownum, 100)+1, 'David Bowie'

from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP');

PL/SQL procedure successfully completed.

Both columns CODE1 and CODE2 each have 100 distinct values, so that the possible combinations of data from both columns is 100 x 100 = 10,000. HOWEVER, the values of CODE1 and CODE2 are always the same and so there is in fact only 100 distinct combinations of data because of this inherent relationship between columns.

If we run the following query for a combination of data that exists:

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3288467

--------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1000| 24000| 575 (15)| 00:00:01 |

|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 1000| 24000| 575 (15)| 00:00:01 |

--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)

filter("CODE1"=42 AND "CODE2"=42)

Note

-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

40964 consistent gets

40953 physical reads

0 redo size

1092240 bytes sent via SQL*Net to client

581 bytes received via SQL*Net from client

21 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

100000 rows processed

Without an index, the CBO has no choice but to use a FTS. However, the interesting thing to note is how the cardinality estimate is way wrong, with 100,000 rows returned but only 1000 rows estimated. The CBO incorrect assumes that 1/10000th of the data is being returned and not actual the 1/100 (1%).

If we run a query on a combination of data that doesn’t exist:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value: 3288467

--------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1000 | 18000| 575 (15)| 00:00:01 |

|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 1000 | 18000| 575 (15)| 00:00:01 |

--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("CODE1"=1 AND "CODE2"=42)

filter("CODE1"=1 AND "CODE2"=42)

Note

-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

40964 consistent gets

40953 physical reads

0 redo size

368 bytes sent via SQL*Net to client

377 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

0 rows processed

The CBO still estimates that 1000 rows are to be returned. However, with no rows returned, an index would be a much better alternative than the current FTS in this case.

Let’s now wait and see what the Automatic Indexing process makes of all this (following are highlights from the Auto Indexing Last Activity report):

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT

--------------------------------------------------------------------------------

GENERAL INFORMATION

-------------------------------------------------------------------------------

Activity start : 18-SEP-2020 01:24:17

Activity end : 18-SEP-2020 01:25:29

Executions completed : 1

Executions interrupted : 0

Executions with fatal error : 0

-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)

-------------------------------------------------------------------------------

Index candidates : 0

Indexes created (visible / invisible) : 1 (1 / 0)

Space used (visible / invisible) : 134.22 MB (134.22 MB / 0 B)

Indexes dropped : 0

SQL statements verified : 1

SQL statements improved (improvement factor) : 1 (41301.7x)

SQL plan baselines created : 0

Overall improvement factor : 41301.7x

-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)

-------------------------------------------------------------------------------

Unused indexes : 0

Space used : 0 B

Unusable indexes : 0

-------------------------------------------------------------------------------

INDEX DETAILS

-------------------------------------------------------------------------------

The following indexes were created:

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

So Oracle does indeed create an automatic index on the CODE1, CODE2 columns. However, notice that only 1 statement has been verified and not the above two statements that I had executed during the previous period.

VERIFICATION DETAILS

-------------------------------------------------------------------------------

The performance of the following statements improved:

-------------------------------------------------------------------------------

Parsing Schema Name : BOWIE

SQL ID : bdnf0barn3jk7

SQL Text : select code1, code2, name from iggy_pop where code1=1 and code2=42

Improvement Factor : 41301.7x

Execution Statistics:

-----------------------------

Original Plan Auto Index Plan

---------------------------- ----------------------------

Elapsed Time (s): 72085 1342

CPU Time (s): 39272 679

Buffer Gets: 123907 3

Optimizer Cost: 575 4

Disk Reads: 122859 2

Direct Writes: 0 0

Rows Processed: 0 0

Executions: 3 1

So only the SQL that returned 0 rows has been reported. As expected, it runs much more efficiently with an index than via the previous FTS, with an Improvement Factor of some 41301.7x.

PLANS SECTION

---------------------------------------------------------------------------------------------

- Original

-----------------------------

Plan Hash Value : 3288467

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | | | 575 | |

| 1 | TABLE ACCESS STORAGE FULL | IGGY_POP | 1000 | 18000 | 575 | 00:00:01 |

--------------------------------------------------------------------------------

Notes

-----

- dop = 1

- px_in_memory_imc = no

- px_in_memory = no

- With Auto Indexes

-----------------------------

Plan Hash Value : 2496796491

-------------------------------------------------------------------------------------------------------

| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |

-------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 2 | 36 | 4 | 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID BATCHED | IGGY_POP | 2 | 36 | 4 | 00:00:01 |

| * 2 | INDEX RANGE SCAN | SYS_AI_1awkddqkwa4f8 | 1 | | 3 | 00:00:01 |

-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

------------------------------------------

* 2 - access("CODE1"=1 AND "CODE2"=42)

Notes

-----

- Dynamic sampling used for this statement ( level = 11 )

If we look at the comparison between plans, the new plan of course uses the newly created Automatic Index.

The critical point to notice here however is that the cardinality estimates are almost spot for the new execution plan (2 rows is much closer to reality than the previous 1000).

The reason why it’s much more accurate is because the Auto Indexing process session uses the new Dynamic Sampling Level = 11. This enables the CBO to sample data on the fly and determine a much more accurate cardinality estimate than by default where the Dynamic Sampling Level=2.

This also explains why the other statement which returned many rows was not “verified”. Actually, it was but because the Auto Index process with Dynamic Sampling set to 11 correctly identified that too many rows were being returned to make any new index viable, this statement did NOT cause the new index to be kept.

So it was only the SQL that returned no rows that resulted in the newly created Automatic Index. The other statement was correctly determined by the Automatic Indexing process to run worse with the new index and so determined that the CBO would simply ignore the index if created.

BUT this assumption of the CBO ignoring the index is NOT correct as we’ll see…

If we look at the new Automatic Index:

SQL> select index_name, auto

, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name=

'IGGY_POP';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR

------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------

SYS_AI_1awkddqkwa4f8 YES NO VISIBLE ADVANCED LOW VALID 10000000 15362 4083700

We can see the index is both VISIBLE and VALID and so can potentially be used now by ANY subsequent SQL statement.

Now the important thing to note is that the default for most sessions in a database is for Dynamic Sampling to be set to 2 and for Optimizer_Adaptive_Statistics=False. Importantly, this is also the case in Oracle’s Autonomous Transaction Processing Cloud service.

SQL> show parameter sampling

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

optimizer_dynamic_sampling integer 2

SQL> show parameter optimizer_adaptive

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

optimizer_adaptive_plans boolean TRUE

optimizer_adaptive_reporting_only boolean FALSE

optimizer_adaptive_statistics boolean FALSE

So this is DIFFERENT to the settings for the Automatic Indexing process. In a standard session, the CBO will NOT have the capability to accurately determine the correct cardinality estimates as we saw previously.

If we now re-run the SQL that returns no rows:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value: 2496796491

------------------------------------------------------------------------------------------------------------

| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |

------------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1000 | 18000 | 413 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP | 1000 | 18000 | 413 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | SYS_AI_1awkddqkwa4f8 | 1000 | | 4 (0)| 00:00:01 |

------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("CODE1"=1 AND "CODE2"=42)

Note

-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

3 consistent gets

0 physical reads

0 redo size

368 bytes sent via SQL*Net to client

377 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

0 rows processed

The execution uses the new index, because even though it STILL thinks 1000 rows are to be returned, that’s sufficiently few for the index to be costed the cheaper option.

When when re-run the SQL that returns many many rows:

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 2496796491

------------------------------------------------------------------------------------------------------------

| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |

------------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1000 | 24000 | 413 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP | 1000 | 24000 | 413 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | SYS_AI_1awkddqkwa4f8 | 1000 | | 4 (0)| 00:00:01 |

------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("CODE1"=42 AND "CODE2"=42)

Note

-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

25 recursive calls

0 db block gets

41981 consistent gets

40953 physical reads

0 redo size

1092240 bytes sent via SQL*Net to client

581 bytes received via SQL*Net from client

21 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

100000 rows processed

Ouch. It also uses the new Automatic Index, because it also STILL thinks only 1000 rows are to be returned and just like the previous SQL statement, is determined to be the cheaper option.

BUT in this case it isn’t really the cheaper option, having to read the table potentially piecemeal at a time via the index, rather than more efficiently with fewer and larger multiblock reads via a FTS.

This is not really how Automatic is designed to work. Its meant to protect us from making SQL statements regress in performance BUT because there is a difference in how a normal session and the Automatic Indexing process determines the cost of execution plans, these scenarios can eventuate.

In my next blog I’ll look at how to address this specific scenario and then look at an example of how Automatic Indexing is really meant to work via the use of automated baselines…

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part I (Neighborhood Threat) September 18, 2020

Leave a Reply Cancel reply

Recommend

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II ( Slee...

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part I (The Pretties...

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part II (Sound And V...

Oracle 19c Automatic Indexing: Indexing Partitioned Tables Part I (Conversation...

Announcement: New Oracle Webinars Scheduled For February 2021 !!

Maven BOM

Maven Archetypes

B-Tree index structure

Java 13

Software Polyglots

About Joyk