Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II ( Slee...

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II ( Sleepwalk) September 21, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Dynamic Sampling, Exadata, Explain Plan For Index, Extended Statistics, Hints, Histograms, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.
trackback

As I discussed in Part I of this series, problems and inconsistencies can appear between what the Automatic Indexing processing thinks will happen with newly created Automatic Indexing and what actually happens in other database sessions. This is because the Automatic Indexing process session uses a much higher degree of Dynamic Sampling (Level=11) than other database sessions use by default (Level=2).

As we saw in Part I, an SQL statement may be deemed to NOT use an index in the Automatic Indexing deliberations, where it is actually used in normal database sessions (and perhaps incorrectly so). Where the data is heavily skewed and current statistics are insufficient for the CBO to accurately detect such “skewness” is one such scenario where we might encounter this issue.

One option to get around this is to hint any such queries with a Dynamic Sampling value that matches that of the Automatic Indexing process (or sufficient to determine more accurate cardinality estimates).

If we re-run the problematic query from Part I (where a new Automatic Index was inappropriately used by the CBO) with such a Dynamic Sampling hint:

SQL> select /*+ dynamic_sampling(11) */ * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3288467

--------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 100K| 2343K| 575 (15)| 00:00:01 |

|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 101K| 2388K| 575 (15)| 00:00:01 |

--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)

filter("CODE1"=42 AND "CODE2"=42)

Note

-----

- dynamic statistics used: dynamic sampling (level=AUTO)

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

40964 consistent gets

40953 physical reads

0 redo size

1092240 bytes sent via SQL*Net to client

609 bytes received via SQL*Net from client

21 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

100000 rows processed

We can see that the CBO this time correctly calculated the cardinality and hence correctly decided against the use of the Automatic Index.

Although these parameters can’t be changed in the Oracle Autonomous Database Cloud services, on the Exadata platform if using Automatic Indexing you might want to consider setting the OPTIMIZER_DYNAMIC_SAMPLING parameter to 11 (and/or OPTIMIZER_ADAPTIVE_STATISTICS=true) in order to be consistent with the Automatic Indexing process. These settings can obviously add significant overhead during parsing and so need to be set with caution.

In this scenario where there is an inherent relationship between columns which the CBO is not detecting, the creation of Extended Statistics can be beneficial.

We currently have the following columns and statistics on the IGGY_POP table:

SQL> select column_name, num_distinct, density, num_buckets, histogram

from user_tab_cols where table_name='IGGY_POP';

COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM

-------------------- ------------ ---------- ----------- ---------------

ID 9705425 0 254 HYBRID

CODE1 100 .00000005 100 FREQUENCY

CODE2 100 .00000005 100 FREQUENCY

NAME 1 5.0210E-08 1 FREQUENCY

If we now collect Extended Statistics on both CODE1, CODE2 columns:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP', method_opt=> 'FOR COLUMNS (CODE1,CODE2) SIZE 254');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, num_buckets, histogram from user_tab_cols where table_name='IGGY_POP';

COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM

------------------------------ ------------ ---------- ----------- ---------------

ID 9705425 0 254 HYBRID

CODE1 100 .00000005 100 FREQUENCY

CODE2 100 .00000005 100 FREQUENCY

NAME 1 5.0210E-08 1 FREQUENCY

SYS_STU#29QF8Y9BUDOW2HCDL47N44 99 .00000005 100 FREQUENCY

The CBO now has some idea on the cardinality if both columns are used within a predicate.

If we re-run the problematic query without the hint:

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3288467

--------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 100K| 2343K| 575 (15)| 00:00:01 |

|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 100K| 2343K| 575 (15)| 00:00:01 |

--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)

filter("CODE1"=42 AND "CODE2"=42)

Note

-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

40964 consistent gets

40953 physical reads

0 redo size

1092240 bytes sent via SQL*Net to client

581 bytes received via SQL*Net from client

21 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

100000 rows processed

Again, the CBO is correctly the cardinality estimate of 100K rows and so is NOT using the Automatic Index.

However, we can still get ourselves in problems. If I now re-run the query that returns no rows and was previously correctly using the Automatic Index:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value: 3288467

--------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 50000 | 878K | 575 (15) | 00:00:01 |

|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 50000 | 878K | 575 (15) | 00:00:01 |

--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("CODE1"=1 AND "CODE2"=42)

filter("CODE1"=1 AND "CODE2"=42)

Note

-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

40964 consistent gets

40953 physical reads

0 redo size

368 bytes sent via SQL*Net to client

377 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

0 rows processed

We see that the CBO is now getting this execution plan wrong and is now estimating incorrectly that 50,000 rows are to be returned (and not the 1000 rows it estimated previously). This increased estimate is now deemed too expensive for the Automatic Index to retrieve and is now incorrectly using a FTS.

This because with a Frequency based histogram now in place, Oracle assumes that 50% of the lowest recorded frequency within the histogram is returned (100,000 x 0.5 = 50,000) if the values don’t exist but resided within the known min-max range of values.

So we need to be very careful HOW we potentially collect any additional statistics and its potential impact on other SQL statements.

As I’ll discuss next, another alternative to get more consistent behavior with Automatic Indexing in these types of scenarios is to make the Automatic Indexing processing session appear more like other database sessions…

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II ( Sleepwalk) September 21, 2020

Recommend

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part I (The Pretties...

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part II (Sound And V...

Oracle 19c Automatic Indexing: Indexing Partitioned Tables Part I (Conversation...

Announcement: New Oracle Webinars Scheduled For February 2021 !!

Maven BOM

Maven Archetypes

B-Tree index structure

Java 13

Software Polyglots

Laravel 性能优化：优化 ORM 性能使应用程序高可用

About Joyk