Automatic Indexing 21c: Non-Equality Predicate Anomaly (“Strangers When We Meet”) July 14, 2022

Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Exadata X8, Full Table Scans, Index Column Order, Invisible Indexes, Non-Equality Predicates, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning, Richard Foote Training, Richard's Blog.

I’m currently putting together some Exadata-related training for a couple of customers and came across a rather strange anomaly with regard to the status of Automatic Indexes when they are created in part on unselective, non-equality predicates.

As discussed previously, Oracle Database 21c now allows the creation of Automatic Indexes based on non-equality predicates (previously, Automatic Indexes were only created on equality-based predicates).

But one appears to get rather odd resultant Automatic Indexes in the scenario where the non-equality predicate is not particularly selective but other predicates are highly selective.

To illustrate, I’ll create a basic table that has two columns (ID and CODE) that are both highly selective:

SQL> create table ziggy_new (id number, code number, name varchar2(42));

Table created.

SQL> insert into ziggy_new select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_NEW');

PL/SQL procedure successfully completed.

So there are currently no indexes on this table.

I’ll next run the following SQL (and others similar) a number of times:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;

Execution Plan

----------------------------------------------------------

Plan hash value: 3165184525

---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |     1 |    23 |  6738   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| ZIGGY_NEW |     1 |    23 |  6738   (2)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("CODE"=42 AND "ID"<=100000 AND "ID">=1)

filter("CODE"=42 AND "ID"<=100000 AND "ID">=1)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

38605 consistent gets

38600 physical reads

0 redo size

725 bytes sent via SQL*Net to client

52 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

Without any indexes, the CBO currently has no choice but to use a Full Table Scan.

But only 1 row is returned. The first equality predicate on the CODE column is highly selective and on its own would only return 10 rows out of the 10M row table. The second, non-equality range-based predicate on the ID column is nowhere near as selective and offers limited additional filtering.
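As a quick sanity check of these selectivity claims, the row counts can be confirmed directly. This is a minimal sketch run against the ZIGGY_NEW table as populated above; the expected counts in the comments are derived from the MOD(rownum, 1000000)+1 data load, not from captured output.

```sql
-- Rows matched by the equality predicate alone.
-- CODE has 1,000,000 distinct values over 10M rows, so 10 rows are expected.
select count(*) from ziggy_new where code = 42;

-- Rows matched by the range predicate alone.
-- ID is unique from 1 to 10,000,000, so 100,000 rows (1% of the table) are expected.
select count(*) from ziggy_new where id between 1 and 100000;

-- Rows matched by both predicates combined.
-- 1 row is expected, matching the "1 rows processed" statistic above.
select count(*) from ziggy_new where code = 42 and id between 1 and 100000;
```

The equality predicate alone removes all but 10 rows, so the range predicate only has to remove 9 more, which is why it adds so little filtering.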

The CBO stops calculating index related costs after a non-equality predicate column (as subsequent column values could exist anywhere within the preceding range), and so the more effective index here is one based on (CODE, ID) with the non-equality predicate column second, or potentially just on the CODE column only, as the ID range offers minimal filtering benefits.

So what does Automatic Indexing make of things?
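The post does not show how its Automatic Indexing reports were generated. One way to produce such a report, as a sketch, is through the REPORT_LAST_ACTIVITY and REPORT_ACTIVITY functions of the DBMS_AUTO_INDEX package. The one-day window and the 'ALL' section and level settings below are assumptions chosen for illustration, not the author's settings.

```sql
-- SQL*Plus settings so the returned CLOB is displayed in full
set long 1000000 longchunksize 1000000 pagesize 0 linesize 200

-- Report on the most recent Automatic Indexing task run (default arguments)
select dbms_auto_index.report_last_activity() as report from dual;

-- Report on the last 24 hours. The arguments are activity_start, activity_end,
-- type, section and level, passed positionally.
select dbms_auto_index.report_activity(systimestamp - 1, systimestamp, 'TEXT', 'ALL', 'ALL') as report
from   dual;
```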

If we look at the subsequent Automatic Indexing report:

SUMMARY (AUTO INDEXES)

-------------------------------------------------------------------------------

Index candidates : 3

Indexes created (visible / invisible) : 1 (0 / 1)

Space used (visible / invisible) : 209.72 MB (0 B / 209.72 MB)

Indexes dropped : 0

SQL statements verified : 44

SQL statements improved (improvement factor) : 12 (64.7x)

SQL plan baselines created : 0

Overall improvement factor : 1.6x

-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)

-------------------------------------------------------------------------------

Unused indexes : 0

Space used : 0 B

Unusable indexes : 0

-------------------------------------------------------------------------------

INDEX DETAILS

-------------------------------------------------------------------------------

The following indexes were created:

-------------------------------------------------------------------------------

----------------------------------------------------------------------------

----------------------------------------------------------------------------

----------------------------------------------------------------------------

So Automatic Indexing has indeed created an index based on CODE,ID (a common Automatic Indexing trait appears to be to create an index based on all available predicates).

BUT the index is created as an INVISIBLE index and so cannot generally be used by database sessions.

SQL> select index_name, auto, visibility, status, num_rows, leaf_blocks, clustering_factor

from user_indexes where table_name='ZIGGY_NEW';

INDEX_NAME AUT VISIBILIT STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR

------------------------------ --- --------- -------- ---------- ----------- -----------------

SYS_AI_75j16xff1ag3j YES INVISIBLE VALID 10000000 25123 10000000

SQL> select index_name, column_name, column_position

from user_ind_columns where table_name='ZIGGY_NEW';

INDEX_NAME COLUMN_NAME COLUMN_POSITION

------------------------------ ------------ ---------------

SYS_AI_75j16xff1ag3j CODE 1

SYS_AI_75j16xff1ag3j ID 2

So re-running the previous SQL statements continues to use a Full Table Scan:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;

Execution Plan

----------------------------------------------------------

Plan hash value: 3165184525

---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |     1 |    23 |  6738   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| ZIGGY_NEW |     1 |    23 |  6738   (2)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("CODE"=42 AND "ID"<=100000 AND "ID">=1)

filter("CODE"=42 AND "ID"<=100000 AND "ID">=1)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

38605 consistent gets

38600 physical reads

0 redo size

725 bytes sent via SQL*Net to client

52 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

Automatic Indexing appears to create only Invisible indexes when an unselective non-equality predicate is present. It won’t create the index as a Visible index, even though it would significantly benefit the SQL statements that caused its creation. Nor will Automatic Indexing create an index on just the highly selective CODE equality predicate, which would also be of much benefit to these SQL statements.
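When reproducing this behaviour, it is worth ruling out the simplest explanation for an Invisible auto index, namely that Automatic Indexing is running in REPORT ONLY mode, in which new indexes are always left Invisible. A minimal sketch, assuming access to the DBA_AUTO_INDEX_CONFIG view:

```sql
-- Show the current Automatic Indexing configuration.
-- AUTO_INDEX_MODE = IMPLEMENT allows new auto indexes to be made Visible;
-- AUTO_INDEX_MODE = REPORT ONLY leaves them Invisible.
select parameter_name, parameter_value
from   dba_auto_index_config
order  by parameter_name;
```

If AUTO_INDEX_MODE is IMPLEMENT and the index is still Invisible, the mode setting is not the cause.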

If we now run similar queries, but with much more selective non-equality predicates, such as:

SQL> select * from ziggy_new where code=1 and id between 1 and 10;

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value: 3165184525

---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |     1 |    23 |  6738   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| ZIGGY_NEW |     1 |    23 |  6738   (2)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("CODE"=1 AND "ID"<=10 AND "ID">=1)

filter("CODE"=1 AND "ID"<=10 AND "ID">=1)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

38604 consistent gets

38600 physical reads

0 redo size

503 bytes sent via SQL*Net to client

41 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

0 rows processed

Again, with no (Visible) index present, the CBO currently has no choice but to use the Full Table Scan.

But during the next cycle, after Automatic Indexing kicks in again:

SUMMARY (AUTO INDEXES)

-------------------------------------------------------------------------------

Index candidates : 5

Indexes created (visible / invisible) : 1 (1 / 0)

Space used (visible / invisible) : 209.72 MB (209.72 MB / 0 B)

Indexes dropped : 0

SQL statements verified : 89

SQL statements improved (improvement factor) : 31 (71.9x)

SQL plan baselines created : 0

Overall improvement factor : 1.7x

-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)

-------------------------------------------------------------------------------

Unused indexes : 0

Space used : 0 B

Unusable indexes : 0

-------------------------------------------------------------------------------

INDEX DETAILS

-------------------------------------------------------------------------------

The following indexes were created:

-------------------------------------------------------------------------------

----------------------------------------------------------------------------

----------------------------------------------------------------------------

----------------------------------------------------------------------------

-------------------------------------------------------------------------------

VERIFICATION DETAILS

-------------------------------------------------------------------------------

The performance of the following statements improved:

-------------------------------------------------------------------------------

Parsing Schema Name : BOWIE

SQL ID : d4znwcu4h52ca

SQL Text : select * from ziggy_new where code=42 and id between 1 and 10

Improvement Factor : 38604x

Execution Statistics:

-----------------------------

Original Plan Auto Index Plan

---------------------------- ----------------------------

Elapsed Time (s): 3398605 68

CPU Time (s): 3166824 68

Buffer Gets: 463250 3

Optimizer Cost: 6738 4

Disk Reads: 463200 0

Direct Writes: 0 0

Rows Processed: 0 0

Executions: 12 1

PLANS SECTION

--------------------------------------------------------------------------------

-------------

- Original

-----------------------------

Plan Hash Value : 3165184525

---------------------------------------------------------------------------------
|  Id | Operation                  | Name      | Rows | Bytes | Cost | Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |      |       | 6738 |          |
|   1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 | 6738 | 00:00:01 |
---------------------------------------------------------------------------------

- With Auto Indexes

-----------------------------

Plan Hash Value : 1514586396

------------------------------------------------------------------------------------------------------
|  Id | Operation                            | Name                 | Rows | Bytes | Cost | Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                      |    1 |    23 |    4 | 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW            |    1 |    23 |    4 | 00:00:01 |
| * 2 |   INDEX RANGE SCAN                   | SYS_AI_75j16xff1ag3j |    1 |       |    3 | 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

------------------------------------------

* 2 - access("CODE"=42 AND "ID">=1 AND "ID"<=10)

Notes

-----

- Dynamic sampling used for this statement ( level = 11 )

But this time, the index on the CODE,ID columns is created as a Visible index.

INDEX_NAME AUT VISIBILIT STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR

------------------------------ --- --------- -------- ---------- ----------- -----------------

SYS_AI_75j16xff1ag3j YES VISIBLE VALID 10000000 25123 10000000

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='ZIGGY_NEW';

INDEX_NAME COLUMN_NAME COLUMN_POSITION

------------------------------ ------------ ---------------

SYS_AI_75j16xff1ag3j CODE 1

SYS_AI_75j16xff1ag3j ID 2

So this index can be generally used, both by the newer SQLs that generated the now Visible index:

SQL> select * from ziggy_new where code=42 and id between 1 and 10;

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value: 1514586396

------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |     1 |    23 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY_NEW            |     1 |    23 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_75j16xff1ag3j |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("CODE"=42 AND "ID">=1 AND "ID"<=10)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

3 consistent gets

0 physical reads

0 redo size

503 bytes sent via SQL*Net to client

41 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

0 rows processed

It is also used by the SQL statements with the unselective non-equality predicates, for which Automatic Indexing would only create Invisible indexes:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;

Execution Plan

----------------------------------------------------------

Plan hash value: 1514586396

------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |     1 |    23 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY_NEW            |     1 |    23 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_75j16xff1ag3j |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("CODE"=42 AND "ID">=1 AND "ID"<=100000)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

5 consistent gets

0 physical reads

0 redo size

729 bytes sent via SQL*Net to client

52 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

Automatic Indexing appears to currently not quite do the right thing with SQL statements that have unselective non-equality predicates, by creating such indexes as only Invisible Indexes, inclusive of the unselective columns.

Although this is an edge case, I would recommend looking through the list of created Automatic Indexes to see if any such Invisible/Valid indexes exist, as they can suggest there are currently inefficient SQL statements that could benefit from such indexes being Visible.
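A minimal sketch of such a check, assuming access to DBA_INDEXES (the AUTO and VISIBILITY columns are the same ones queried through USER_INDEXES earlier in the post):

```sql
-- List auto indexes that are usable (VALID) but hidden from the optimizer (INVISIBLE).
-- Partitioned indexes report STATUS = 'N/A' at this level, so they are not covered here.
select owner, table_name, index_name, visibility, status
from   dba_indexes
where  auto = 'YES'
and    visibility = 'INVISIBLE'
and    status = 'VALID'
order  by owner, table_name, index_name;
```

Any index listed is a candidate for a closer look at the SQL statements that prompted its creation.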

Comments»

1. Baulkers - July 14, 2022

Interesting. Are automated indexes still just an exadata thing? Will it come to the non-engineered systems anytime soon?

Richard Foote - July 14, 2022

Hi Baulkers

Unfortunately, it’s an Exadata only feature (and Autonomous DBs that of course also use Exadata).
