1

Automatic Indexing 21c: Non-Equality Predicate Anomaly (“Strangers When We Meet”...

 2 years ago
source link: https://richardfoote.wordpress.com/2022/07/14/automatic-indexing-21c-non-equality-predicate-anomaly-strangers-when-we-meet/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Automatic Indexing 21c: Non-Equality Predicate Anomaly (“Strangers When We Meet”) July 14, 2022

Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Exadata X8, Full Table Scans, Index Column Order, Invisible Indexes, Non-Equality Predicates, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning, Richard Foote Training, Richard's Blog.
trackback

strangers-when-we-meet-single.jpg?w=300&h=261

I’m currently putting together some Exadata related training for a couple of customers and came across a rather strange anomaly with regard the status of Automatic Indexes, when created in part on unselective, non-equality predicates.

As discussed previously, Oracle Database 21c now allows the creation of Automatic Indexes based on non-equality predicates (previously, Automatic Indexes were only created on equality-based predicates).

But one appears to get rather odd resultant Automatic Indexes in the scenario where the non-equality predicate is not particularly selective but other predicates are highly selective.

To illustrate, I’ll create a basic table that has two columns (ID and CODE) that are both highly selective:

SQL> create table ziggy_new (id number, code number, name varchar2(42));
Table created.
SQL> insert into ziggy_new select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;
10000000 rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_NEW');
PL/SQL procedure successfully completed.

So there are currently no indexes on this table.

I’ll next run the following SQL (and others similar) a number of times:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;
Execution Plan
----------------------------------------------------------
Plan hash value: 3165184525
----------------------------------------------------------------------------------------
| Id  | Operation                  | Name      | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |    1 |    23 |    6738 (2) | 00:00:01 |
| * 1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 |    6738 (2) | 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("CODE"=42 AND "ID"<=100000 AND "ID">=1)
filter("CODE"=42 AND "ID"<=100000 AND "ID">=1)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
38605 consistent gets
38600 physical reads
0 redo size
725 bytes sent via SQL*Net to client
52 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Without any indexes, the CBO currently has no choice but to use a Full Table Scan.

But only 1 row is returned. The first equality predicate on the CODE column is highly selective and on its own would only return 10 rows out of the 10M row table. The second, non-equality range-based predicate on the ID column is nowhere near as selective and offers limited additional filtering.

The CBO stops calculating index related costs after a non-equality predicate column (as subsequent column values could exist anywhere within the preceding range), and so the more effective index here is one based on (CODE, ID) with the non-equality predicate column second,  or potentially just on the CODE column only, as the ID range offers minimal filtering benefits.

So what does Automatic Indexing make of things?

If we look at the subsequent Automatic Indexing report:

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 3
Indexes created (visible / invisible)        : 1 (0 / 1)
Space used (visible / invisible)             : 209.72 MB (0 B / 209.72 MB)
Indexes dropped                              : 0
SQL statements verified                      : 44
SQL statements improved (improvement factor) : 12 (64.7x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 1.6x
-------------------------------------------------------------------------------
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------
INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
----------------------------------------------------------------------------
| Owner | Table     | Index                | Key     | Type   | Properties |
----------------------------------------------------------------------------
| BOWIE | ZIGGY_NEW | SYS_AI_75j16xff1ag3j | CODE,ID | B-TREE | NONE       |
----------------------------------------------------------------------------

So Automatic Indexing has indeed created an index based on CODE,ID (a common Automatic Indexing trait appears to be to create an index based on all available predicates).

BUT the index is created as an INVISIBLE Index and so can not generally be used by database sessions.

SQL> select index_name, auto, visibility, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='ZIGGY_NEW';
INDEX_NAME                     AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- -------- ---------- ----------- -----------------
SYS_AI_75j16xff1ag3j           YES INVISIBLE VALID      10000000       25123          10000000
SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='ZIGGY_NEW';
INDEX_NAME                     COLUMN_NAME  COLUMN_POSITION
------------------------------ ------------ ---------------
SYS_AI_75j16xff1ag3j           CODE                       1
SYS_AI_75j16xff1ag3j           ID                         2

So re-running the previous SQL statements continues to use a Full Table Scan:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;
Execution Plan
----------------------------------------------------------
Plan hash value: 3165184525
----------------------------------------------------------------------------------------
|  Id | Operation                  | Name      | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |    1 |    23 |    6738 (2) | 00:00:01 |
| * 1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 |    6738 (2) | 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("CODE"=42 AND "ID"<=100000 AND "ID">=1)
filter("CODE"=42 AND "ID"<=100000 AND "ID">=1)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
38605 consistent gets
38600 physical reads
0 redo size
725 bytes sent via SQL*Net to client
52 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Automatic Indexing appears to only create Invisible indexes when there is an inefficient non-equality predicate present. It won’t create the index as a Visible index, even though it would significantly benefit these SQL statements that caused its creation. And Automatic Indexing won’t create an index on just the highly selective CODE equality predicate, which would also be of much benefit to these SQL statements.

If we now run similar queries, but with much more selective non-equality predicates, such as:

SQL> select * from ziggy_new where code=1 and id between 1 and 10;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 3165184525
----------------------------------------------------------------------------------------
|  Id | Operation                  | Name      | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |    1 |    23 |    6738 (2) | 00:00:01 |
| * 1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 |    6738 (2) | 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("CODE"=1 AND "ID"<=10 AND "ID">=1)
filter("CODE"=1 AND "ID"<=10 AND "ID">=1)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
38604 consistent gets
38600 physical reads
0 redo size
503 bytes sent via SQL*Net to client
41 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed

Again, with no (Visible) index present, the CBO currently has no choice but to use the Full Table Scan.

But during the next cycle, after Automatic Indexing kicks in again:

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 5
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 209.72 MB (209.72 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 89
SQL statements improved (improvement factor) : 31 (71.9x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 1.7x
-------------------------------------------------------------------------------
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------
INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
----------------------------------------------------------------------------
| Owner | Table     | Index                | Key     | Type   | Properties |
----------------------------------------------------------------------------
| BOWIE | ZIGGY_NEW | SYS_AI_75j16xff1ag3j | CODE,ID | B-TREE | NONE       |
----------------------------------------------------------------------------
-------------------------------------------------------------------------------
VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : d4znwcu4h52ca
SQL Text            : select * from ziggy_new where code=42 and id between 1 and 10
Improvement Factor  : 38604x
Execution Statistics:
-----------------------------
Original Plan                Auto Index Plan
---------------------------- ----------------------------
Elapsed Time (s):   3398605                      68
CPU Time (s):       3166824                      68
Buffer Gets:        463250                       3
Optimizer Cost:     6738                         4
Disk Reads:         463200                       0
Direct Writes:      0                            0
Rows Processed:     0                            0
Executions:         12                           1
PLANS SECTION
--------------------------------------------------------------------------------
-------------
- Original
-----------------------------
Plan Hash Value : 3165184525
--------------------------------------------------------------------------------
| Id | Operation                  | Name      | Rows | Bytes | Cost | Time     |
--------------------------------------------------------------------------------
0 | SELECT STATEMENT           |           |      |       | 6738 |          |
1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 | 6738 | 00:00:01 |
--------------------------------------------------------------------------------
- With Auto Indexes
-----------------------------
Plan Hash Value : 1514586396
-------------------------------------------------------------------------------------------------------
|  Id | Operation                            | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                      |    1 |    23 |    4 | 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW            |    1 |    23 |    4 | 00:00:01 |
| * 2 |   INDEX RANGE SCAN                   | SYS_AI_75j16xff1ag3j |    1 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE"=42 AND "ID">=1 AND "ID"<=10)
Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

But this time, the index on the CODE,ID columns is created as a Visible index.

INDEX_NAME                     AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- -------- ---------- ----------- -----------------
SYS_AI_75j16xff1ag3j           YES VISIBLE   VALID      10000000       25123          10000000
SQL> select index_name, column_name, column_position from user_ind_columns where table_name='ZIGGY_NEW';
INDEX_NAME                     COLUMN_NAME  COLUMN_POSITION
------------------------------ ------------ ---------------
SYS_AI_75j16xff1ag3j           CODE                       1
SYS_AI_75j16xff1ag3j           ID                         2

So this index can be generally used, both by the newer SQLs that generated the now Visible index:

SQL> select * from ziggy_new where code=42 and id between 1 and 10;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1514586396
------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT                     |                      |    1 |    23 |       4 (0) | 00:00:01 |
1 |  TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW            |    1 |    23 |       4 (0) | 00:00:01 |
|* 2 |   INDEX RANGE SCAN                   | SYS_AI_75j16xff1ag3j |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CODE"=42 AND "ID">=1 AND "ID"<=10)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
503 bytes sent via SQL*Net to client
41 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed

And also used by the SQLs with the unselective non-equality predicates, that Automatic Indexing would only create as Invisible indexes:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;
Execution Plan
----------------------------------------------------------
Plan hash value: 1514586396
------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT                     |                      |    1 |    23 |       4 (0) | 00:00:01 |
1 |  TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW            |    1 |    23 |       4 (0) | 00:00:01 |
|* 2 |   INDEX RANGE SCAN                   | SYS_AI_75j16xff1ag3j |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CODE"=42 AND "ID">=1 AND "ID"<=100000)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
729 bytes sent via SQL*Net to client
52 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Automatic Indexing appears to currently not quite do the right thing with SQL statements that have unselective non-equality predicates, by creating such indexes as only Invisible Indexes, inclusive of the unselective columns.

Although an edge case, I would recommend looking through the list of created Automatic Indexes to see if any such Invisible/Valid indexes exists, as it can suggest there are current efficient SQL statements that could benefit from such indexes being Visible.

Comments»

669c0dccb6eeb001512d7390b5099171?s=32&d=identicon&r=G 1. Baulkers - July 14, 2022

Interesting. Are automated indexes still just an exadata thing? Will it come to the non-engineered systems anytime soon?

d665f0bedd1b771fb94defcaae359a7e?s=32&d=identicon&r=GRichard Foote - July 14, 2022

Hi Baulkers

Unfortunately, it’s an Exadata only feature (and Autonomous DBs that of course also use Exadata).

Leave a Reply Cancel reply


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK