Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part I (Neighb...
source link: https://richardfoote.wordpress.com/2020/09/18/oracle-19c-automatic-indexing-cbo-incorrectly-using-auto-indexes-part-i-neighborhood-threat/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part I (Neighborhood Threat) September 18, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Explain Plan For Index, Extended Statistics, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle General, Oracle Indexes.
trackback
Following on from my previous few posts on “data skew”, I’m now going to look at it from a slightly different perspective, where there is an inherent relationship between columns. The CBO has difficulties in recognising (by default) that some combinations of column values are far more common than other combinations, resulting in incorrect cardinality estimates and resultant poor execution plans.
As we’ll see, this skew in returned data can lead to poor execution plans due to the inappropriate use of newly created Automatic Indexes…
I’ll start by creating a simple table that has two columns of interest, CODE1 and CODE2:
SQL> create table iggy_pop (id number, code
1
number, code
2
number, name varchar
2
(
42
));
Table created.
SQL> insert into iggy_pop select rownum, mod(rownum,
100
)+
1
, mod(rownum,
100
)+
1
,
'David Bowie'
from dual connect by
level
<=
10000000
;
10000000
rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>
'IGGY_POP'
);
PL/SQL procedure successfully completed.
Both columns CODE1 and CODE2 each have 100 distinct values, so that the possible combinations of data from both columns is 100 x 100 = 10,000. HOWEVER, the values of CODE1 and CODE2 are always the same and so there is in fact only 100 distinct combinations of data because of this inherent relationship between columns.
If we run the following query for a combination of data that exists:
SQL> select * from iggy_pop where code
1
=
42
and code
2
=
42
;
100000
rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value:
3288467
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
1000
|
24000
|
575
(
15
)|
00:
00:
01
|
|*
1
| TABLE ACCESS STORAGE FULL| IGGY_POP |
1000
|
24000
|
575
(
15
)|
00:
00:
01
|
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1
- storage(
"CODE1"
=
42
AND
"CODE2"
=
42
)
filter(
"CODE1"
=
42
AND
"CODE2"
=
42
)
Note
-----
- automatic DOP: Computed Degree of Parallelism is
1
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
40964
consistent gets
40953
physical reads
0
redo size
1092240
bytes sent via SQL*Net to client
581
bytes received via SQL*Net from client
21
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
100000
rows processed
Without an index, the CBO has no choice but to use a FTS. However, the interesting thing to note is how the cardinality estimate is way wrong, with 100,000 rows returned but only 1000 rows estimated. The CBO incorrect assumes that 1/10000th of the data is being returned and not actual the 1/100 (1%).
If we run a query on a combination of data that doesn’t exist:
SQL> select code
1
, code
2
, name from iggy_pop where code
1
=
1
and code
2
=
42
;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value:
3288467
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
1000
|
18000
|
575
(
15
)|
00:
00:
01
|
|*
1
| TABLE ACCESS STORAGE FULL| IGGY_POP |
1000
|
18000
|
575
(
15
)|
00:
00:
01
|
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1
- storage(
"CODE1"
=
1
AND
"CODE2"
=
42
)
filter(
"CODE1"
=
1
AND
"CODE2"
=
42
)
Note
-----
- automatic DOP: Computed Degree of Parallelism is
1
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
40964
consistent gets
40953
physical reads
0
redo size
368
bytes sent via SQL*Net to client
377
bytes received via SQL*Net from client
1
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
0
rows processed
The CBO still estimates that 1000 rows are to be returned. However, with no rows returned, an index would be a much better alternative than the current FTS in this case.
Let’s now wait and see what the Automatic Indexing process makes of all this (following are highlights from the Auto Indexing Last Activity report):
SQL> select dbms_auto_index.report_last_activity() report from dual;
REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start :
18
-SEP
-2020
01:
24:
17
Activity end :
18
-SEP
-2020
01:
25:
29
Executions completed :
1
Executions interrupted :
0
Executions with fatal error :
0
-------------------------------------------------------------------------------
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates :
0
Indexes created (
visible
/ invisible) :
1
(
1
/
0
)
Space used (
visible
/ invisible) :
134.22
MB (
134.22
MB /
0
B)
Indexes dropped :
0
SQL statements verified :
1
SQL statements improved (improvement factor) :
1
(
41301.7
x)
SQL plan baselines created :
0
Overall improvement factor :
41301.7
x
-------------------------------------------------------------------------------
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes :
0
Space used :
0
B
Unusable indexes :
0
-------------------------------------------------------------------------------
INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
| Owner | Table | Index | Key | Type | Properties |
-------------------------------------------------------------------------------
| BOWIE | IGGY_POP | SYS_AI_
1
awkddqkwa
4
f
8
| CODE
1
,CODE
2
| B-TREE | NONE |
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
So Oracle does indeed create an automatic index on the CODE1, CODE2 columns. However, notice that only 1 statement has been verified and not the above two statements that I had executed during the previous period.
VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID : bdnf
0
barn
3
jk
7
SQL Text : select code
1
, code
2
, name from iggy_pop where code
1
=
1
and code
2
=
42
Improvement Factor :
41301.7
x
Execution Statistics:
-----------------------------
Original Plan Auto Index Plan
---------------------------- ----------------------------
Elapsed Time (s):
72085
1342
CPU Time (s):
39272
679
Buffer Gets:
123907
3
Optimizer Cost:
575
4
Disk Reads:
122859
2
Direct Writes:
0
0
Rows Processed:
0
0
Executions:
3
1
So only the SQL that returned 0 rows has been reported. As expected, it runs much more efficiently with an index than via the previous FTS, with an Improvement Factor of some 41301.7x.
PLANS SECTION
---------------------------------------------------------------------------------------------
- Original
-----------------------------
Plan Hash Value :
3288467
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
--------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | | | |
575
| |
|
1
| TABLE ACCESS STORAGE FULL | IGGY_POP |
1000
|
18000
|
575
|
00:
00:
01
|
--------------------------------------------------------------------------------
Notes
-----
- dop =
1
- px_in_memory_imc = no
- px_in_memory = no
- With Auto Indexes
-----------------------------
Plan Hash Value :
2496796491
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
2
|
36
|
4
|
00:
00:
01
|
|
1
| TABLE ACCESS BY INDEX ROWID BATCHED | IGGY_POP |
2
|
36
|
4
|
00:
00:
01
|
| *
2
| INDEX RANGE SCAN | SYS_AI_
1
awkddqkwa
4
f
8
|
1
| |
3
|
00:
00:
01
|
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
*
2
- access(
"CODE1"
=
1
AND
"CODE2"
=
42
)
Notes
-----
- Dynamic sampling used for this statement (
level
=
11
)
If we look at the comparison between plans, the new plan of course uses the newly created Automatic Index.
The critical point to notice here however is that the cardinality estimates are almost spot for the new execution plan (2 rows is much closer to reality than the previous 1000).
The reason why it’s much more accurate is because the Auto Indexing process session uses the new Dynamic Sampling Level = 11. This enables the CBO to sample data on the fly and determine a much more accurate cardinality estimate than by default where the Dynamic Sampling Level=2.
This also explains why the other statement which returned many rows was not “verified”. Actually, it was but because the Auto Index process with Dynamic Sampling set to 11 correctly identified that too many rows were being returned to make any new index viable, this statement did NOT cause the new index to be kept.
So it was only the SQL that returned no rows that resulted in the newly created Automatic Index. The other statement was correctly determined by the Automatic Indexing process to run worse with the new index and so determined that the CBO would simply ignore the index if created.
BUT this assumption of the CBO ignoring the index is NOT correct as we’ll see…
If we look at the new Automatic Index:
SQL> select index_name,
auto
, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name=
'IGGY_POP'
;
INDEX_NAME AUT CON VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_
1
awkddqkwa
4
f
8
YES NO VISIBLE ADVANCED LOW VALID
10000000
15362
4083700
We can see the index is both VISIBLE and VALID and so can potentially be used now by ANY subsequent SQL statement.
Now the important thing to note is that the default for most sessions in a database is for Dynamic Sampling to be set to 2 and for Optimizer_Adaptive_Statistics=False. Importantly, this is also the case in Oracle’s Autonomous Transaction Processing Cloud service.
SQL>
show
parameter sampling
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling integer
2
SQL>
show
parameter optimizer_adaptive
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_adaptive_plans boolean TRUE
optimizer_adaptive_reporting_only boolean FALSE
optimizer_adaptive_statistics boolean FALSE
So this is DIFFERENT to the settings for the Automatic Indexing process. In a standard session, the CBO will NOT have the capability to accurately determine the correct cardinality estimates as we saw previously.
If we now re-run the SQL that returns no rows:
SQL> select code
1
, code
2
, name from iggy_pop where code
1
=
1
and code
2
=
42
;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value:
2496796491
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
1000
|
18000
|
413
(
0
)|
00:
00:
01
|
|
1
| TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP |
1000
|
18000
|
413
(
0
)|
00:
00:
01
|
|*
2
| INDEX RANGE SCAN | SYS_AI_
1
awkddqkwa
4
f
8
|
1000
| |
4
(
0
)|
00:
00:
01
|
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2
- access(
"CODE1"
=
1
AND
"CODE2"
=
42
)
Note
-----
- automatic DOP: Computed Degree of Parallelism is
1
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
3
consistent gets
0
physical reads
0
redo size
368
bytes sent via SQL*Net to client
377
bytes received via SQL*Net from client
1
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
0
rows processed
The execution uses the new index, because even though it STILL thinks 1000 rows are to be returned, that’s sufficiently few for the index to be costed the cheaper option.
When when re-run the SQL that returns many many rows:
SQL> select * from iggy_pop where code
1
=
42
and code
2
=
42
;
100000
rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value:
2496796491
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
1000
|
24000
|
413
(
0
)|
00:
00:
01
|
|
1
| TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP |
1000
|
24000
|
413
(
0
)|
00:
00:
01
|
|*
2
| INDEX RANGE SCAN | SYS_AI_
1
awkddqkwa
4
f
8
|
1000
| |
4
(
0
)|
00:
00:
01
|
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2
- access(
"CODE1"
=
42
AND
"CODE2"
=
42
)
Note
-----
- automatic DOP: Computed Degree of Parallelism is
1
Statistics
----------------------------------------------------------
25
recursive calls
0
db
block
gets
41981
consistent gets
40953
physical reads
0
redo size
1092240
bytes sent via SQL*Net to client
581
bytes received via SQL*Net from client
21
SQL*Net roundtrips to/from client
1
sorts (memory)
0
sorts (disk)
100000
rows processed
Ouch. It also uses the new Automatic Index, because it also STILL thinks only 1000 rows are to be returned and just like the previous SQL statement, is determined to be the cheaper option.
BUT in this case it isn’t really the cheaper option, having to read the table potentially piecemeal at a time via the index, rather than more efficiently with fewer and larger multiblock reads via a FTS.
This is not really how Automatic is designed to work. Its meant to protect us from making SQL statements regress in performance BUT because there is a difference in how a normal session and the Automatic Indexing process determines the cost of execution plans, these scenarios can eventuate.
In my next blog I’ll look at how to address this specific scenario and then look at an example of how Automatic Indexing is really meant to work via the use of automated baselines…
Leave a Reply Cancel reply
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK