Oracle 19c Automatic Indexing: Data Skew Part I (A Saucerful of Secrets)
source link: https://richardfoote.wordpress.com/2020/09/10/oracle-19c-automatic-indexing-data-skew-part-i-a-saucerful-of-secrets/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Oracle 19c Automatic Indexing: Data Skew Part I (A Saucerful of Secrets) September 10, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Data Skew, Full Table Scans, Histograms, Index Access Path, Index statistics, Low Cardinality, Oracle Blog, Oracle Indexes, Oracle19c, Performance Tuning.
trackback
When it comes to Automatic Indexes, things can become particularly interesting when dealing with data skew (meaning that some columns values are much less common than other column values). The next series of blog posts will look at a number of different scenarios in relation to how Automatic Indexing works with data that is skewed and not uniformly distributed.
I’ll start with a simple little example, that has an interesting little twist at the end.
The following table has a CODE column, which has 10 distinct values that a widely skewed, with some values much less common than others:
SQL> create table bowie_skew (id number,
code
number, name varchar
2
(
42
));
Table created.
SQL> insert into bowie_skew select rownum,
10
,
'DAVID BOWIE'
from dual connect by
level
<=
1000000
;
1000000
rows created.
SQL> update bowie_skew set
code
=
9
where mod(id,
3
) =
0
;
333333
rows updated.
SQL> update bowie_skew set
code
=
1
where mod(id,
2
) =
0
and id between
1
and
20000
;
10000
rows updated.
SQL> update bowie_skew set
code
=
2
where mod(id,
2
) =
0
and id between
30001
and
40000
;
5000
rows updated.
SQL> update bowie_skew set
code
=
3
where mod(id,
100
) =
0
and id between
300001
and
400000
;
1000
rows updated.
SQL> update bowie_skew set
code
=
4
where mod(id,
100
) =
0
and id between
400001
and
500000
;
1000
rows updated.
SQL> update bowie_skew set
code
=
5
where mod(id,
100
) =
0
and id between
600001
and
700000
;
1000
rows updated.
SQL> update bowie_skew set
code
=
6
where mod(id,
1000
) =
0
and id between
700001
and
800000
;
100
rows updated.
SQL> update bowie_skew set
code
=
7
where mod(id,
1000
) =
0
and id between
800001
and
900000
;
100
rows updated.
SQL> update bowie_skew set
code
=
8
where mod(id,
1000
) =
0
and id between
900001
and
1000000
;
100
rows updated.
SQL> commit;
Commit complete.
I’ll collect statistics on this table, but explicitly NOT collect histograms, so that the CBO will have no idea that the data is actually skewed. Note if I collected data with the default size, there would still be no histograms, as the column has yet to be used within an SQL predicate and so has no column usage recorded.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>
'BOWIE_SKEW'
, estimate_percent=>
100
, method_opt=>
'FOR ALL COLUMNS SIZE 1'
);
PL/SQL procedure successfully completed.
We can clearly see that some CODE values (such as “6”) have relatively few values, with only 100 occurrences:
SQL> select
code
, count(*) from bowie_skew group by
code
order by
code
;
CODE COUNT(*)
---------- ----------
1
10000
2
5000
3
1000
4
1000
5
1000
6
100
7
100
8
100
9
327235
10
654465
As I explicitly collected statistics with SIZE 1, we currently have NO histograms in the table:
SQL> select column_name, num_buckets, histogram from user_tab_cols
where table_name=
'BOWIE_SKEW'
;
COLUMN_NAME NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID
1
NONE
CODE
1
NONE
NAME
1
NONE
Let’s now run the following query with a predicate on CODE=6, returning just 100 rows:
SQL> select * from bowie_skew where
code
=
6
;
100
rows selected.
Execution Plan
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
|
1
| PX COORDINATOR | | | | | |
|
2
| PX SEND QC (RANDOM) | :TQ
10000
|
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
|
3
| PX BLOCK ITERATOR | |
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
|*
4
| TABLE ACCESS STORAGE FULL| BOWIE_SKEW |
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4
- storage(
"CODE"
=
6
)
filter(
"CODE"
=
6
)
Statistics
----------------------------------------------------------
6
recursive calls
0
db
block
gets
3781
consistent gets
0
physical reads
0
redo size
2796
bytes sent via SQL*Net to client
654
bytes received via SQL*Net from client
8
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
100
rows processed
The CBO has no choice but to use a FTS as I currently have no indexes on the CODE column. Note also that the CBO has got its cardinality estimates way wrong, expecting 100,000 rows and not the actual 100 rows, as I have no histograms on the CODE column.
So let’s now wait 15 minutes or so and see what the Automatic Indexing process decides to do. Following are portions of the next Auto Indexing report:
INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
--------------------------------------------------------------------------
| Owner | Table | Index | Key | Type | Properties |
--------------------------------------------------------------------------
| BOWIE | BOWIE_SKEW | SYS_AI_
7
psvzc
164
vbng | CODE | B-TREE | NONE |
--------------------------------------------------------------------------
VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID : fn
4
shnphu
4
bvj
SQL Text : select * from bowie_skew where
code
=
6
Improvement Factor :
41.1
x
Execution Statistics:
-----------------------------
Original Plan Auto Index Plan
---------------------------- ----------------------------
Elapsed Time (s):
119596
322
CPU Time (s):
100781
322
Buffer Gets:
11347
103
Optimizer Cost:
570
4
Disk Reads:
0
0
Direct Writes:
0
0
Rows Processed:
100
100
Executions:
1
1
So we can see that yes, Auto Indexing has decided to create a new index here on the CODE column (“SYS_AI_7psvzc164vbng“) as it improves the performance of the query by a factor of 41.1x.
If we look further down the Auto Indexing report and compare the execution plans:
PLANS SECTION
---------------------------------------------------------------------------------------------
- Original
-----------------------------
Plan Hash Value :
3374004665
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | | | |
570
| |
|
1
| PX COORDINATOR | | | | | |
|
2
| PX SEND QC (RANDOM) | :TQ
10000
|
100000
|
2000000
|
570
|
00:
00:
01
|
|
3
| PX BLOCK ITERATOR | |
100000
|
2000000
|
570
|
00:
00:
01
|
|
4
| TABLE ACCESS STORAGE FULL | BOWIE_SKEW |
100000
|
2000000
|
570
|
00:
00:
01
|
-----------------------------------------------------------------------------------------
- With Auto Indexes
-----------------------------
Plan Hash Value :
140816325
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
100
|
2000
|
4
|
00:
00:
01
|
|
1
| TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_SKEW |
100
|
2000
|
4
|
00:
00:
01
|
| *
2
| INDEX RANGE SCAN | SYS_AI_
7
psvzc
164
vbng |
100
| |
3
|
00:
00:
01
|
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
*
2
- access(
"CODE"
=
6
)
Notes
-----
- Dynamic sampling used for this statement (
level
=
11
)
We can see that new execution plan indeed uses the index BUT interestingly, it has a correct cardinality estimate of 100 and not 100,000 as per the original plan.
Now this can be explained in that the Automatic Indexing process uses a Dynamic Sampling level of 11, meaning it can calculate the correct cardinality on the fly and can cause difficulties between what the Automatic Indexing process thinks the CBO costs will be vs. the CBO costs in a default database session that uses the (usually default) Dynamic Sampling level of 2 (as I’ve discussed previously).
BUT when I now rerun the SQL query again:
SQL> select * from bowie_skew where
code
=
6
;
100
rows selected.
Execution Plan
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
100
|
2000
|
4
(
0
)|
|
1
| PX COORDINATOR | | | | |
|
2
| PX SEND QC (RANDOM) | :TQ
10001
|
100
|
2000
|
4
(
0
)|
|
3
| TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_SKEW |
100
|
2000
|
4
(
0
)|
|
4
| BUFFER SORT | | | | |
|
5
| PX RECEIVE | |
100
| |
3
(
0
)|
|
6
| PX SEND HASH (BLOCK ADDRESS) | :TQ
10000
|
100
| |
3
(
0
)|
|
7
| PX SELECTOR | | | | |
|*
8
| INDEX RANGE SCAN | SYS_AI_
7
psvzc
164
vbng |
100
| |
3
(
0
)|
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8
- access(
"CODE"
=
6
)
Statistics
----------------------------------------------------------
12
recursive calls
0
db
block
gets
103
consistent gets
0
physical reads
0
redo size
2796
bytes sent via SQL*Net to client
654
bytes received via SQL*Net from client
8
SQL*Net roundtrips to/from client
2
sorts (memory)
0
sorts (disk)
100
rows processed
We notice the new Automatic Index is now used BUT also that the CBO has now determined the correct cardinality estimate of 100. But how is this possible when I haven’t recalculated the table statistics?
I’ll explain in my next post.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK