Oracle 19c Automatic Indexing: Data Skew Part III (The Good Son)
source link: https://richardfoote.wordpress.com/2020/09/16/oracle-19c-automatic-indexing-data-skew-part-iii-the-good-son/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Oracle 19c Automatic Indexing: Data Skew Part III (The Good Son) September 16, 2020
Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Index Access Path, Oracle, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Unusable Indexes.
trackback
I’m going to expand just a tad on my previous posts on data skew and run a simple query that returns a few rows based on a column predicate AND another query on the same column that returns many rows.
The following table has a CODE column as with previous posts with the data heavily skewed:
SQL> create table bowie_skew (id number,
code
number, name varchar
2
(
42
));
Table created.
SQL> insert into bowie_skew select rownum,
10
,
'DAVID BOWIE'
from dual connect by
level
<=
1000000
;
1000000
rows created.
SQL> update bowie_skew set
code
=
9
where mod(id,
3
) =
0
;
333333
rows updated.
SQL> update bowie_skew set
code
=
1
where mod(id,
2
) =
0
and id between
1
and
20000
;
10000
rows updated.
SQL> update bowie_skew set
code
=
2
where mod(id,
2
) =
0
and id between
30001
and
40000
;
5000
rows updated.
SQL> update bowie_skew set
code
=
3
where mod(id,
100
) =
0
and id between
300001
and
400000
;
1000
rows updated.
SQL> update bowie_skew set
code
=
4
where mod(id,
100
) =
0
and id between
400001
and
500000
;
1000
rows updated.
SQL> update bowie_skew set
code
=
5
where mod(id,
100
) =
0
and id between
600001
and
700000
;
1000
rows updated.
SQL> update bowie_skew set
code
=
6
where mod(id,
1000
) =
0
and id between
700001
and
800000
;
100
rows updated.
SQL> update bowie_skew set
code
=
7
where mod(id,
1000
) =
0
and id between
800001
and
900000
;
100
rows updated.
SQL> update bowie_skew set
code
=
8
where mod(id,
1000
) =
0
and id between
900001
and
1000000
;
100
rows updated.
SQL> commit;
Commit complete.
I’ll next collect statistics with NO histogram, as I don’t think they’re required at this point:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>
'bowie_skew'
, estimate_percent=>
100
, method_opt=>
'FOR ALL COLUMNS SIZE 1'
);
PL/SQL procedure successfully completed.
If we look at the table data:
SQL> select
code
, count(*) from bowie_skew group by
code
order by
code
;
CODE COUNT(*)
---------- ----------
1
10000
2
5000
3
1000
4
1000
5
1000
6
100
7
100
8
100
9
327235
10
654465
The value “7” only has 100 associated rows, while the value “10” is very common with 654,465 rows.
But I currently have no histograms:
SQL> select column_name, num_buckets, histogram from user_tab_cols
where table_name=
'BOWIE_SKEW'
;
COLUMN_NAME NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID
1
NONE
CODE
1
NONE
NAME
1
NONE
If I run the following query with a CODE=7 predicate just once:
SQL> select * from bowie_skew where
code
=
7
;
100
rows selected.
Execution Plan
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
|
1
| PX COORDINATOR | | | | | |
|
2
| PX SEND QC (RANDOM) | :TQ
10000
|
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
|
3
| PX BLOCK ITERATOR | |
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
|*
4
| TABLE ACCESS STORAGE FULL| bowie_skew |
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
--------------------------------------------------------------------------------------------
It uses a Full Table Scan (the CBO has no choice without an index) AND hopelessly gets the cardinality estimate wrong, thinking 100K are going to be returned (and not the 100 actual rows). So the CBO is unlikely to use an index anyways as it would be deemed too expensive to return so many rows.
I’ll now run the following query many times on the CODE=10 predicate that returns many rows:
SQL> select * from bowie_skew where
code
=
10
;
654465
rows selected.
Execution Plan
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
|
1
| PX COORDINATOR | | | | | |
|
2
| PX SEND QC (RANDOM) | :TQ
10000
|
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
|
3
| PX BLOCK ITERATOR | |
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
|*
4
| TABLE ACCESS STORAGE FULL| bowie_skew |
100
K|
1953
K|
570
(
7
)|
00:
00:
01
|
--------------------------------------------------------------------------------------------
So again, no choice here with a FTS and we likely wouldn’t want to use an index anyways as it would be just too expensive.
If we check out what the Automatic Indexing process has done with such a workload:
SQL> select dbms_auto_index.report_last_activity() report from dual;
REPORT
INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
--------------------------------------------------------------------------
| Owner | Table | Index | Key | Type | Properties |
--------------------------------------------------------------------------
| BOWIE | BOWIE_SKEW | SYS_AI_
7
psvzc
164
vbng | CODE | B-TREE | NONE |
--------------------------------------------------------------------------
-------------------------------------------------------------------------------
VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID :
6
fm
3
m
8
cg
2
jnun
SQL Text : select * from bowie_skew where
code
=
7
Improvement Factor :
46.6
x
Execution Statistics:
-----------------------------
Original Plan Auto Index Plan
---------------------------- ----------------------------
Elapsed Time (s):
36653
1992
CPU Time (s):
33899
967
Buffer Gets:
4291
103
Optimizer Cost:
52
4
Disk Reads:
0
2
Direct Writes:
0
0
Rows Processed:
100
100
Executions:
1
1
An Automatic Index on the CODE column is created (SYS_AI_7psvzc164vbng), with ONLY the SQL based on the CODE=7 predicate listed in the report. The other query is indeed too expensive for a new index to be viable and so isn’t listed.
If we look at the Plans Section of the Automatic Indexing report:
PLANS SECTION
---------------------------------------------------------------------------------------------
- Original
-----------------------------
Plan Hash Value :
410492785
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
--------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | | | |
52
| |
|
1
| TABLE ACCESS STORAGE FULL | BOWIE_SKEW |
100000
|
2000000
|
52
|
00:
00:
01
|
--------------------------------------------------------------------------------------
Notes
-----
- dop_reason = no expensive parallel operation
- dop =
1
- px_in_memory_imc = no
- px_in_memory = no
- With Auto Indexes
-----------------------------
Plan Hash Value :
140816325
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
119
|
2380
|
4
|
00:
00:
01
|
|
1
| TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_SKEW |
119
|
2380
|
4
|
00:
00:
01
|
| *
2
| INDEX RANGE SCAN | SYS_AI_
7
psvzc
164
vbng |
100
| |
3
|
00:
00:
01
|
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
*
2
- access(
"CODE"
=
7
)
Notes
-----
- Dynamic sampling used for this statement (
level
=
11
)
The important point to note here is that the cardinality estimates are relatively accurate despite there being no histograms at this stage because the Automatic Indexing session uses Dynamic Sampling Level=11. Missing/inaccurate statistics are calculated on fly and this enables the session to accurately determine the size of the returned data set and that an index is indeed the more efficient access path.
So with mixed workloads, all it takes is one SQL executed once that demonstrably improves thanks to an index for the associated Automatic Index to be created as a VISIBLE/VALID index:
SQL> select index_name,
auto
, visibility, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name=
'BOWIE_SKEW'
;
INDEX_NAME AUT VISIBILIT STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- -------- ---------- ----------- -----------------
SYS_AI_
7
psvzc
164
vbng YES VISIBLE VALID
1000000
1537
8534
If we now run the query AFTER the histograms are subsequently created thanks to the High-Frequency Automatic Statistics Collection (see previous post), the new Automatic Index is now used:
SQL> select * from bowie_skew where
code
=
7
;
100
rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value:
140816325
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
100
|
2000
|
4
(
0
)|
00:
00:
01
|
|
1
| TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_SKEW |
100
|
2000
|
4
(
0
)|
00:
00:
01
|
|*
2
| INDEX RANGE SCAN | SYS_AI_
7
psvzc
164
vbng |
100
| |
3
(
0
)|
00:
00:
01
|
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2
- access(
"CODE"
=
7
)
Note
-----
- automatic DOP: Computed Degree of Parallelism is
1
because of no expensive parallel operation
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
104
consistent gets
0
physical reads
0
redo size
2871
bytes sent via SQL*Net to client
359
bytes received via SQL*Net from client
2
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
100
rows processed
Note if the histogram is NOT yet collected, the CBO will not determine the correct cardinality estimate and will ignore the new Automatic Index (as previously discussed).
If we run again the query that returns many rows:
SQL> select * from bowie_skew where
code
=
10
;
654465
rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value:
410492785
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
654
K|
12
M|
52
(
16
)|
00:
00:
01
|
|*
1
| TABLE ACCESS STORAGE FULL| BOWIE_SKEW |
654
K|
12
M|
52
(
16
)|
00:
00:
01
|
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1
- storage(
"CODE"
=
10
)
filter(
"CODE"
=
10
)
Note
-----
- automatic DOP: Computed Degree of Parallelism is
1
because of no expensive parallel operation
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
3725
consistent gets
0
physical reads
0
redo size
6549708
bytes sent via SQL*Net to client
1790
bytes received via SQL*Net from client
132
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
654465
rows processed
The new Automatic Index is correctly ignored by the CBO, as the query returns too many rows for the index to be viable.
So in this example, Automatic Indexing works exactly as it should. It creates a new Automatic Index for a query where it will indeed improve the performance, while other queries on the same column in which many more rows are returned are also run. For these other queries, the new Automatic Index is correctly not used as such an index would degrade the performance of the query.
In my next post, I’ll look at the first example with data skew where Automatic Indexing can be problematic…
Comments»
No comments yet — be the first.
Leave a Reply Cancel reply
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK