![](/style/images/good.png)
![](/style/images/bad.png)
Automatic Indexing: JSON Expressions Part II (“Without You I’m Nothing”
source link: https://richardfoote.wordpress.com/2022/04/14/automatic-indexing-json-expressions-part-ii-without-you-im-nothing/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Automatic Indexing: JSON Expressions Part II (“Without You I’m Nothing” April 14, 2022
Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Column Statistics, High Frequency Statistics Collection.
trackback
In my previous post on Automatic Indexing (AI) and JSON expressions, I mentioned two additional issues with the execution plan generated after the automatic indexes were created:
SQL> select * from bowie_json where json_value(bowie_order,
'$.PONumber'
)=
'42'
;
Execution Plan
----------------------------------------------------------
Plan hash value:
832017402
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |
------------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
20000
|
12
M |
1524
(
1
) |
00:
00:
01
|
|
1
| TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON |
20000
|
12
M |
1524
(
1
) |
00:
00:
01
|
|*
2
| INDEX RANGE SCAN | SYS_AI_ayvj
257
jd
93
cv |
8000
| |
3
(
0
) |
00:
00:
01
|
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2
- access(JSON_VALUE(
"BOWIE_ORDER"
/*+ LOB_BY_VALUE */
FORMAT OSON ,
'$.PONumber'
RETURNING
VARCHAR
2
(
4000
) ERROR ON ERROR NULL ON EMPTY)=
'42'
)
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
234168
consistent gets
200279
physical reads
0
redo size
1595
bytes sent via SQL*Net to client
526
bytes received via SQL*Net from client
3
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
1
rows processed
The first issue is in relation to the Cost and the estimated cardinality (Rows) of the generated plan.
The execution plan has a Cost of 1524 but most importantly, an estimate cardinality of 20,000 rows. As only 1 row is actually returned, this row estimate is way way off and so therefore are the associated costs. This could potentially result in an inefficient plan and with the index not being used by the CBO.
The 20,000 row estimate comes from it being 1% of the number of rows (2 million) in the table. As I’ve discussed previously (as in this rather humourous post), the issue here is that the CBO has no idea what the expected cardinality might be, as the output from the JSON expression is effectively a black box.
Oracle generates virtual columns for this purpose, to capture column statistics that gives the CBO an accurate idea on the selectivity of expression based predicates.
But, if we look at the column statistics after the generation of the automatic indexes:
SQL> select table_name, column_name, num_distinct, virtual_column from user_tab_cols where table_name=
'BOWIE_JSON'
;
TABLE_NAME COLUMN_NAME NUM_DISTINCT VIR
---------- -------------------------------------------------- ------------ ---
BOWIE_JSON ID
1996800
NO
BOWIE_JSON BOWIE_DATE
1
NO
BOWIE_JSON BOWIE_ORDER
0
NO
BOWIE_JSON SYS_IME_OSON_B
33
B
6
E
113
C
404
F
2
EBF
6648
C
01
F
605
C
8
D
1
YES
BOWIE_JSON SYS_NC
00005
$ YES
BOWIE_JSON SYS_NC
00006
$ YES
We notice that the two virtual columns generated for the JSON based expressions have no statistics. This is because we have yet to collect new statistics (or statistics specifically on hidden columns) since the creation of the automatic indexes. As a result, the CBO still has no idea on how many rows are estimated to come from the JSON based expressions and so has to resort to the (entirely wrong) 1% estimate.
The good news with Exadata environments (which of course includes the Autonomous Database environments), is that Oracle has the High Frequency Statistics Collection capability, which will automatically collect these missing statistics after a small (configurable) period of time. I’ve previously discussed High Frequency Statistics Collection here.
So if I just wait approximately 15 minutes in my “Exadata” environment and check out the columns statistics again:
SQL> select table_name, column_name, num_distinct, virtual_column from user_tab_cols where table_name=
'BOWIE_JSON'
;
TABLE_NAME COLUMN_NAME NUM_DISTINCT VIR
---------- -------------------------------------------------- ------------ ---
BOWIE_JSON ID
1996800
NO
BOWIE_JSON BOWIE_DATE
1
NO
BOWIE_JSON BOWIE_ORDER
0
NO
BOWIE_JSON SYS_IME_OSON_B
33
B
6
E
113
C
404
F
2
EBF
6648
C
01
F
605
C
8
D
1
YES
BOWIE_JSON SYS_NC
00005
$
2000000
YES
BOWIE_JSON SYS_NC
00006
$
1996800
YES
We can see that the missing statistics have now been populated and the CBO can now accurately determine that these virtual columns are effectively unique.
If we now re-run the queries again, e.g.:
SQL> select * from bowie_json z where z.bowie_order.PONumber.number()=
4242
;
Execution Plan
----------------------------------------------------------
Plan hash value:
1921179906
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |
------------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
1
|
669
|
4
(
0
) |
00:
00:
01
|
|
1
| TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON |
1
|
669
|
4
(
0
) |
00:
00:
01
|
|*
2
| INDEX RANGE SCAN | SYS_AI_gpdkwzugdn
055
|
1
| |
3
(
0
) |
00:
00:
01
|
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2
- access(JSON_VALUE(
"BOWIE_ORDER"
/*+ LOB_BY_VALUE */
FORMAT OSON ,
'$.PONumber'
RETURNING
NUMBER ERROR ON ERROR NULL ON EMPTY)=
4242
)
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
233150
consistent gets
200279
physical reads
0
redo size
1599
bytes sent via SQL*Net to client
526
bytes received via SQL*Net from client
3
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
1
rows processed
We can see that the CBO has now correctly estimated that just 1 row is to be returned and the associated CBO cost has reduced to just 4 (down from 1524) as a result.
So if you create a function-based index, make sure the generated virtual column (whether created automatically or if manually generated before the associated index) has the necessary statistics.
In the upcoming days, I’ll discuss the remaining issue associated with this plan…
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK