source link: https://mikesmithers.wordpress.com/2017/09/30/in-line-views-versus-correlated-sub-queries-tuning-adventures-in-a-data-warehouse-report/
In-Line Views versus Correlated Sub-queries – tuning adventures in a Data Warehouse Report
Events have taken a worrying turn recently. I’m not talking about Kim Jong Un’s expensive new hobby, although, if his parents had bought him that kite when he was seven…
I’m not talking about the UK’s chief Brexit negotiator David Davies quoting directly from the Agile Manifesto and claiming that the British were “putting people before process” in the negotiations, although Agile as a negotiating strategy is rather…untested.
I’m not even talking about the sunny Bank Holiday Monday we had in England recently, even though this may be a sign of Global Warming (or possibly a portent for the end of days).
The fact is, we have an Ashes series coming up this winter and England still haven’t managed to find a top order that doesn’t collapse like a cheap deckchair in a light breeze.
On top of that, what started out as a relatively simple post – effectively a note to myself about using the row_number analytical function to overcome a recent performance glitch in a Data Warehouse Application – also seems to have developed an unexpected complication…
The Application
I’ve taken the liberty of stripping the application down to its essentials for illustrative purposes.
The application accepts a bulk upload of data into a table called INTERESTING_TIMES :
The DDL for our table looks like this :
create table interesting_times
(
    id number,
    worry varchar2(4000),
    record_type varchar2(1)
)
/
Users may then amend individual records via some kind of UI which ultimately calls :
create or replace procedure adjust_interesting_times(
    i_id interesting_times.id%type,
    i_worry interesting_times.worry%type)
as
begin
    merge into interesting_times
    using dual
    on (id = i_id and record_type = 'A')
    when matched then
        update set worry = i_worry
    when not matched then
        insert (id, worry, record_type)
        values (i_id, i_worry, 'A');
end;
/
That’s right, when an amendment is made, the application does not perform an update. Instead, it creates a new record with a RECORD_TYPE of A(mended), preserving the O(riginal) record.
For reasons unknown, it’s apparently OK to update an existing A(mended) record.
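To make that behaviour concrete, here’s a quick sanity check (a sketch — the specific id and worry values are made up for illustration) :

```sql
-- start with a single O(riginal) record...
insert into interesting_times(id, worry, record_type)
values (1, 'Brexit', 'O');

-- ...amend it : the merge finds no 'A' record for id 1, so it inserts one
exec adjust_interesting_times(1, 'Brexit negotiations')

-- ...amend it again : this time the existing 'A' record is updated in place
exec adjust_interesting_times(1, 'Brexit transition period')

-- we should now have one 'O' row and one 'A' row for id 1
select id, worry, record_type
from interesting_times
where id = 1
order by record_type;
```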
Periodically, a report is run to list the latest version of each record in the table.
The report is as follows :
select it.id, it.worry, it.record_type
from interesting_times it
where record_type =
    nvl(
        (select 'A'
         from interesting_times it1
         where it1.record_type = 'A'
         and it1.id = it.id),
        'O')
order by 1
/
It’s saved in a file called worry_list.sql.
During application testing, there was no problem. The table was populated with test records…
insert into interesting_times( id, worry, record_type)
select
    level as id,
    case mod(level, 3)
        when 2 then 'North Korean Missile Exports'
        when 1 then 'Global Warming'
        else 'Brexit'
    end as worry,
    'O' as record_type
from dual
connect by level <= 100
union all
select
    level,
    q'[England's top order]',
    'A'
from dual
where mod(level, 100) = 0
connect by level <= 100
/

commit;

exec dbms_stats.gather_table_stats(user, 'INTERESTING_TIMES')
…and the report ran nice and quickly…
set timing on
@worry_list.sql
set timing off
...
        ID WORRY                                    R
---------- ---------------------------------------- -
        97 Global Warming                           O
        98 North Korean Missile Exports             O
        99 Brexit                                   O
       100 England's top order                      A

100 rows selected.

Elapsed: 00:00:00.053
However, in production, data volumes are (or have grown to be) somewhat more significant…
truncate table interesting_times
/

insert into interesting_times( id, worry, record_type)
select
    level as id,
    case mod(level, 3)
        when 2 then 'North Korean Missile Exports'
        when 1 then 'Global Warming'
        else 'Brexit'
    end as worry,
    'O' as record_type
from dual
connect by level <= 100000
union all
select
    level,
    q'[England's top order]',
    'A'
from dual
where mod(level, 100) = 0
connect by level <= 100000
/

commit;

exec dbms_stats.gather_table_stats(user, 'INTERESTING_TIMES')
…and the application now has some performance issues…
...
ID WORRY R
---------- ---------------------------------------- -
99995 North Korean Missile Exports O
99996 Brexit O
99997 Global Warming O
99998 North Korean Missile Exports O
99999 Brexit O
100000 England's top order A
100000 rows selected.
Elapsed: 00:19:58.67
Yes, the runtime has mushroomed to almost 20 minutes and users are not happy.
The tkprof output makes it clear where the problem lies…
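If you want to reproduce this at home, a trace like the one that follows can be captured along these lines (a sketch — the post doesn’t show how the trace was actually taken, and the tracefile identifier and output file name here are made up) :

```sql
-- tag the tracefile so it's easy to find on the server
alter session set tracefile_identifier = 'worry_list';

-- trace the current session, including wait events
exec dbms_session.session_trace_enable(waits => true, binds => false)

@worry_list.sql

exec dbms_session.session_trace_disable

-- then, on the database server :
-- tkprof <tracefile>.trc worry_list_report.txt sys=no
```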
select it.id, it.worry, it.record_type
from interesting_times it
where record_type =
nvl((
select 'A'
from interesting_times it1
where it1.record_type = 'A'
and it1.id = it.id), 'O')
order by 1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.02 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 6668 1183.16 1191.72 0 41003158 0 100000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6670 1183.17 1191.75 0 41003158 0 100000
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 48
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
100000 100000 100000 SORT ORDER BY (cr=41003158 pr=0 pw=0 time=1191452315 us cost=10216576 size=24 card=1)
100000 100000 100000 FILTER (cr=41003158 pr=0 pw=0 time=1189848194 us)
101000 101000 101000 TABLE ACCESS FULL INTERESTING_TIMES (cr=406 pr=0 pw=0 time=115893 us cost=116 size=2424000 card=101000)
1992 1992 1992 TABLE ACCESS FULL INTERESTING_TIMES (cr=41002752 pr=0 pw=0 time=1189610376 us cost=113 size=7 card=1)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 6668 0.00 0.01
resmgr:cpu quantum 16 0.13 0.70
SQL*Net message from client 6668 0.02 6.32
********************************************************************************
The FILTER operation is the culprit : the correlated sub-query’s full scan of INTERESTING_TIMES is re-executed for each of the 101,000 driving rows, which is where the 41 million buffer gets come from.
Time to rethink our Report query…
It’s row_number(), not ROWNUM
The row_number() analytical function may well come in handy here. Consider…
select
    id, worry, record_type,
    row_number() over(
        partition by id
        order by case nvl(record_type, 'O') when 'A' then 1 else 2 end
    ) as recnum
from interesting_times
where id between 99995 and 100000
/
        ID WORRY                          R     RECNUM
---------- ------------------------------ - ----------
     99995 North Korean Missile Exports   O          1
     99996 Brexit                         O          1
     99997 Global Warming                 O          1
     99998 North Korean Missile Exports   O          1
     99999 Brexit                         O          1
    100000 England's top order            A          1
    100000 Global Warming                 O          2

7 rows selected.
Using row_number to order records for each id, we can now eliminate the correlated sub-query from our report…
with ilv as
(
    select
        id, worry, record_type,
        row_number() over(
            partition by id
            order by case nvl(record_type, 'O') when 'A' then 1 else 2 end
        ) as recnum
    from interesting_times
)
select id, worry, record_type
from ilv
where recnum = 1
/
…and experience a stunning improvement in performance…
...
ID WORRY R
---------- ---------------------------------------- -
99994 Global Warming O
99995 North Korean Missile Exports O
99996 Brexit O
99997 Global Warming O
99998 North Korean Missile Exports O
99999 Brexit O
100000 England's top order A
100000 rows selected.
Elapsed: 00:00:07.89
Yes, you are reading that right, that’s 7.89 seconds. And no, I didn’t forget to flush the cache between running the original report and this new version.
Now, if that was all there was to it, this would be a rather short post. However…
Hubble, Bubble, Toil And Trouble…
What happens if we just re-write the query with an in-line view and don’t bother with analytical functions at all ?
with latest_records as
(
    select
        id,
        max(case record_type when 'A' then 2 else 1 end) as recnum
    from interesting_times
    group by id
)
select it.id, it.worry, it.record_type
from interesting_times it
inner join latest_records lr
    on lr.id = it.id
    and lr.recnum = case it.record_type when 'A' then 2 else 1 end
order by id
/
...
        ID WORRY                                    R
---------- ---------------------------------------- -
     99994 Global Warming                           O
     99995 North Korean Missile Exports             O
     99996 Brexit                                   O
     99997 Global Warming                           O
     99998 North Korean Missile Exports             O
     99999 Brexit                                   O
    100000 England's top order                      A

100000 rows selected.

Elapsed: 00:00:08.06
Yep, the performance is virtually identical to using row_number().
Looking at the respective execution plans provides a clue as to why this is.
First, the row_number query :
with ilv as
(
select id, worry, record_type,
row_number() over( partition by id order by case nvl(record_type, 'O') when 'A' then 1 else 2 end) as recnum
from interesting_times
)
select id, worry, record_type
from ilv
where recnum = 1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.04 0 0 0 0
Execute 1 0.00 0.01 0 0 0 0
Fetch 6668 0.58 0.63 404 406 0 100000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6670 0.59 0.69 404 406 0 100000
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 48
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
100000 100000 100000 VIEW (cr=406 pr=404 pw=0 time=403080 us cost=830 size=8080000 card=101000)
101000 101000 101000 WINDOW SORT PUSHED RANK (cr=406 pr=404 pw=0 time=361666 us cost=830 size=2424000 card=101000)
101000 101000 101000 TABLE ACCESS FULL INTERESTING_TIMES (cr=406 pr=404 pw=0 time=43133 us cost=113 size=2424000 card=101000)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 6668 0.00 0.01
db file sequential read 1 0.00 0.00
db file scattered read 19 0.01 0.02
SQL*Net message from client 6668 0.01 6.45
********************************************************************************
…and now the straight in-line view…
with latest_records as
(
select id,
max( case record_type when 'A' then 2 else 1 end) as recnum
from interesting_times
group by id
)
select it.id, it.worry, it.record_type
from interesting_times it
inner join latest_records lr
on lr.id = it.id
and lr.recnum = case it.record_type when 'A' then 2 else 1 end
order by id
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 6668 0.63 0.65 404 812 0 100000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6670 0.63 0.66 404 812 0 100000
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 48
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
100000 100000 100000 SORT ORDER BY (cr=812 pr=404 pw=0 time=421638 us cost=852 size=32320 card=1010)
100000 100000 100000 HASH JOIN (cr=812 pr=404 pw=0 time=329296 us cost=851 size=32320 card=1010)
100000 100000 100000 VIEW (cr=406 pr=404 pw=0 time=174590 us cost=466 size=806592 card=100824)
100000 100000 100000 HASH GROUP BY (cr=406 pr=404 pw=0 time=145304 us cost=466 size=705768 card=100824)
101000 101000 101000 TABLE ACCESS FULL INTERESTING_TIMES (cr=406 pr=404 pw=0 time=24266 us cost=113 size=707000 card=101000)
101000 101000 101000 TABLE ACCESS FULL INTERESTING_TIMES (cr=406 pr=0 pw=0 time=21423 us cost=113 size=2424000 card=101000)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 6668 0.00 0.01
db file sequential read 1 0.00 0.00
db file scattered read 19 0.00 0.00
asynch descriptor resize 3 0.00 0.00
SQL*Net message from client 6668 0.01 6.73
********************************************************************************
Essentially, the Optimizer “materializes” the in-line view – i.e. it evaluates it just once, as a de facto temporary table, instead of re-executing a correlated sub-query for every driving row. That single pass is why the buffer gets drop from 41 million to a few hundred.
It turns out, then, that it’s actually the humble in-line view that turbo-charges the query, rather than the nifty row_number() analytic function.
That’s one less problem I’ve got to deal with. Now to figure out England’s top order for Brisbane. Hmmm, I may need to resort to DBMS_RANDOM…