In my experience, there is no one answer to this question (or there is,
but it's "It depends"). Sometimes CTEs will outperform a temporary table
scheme, other times they won't.
It all comes down to the access plan and the query optimizer's access to good
statistics. If you have an abundance of good indexes for the QO to use, CTEs
are likely to outperform the temp table scheme. If not, it's a gamble.
The fact is that the query optimizer retrieves some stats from the physical
file object as well, so when you create a temporary table, that's one
additional source of statistics for it (e.g. number of rows, a VERY important
statistic).
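To make that concrete, here is a rough sketch of materializing one intermediate result so the final statement joins against a real file with a known row count. All table and column names (ORDERS_A, ORDERS_B, CUSTOMERS, CUSTNO, AMOUNT, CUSTNAME) are made up for illustration and are not from the original query. It also assumes SQL naming and a release whose CREATE TABLE ... AS accepts a UNION, so treat it as a sketch rather than something tested on your box.

```sql
-- Hypothetical tables and columns, for illustration only.
-- Step 1: materialize the intermediate result once in the job's QTEMP library.
CREATE TABLE QTEMP.TMPTABLE3 AS (
    SELECT CUSTNO, AMOUNT FROM ORDERS_A
    UNION
    SELECT CUSTNO, AMOUNT FROM ORDERS_B
) WITH DATA;

-- Step 2: the final statement joins against a physical file, so the
-- optimizer can read its actual row count instead of estimating it.
SELECT T.CUSTNO, T.AMOUNT, C.CUSTNAME
FROM QTEMP.TMPTABLE3 T
LEFT OUTER JOIN CUSTOMERS C
    ON C.CUSTNO = T.CUSTNO;
```

With the CTE form, the optimizer has to estimate how many rows the UNION produces; with the temp table, that number is simply there.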
Your chances of getting a correct answer are much better if you run Visual
Explain on both versions and compare the steps the two plans take. This can
bring an "aha!" moment.
Without looking at some dbmon or Visual Explain data, I can only guess that
there aren't enough good indexes over the base data for the QO to make
accurate estimates, causing it to build a poorly performing plan.
Two reasonable courses of action are building more "perfect" indexes in the
hope that the QO will use them and build a different access plan, OR rewriting
the statement and hoping that makes for a better access plan.
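As a sketch of the first option, assuming (hypothetically) that table5 is filtered on a REGION column with an equal predicate and joined on CUSTNO. The column names and the index name are my own placeholders, since the original statement elides them. The usual guidance is to put equal-selection columns first, then the join columns:

```sql
-- Hypothetical column and index names; adjust to the real selection and
-- join columns that Visual Explain or the index advisor points to.
CREATE INDEX TABLE5_IX1
    ON TABLE5 (REGION, CUSTNO);
```

After creating the index, rerun Visual Explain to see whether the QO actually picks it up; there's no guarantee it will.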
HTH
Elvis
-----Original Message-----
Subject: Question for the query engine experts
I'm wondering if it's considered normal for a complex query with multiple
CTEs to run faster when broken up into multiple queries that output to work
files.
An example:
with t1 as (select ... from table1 ...)
, t2 as (select ... from table2 ...)
, t3 as (select ... from t1 .... UNION select ... from t2 ...)
, t4 as (select .... from table3 ... left outer join table4 ... left outer join table4 )
, t5 as (select ... from t4 UNION ALL ... select ... from t4 UNION ALL ... select ... from t4)
select ... from ( t5 inner join t3 ...) left outer join table5 .... left outer join table6 ... left outer join table7
works much faster when I break the query into three parts with two physical
temporary files:
with t1 as (select ... from table1 ...)
, t2 as (select ... from table2 ...)
select ... from t1 .... UNION select ... from t2 ...
---> output to tmptable3
with t4 as (select .... from table3 ... left outer join table4 ... left outer join table4 )
select ... from t4 UNION ALL ... select ... from t4 UNION ALL ... select ... from t4
---> output to tmptable5
select ... from ( tmptable5 inner join tmptable3 ...) left outer join table5 .... left outer join table6 ... left outer join table7
It would seem to me that the query engine should be faster at creating
virtual tables than writing to physical tables.
In fact, the CPU processing time is close for the two methods:
1 query - 209 seconds
3 queries w/tmp files - 225 seconds.
But clock time was quite a bit different:
1 query - 90min
3 queries w/tmp files - 20min.
All the queries were run via Query Manager as batch jobs on a v5r2 box.
Thoughts or comments from anyone as to why physical temporary tables
performed faster than embedded CTEs?
Is this always the case? Or does the CTE version possibly need indexes that
the physical temporary table version doesn't?
Thanks!
Charles Wilt