On 14-Mar-2015 08:59 -0500, Darryl Freinkel wrote:
> I have a complex SQL script that used to run in about 3 minutes and
> processed through 700k records.
> I made a bunch of changes and one of the changes has resulted in the
> run time going out to 12 hours.
There may be great value in obtaining the pre-change copy of the
script and running each version step-by-step to compare\contrast the
results and timings.
Quite common for such a radical change are multiple-file queries for
which a predicate between the files has [since the changes] gone
missing; that omission resulting in a cross-product. Another issue
[possibly in combination] is a change that resulted in the Classic
Query Engine (CQE) performing the query instead of the SQL Query
Engine (SQE); the classic optimizer was not designed originally with
the sole intention of implementing SQL queries, so the particular
query may lack an efficient implementation there. Likely resolutions
are to avoid whatever operations require a fallback to CQE [e.g.
shared-weight sort sequences for collation and selection] or [for
joins anyhow] to provide the query more, even seemingly redundant,
predicates [or statistics, if running SQE] to better limit selection.
A contrived example of the missing-predicate scenario follows the link.
<http://www.ibm.com/support/knowledgecenter/api/content/ssw_i5_54/rzajq/queryoptimize.htm>
_Query Engine Overview_
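A contrived sketch with hypothetical file and column names; the
missing join predicate turns what should be matched-row selection into
an every-row-with-every-row pairing:

  SELECT o.ordnum, l.qty
    FROM orders o, ordlines l     -- hypothetical 700K-row files
   WHERE o.status = 'A'
  -- missing:  AND o.ordnum = l.ordnum
  -- Without that predicate every ORDERS row pairs with every
  -- ORDLINES row [a cross-product], so the run time explodes.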
> I have all the advised indexes required by the system.
Advised indexes are those which the optimizer presumes would be
useful for the queries being run. Additional indexes on the data can
provide statistics about the data that may yield alternate preferences,
thus changing what had previously been advised; i.e. having only\all
the advised indexes does not imply that the ideal keyed access paths
exist for implementing the queries. Adding RI and Check Constraints
provides even more information than just statistics; known logical
constraints on the data can narrow the focus to specific data rather
than all indexed data.
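A minimal sketch [hypothetical names] of adding an index plus a check
constraint, the latter giving the optimizer a known logical constraint
rather than just statistics:

  CREATE INDEX mylib.orders_ix2
    ON mylib.orders (custno, status);
  ALTER TABLE mylib.orders
    ADD CONSTRAINT orders_status_ck
    CHECK (status IN ('A', 'C', 'X'));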
> I am trying to monitor system usage and found the following:
> - CPU utilization is constantly around 40% (2 hours)
> - Disk IO across all 10 drives is 1% or less (1 hour)
> On the face of it, it appears the system is idle whilst it is
> processing the data.
Unlikely /idle/, more likely just not visible at the job level. The
Work With System Activity (WRKSYSACT) command shows LIC tasks that are
not visible via the Work With Active Jobs (WRKACTJOB) interface. Unless
all of that 40% was known to be taken by work other than the
long-running query, a noticeable portion of it was probably the query.
Watching the job status for status transitions [in WRKACTJOB] can be
easy and more telling than much else; if indeed there are any /wait/
conditions, e.g. something like DEQW, LCKW, xxxW, then those are
visible there. If the job is not changing status, and for example
remains continually in RUN status, then either work is progressing, or
any waits are not those exposed outside of the LIC; the long-running
database operations are typically exposed in the job status as IDX for
building an access path or as RUN for selecting and reading data
records. If the program stack appears to be unchanging, then a LIC dump
of the process will show the effective stack of the LIC processing.
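A minimal sketch of that monitoring [the job name is hypothetical]:

  WRKSYSACT                    /* LIC tasks and jobs, by CPU use  */
  WRKACTJOB JOB(MYQRYJOB)      /* watch the Status column: RUN,   */
                               /* IDX, DEQW, LCKW, etc.           */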
> Does anyone have any advice on how to track this down?
My first pass is usually just to look at the queries to decide which
are the most likely to be difficult to optimize and\or complete, and
then run and investigate those separately. However, running the script
with the query timeout option [the query governor] set, and ensuring
inquiry messages are prompted, lets the optimizer reveal before each
query runs which ones are presumed to take longer than a specified
limit; a limit of zero effectively forces the optimizer to reveal its
costing presumptions for every request, irrespective of any actual
limit. See Change Query Attributes (CHGQRYA) for the Query Time Limit
(QRYTIMLMT) parameter; the value zero stops every request and reports
the estimated cost before a reply allows continuing. That feature also
enables a similar limit for query temporary storage requirements, via
the Query Storage Limit (QRYSTGLMT) parameter. See the following, along
with the related and parent links at the bottom of the page that refer
to what also might be of interest, plus a small example after the link:
<http://www.ibm.com/support/knowledgecenter/api/content/ssw_i5_54/rzajq/testperfwithqrygov.htm>
_Testing performance with the query governor_
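A minimal sketch of setting the governor for the job:

  CHGQRYA QRYTIMLMT(0)      /* stop and report the estimate for  */
                            /* every query before it runs        */
  CHGQRYA QRYTIMLMT(600)    /* stop only queries estimated to    */
                            /* run longer than 600 seconds       */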
If not separating each query with the forced timeout inquiry, I might
insert a CALL to a routine that logs the current timestamp along with
something to identify which statement is being run; if only to get
timings for each, reviewed just once, or compared across separate runs
to see any [lack of] consistency. Because I prefer to see the job
traces, I might also surround each request with requests to start and
end a trace, and later print the trace information for the request(s)
that are suspect as poor performers.
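A minimal sketch of such logging, with a hypothetical log table,
inserted between the statements of the script:

  CREATE TABLE mylib.qrylog
    (stmt_id VARCHAR(20), ts TIMESTAMP);

  INSERT INTO mylib.qrylog VALUES ('STEP010_BEG', CURRENT TIMESTAMP);
  -- ... original statement 010 of the script ...
  INSERT INTO mylib.qrylog VALUES ('STEP010_END', CURRENT TIMESTAMP);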
With the old script having the same pre\post processing inserted
around the statements to effect the logging as was included in the
updated script, the timing differences on a per-statement basis are
easier to fathom. Other tooling like the DB Monitor should assist in
the same way, but I generally prefer my own logging. For example, one
script might have combined two queries into one; with my own logging I
may be able to logically group the /same/ work more easily than can be
done with other tooling. If the script is really long vs just a few
queries, I might run the old and new concurrently side-by-side instead
of using any logging.
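The per-statement elapsed times could then be computed from that
hypothetical log table with something like:

  SELECT b.stmt_id,
         TIMESTAMPDIFF(2, CHAR(e.ts - b.ts)) AS elapsed_seconds
    FROM mylib.qrylog b
    JOIN mylib.qrylog e
      ON e.stmt_id = REPLACE(b.stmt_id, '_BEG', '_END')
   WHERE RIGHT(b.stmt_id, 4) = '_BEG'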
> We are on V5R4 so no IBM support?
AFaIK individual support contracts are available to be purchased,
even for a release for which general support has since ended. There
are also non-IBM service providers that presumably could similarly
assist.
> Do you know how I can use the database monitor to analyze what is
> going on?
There are some /canned queries/ documented that can be run over the
DBMON output to make certain inquiries about a specific query or a
batch of queries. I have seen some published on the web, perhaps in
articles at one of the news sites; some may be included in the iNav
database feature, I do not recall. As I recall there was a simple way
to query which statements were run via CQE vs SQE; a sketch follows
the documents below. A really old document that has some information
that may assist, followed by links to some progressively newer
documents [which may have /related topics/ links included]:
<http://www.redbooks.ibm.com/redpapers/pdfs/redp0502.pdf> ??-???-2000
Using AS/400 Database Monitor and Visual Explain To Identify and Tune
SQL Queries
<http://www.redbooks.ibm.com/abstracts/sg246092.html> 16-Sep-2004
DB2 Universal Database for iSeries Administration: The Graphical Way
on V5R3
<http://www.redbooks.ibm.com/abstracts/sg246654.html> 26-May-2006
SQL Performance Diagnosis on IBM DB2 Universal Database for iSeries
<http://www.redbooks.ibm.com/abstracts/tips0624.html> 08-Aug-2006 upd:
12-Aug-2008
Index Advisor Surfaces Maintained Temporary Indexes (MTI) Activity on
DB2 for i5/OS
<http://www.redbooks.ibm.com/abstracts/sg246598.html> 27-Sep-2006
Preparing for and Tuning the SQL Query Engine on DB2 for i5/OS
<http://www.redbooks.ibm.com/abstracts/sg247326.html> 12-Mar-2007
OnDemand SQL Performance Analysis Simplified on DB2 for i5/OS in V5R4
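A minimal sketch, assuming my recollection of the monitor file layout
is correct; the 3014 record supposedly carries a column [QQC16, as I
recall] reflecting whether SQE ('Y') or CQE ('N') ran the statement:

  STRDBMON OUTFILE(MYLIB/DBMON) JOB(*) TYPE(*DETAIL)
    /* run the script, then ENDDBMON JOB(*); afterward: */

  SELECT qqc16, COUNT(*) AS nbr_stmts
    FROM mylib.dbmon
   WHERE qqrid = 3014
   GROUP BY qqc16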
> Why can I not see where the processing is? It's almost like there is
> a wait happening?
The easy answer is "the lack of knowing [best] where to look".
Hopefully some of the above will assist. Having worked on DB2 for i, I
tend to rely much more on what I know should happen, without much
reliance on the various feedback mechanisms beyond debug messages and
maybe VE, paired with my knowledge of both the actual data and the DDL
that defined the files, the relations [including RI], and the
indexes... plus consideration for anything extra like read triggers,
fieldprocs, or UDFs that might inhibit quick access.