|
Hi Don,
Assuming that you're on release 7.2 or later, you have a couple of options
for clamping down on runaway queries.
There are job-based controls that limit the amount of CPU or temporary
storage that a job can use. It sounds like CPU probably wasn't the
problem. It's possible, though, that the query engine had to build
temporary structures in memory to as it was processing the query, and this
may have been keeping the I/O subsystem busy. This would show up as disk
activity and as growing temporary storage usage. IBM recently enhanced the
support for job-based temporary storage limits to include storage for SQL
queries (CPU limits have always applied.) I wrote an article about it last
spring:
https://ibm.co/2BbmoQL
Another option is to use the QRYSTGLMT and QRYTIMLMT parameters on the
CHGQRYA command. This is somewhat less useful, since the limit kicks in
based on the optimizer's anticipated estimate of resource usage (as
opposed to the actual measured usage). Depending on how much the optimizer
can figure out about your data, these estimates may be very good or they
may not.
Thank you,
Tim Clark
DB2 for IBM i
From: Don Brown <DBrown@xxxxxxxxxx>attempts
To: "Midrange- L" <midrange-l@xxxxxxxxxxxx>,
Date: 11/12/2018 05:30 PM
Subject: Run-Away query stops system
Sent by: "MIDRANGE-L" <midrange-l-bounces@xxxxxxxxxxxx>
Yesterday the IBMi became totally unresponsive. It took 20 or 30
to get a console connection to the systemagain
This is a single system using lan console (No HMC) with V7R3 and very
current on PTF's
The cause of the problem was later identified that a user running a very
bad query. This was discovered when performance started to go south
and one of the IT people walked around all the departments and found auser
re-running the query that did not finish when the system failed!!subsystem.
The application has a user tool for creating adhoc queries.
When the user log on there is a batch job created in in a user
When the query runs a QZDASOINIT jobs provides the database connectionwith
user profile QUSER in QUSRWRKindexes
I checked everything I could to see what was chewing up resources and
nothing identified this job.
I believe the joining of tables in this query resulted in many new
being required over files with millions of records.govern
I would like to identify what could have been done to identify the
responsible job - any suggestions ?
The system CPU usage from wrkactjob was < 10%
Disk % busy was high between 30 - 50 %
System storage was fine at 62%
No relevant messages were on the system operator queue or history log
Displaying the history log resulted in CPD2537 Not all messages logged -
Also pointing to the high disk activity
Wrksysact did not show any abnormal jobs or any high I/O
The full version of system performance tools is not available on this
system.
How could I have identified this job using the available tools ?
While I am still doing a lot of reading I believe there is a way to
a query that if it uses more than a preset resource or exceeds a runor
duration the system can cancel the query - Has anyone implemented this
have any relevant links to how this is implemented ?
Appreciate any suggestion or advice.
Don Brown
As an Amazon Associate we earn from qualifying purchases.
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.