That the conflicting lock which caused the failure may no longer be
held by the time a human investigates is only one possible outcome.
The lock that caused the conflict may differ, in both lock type and
holder, from whatever is held later, even if /later/ is nearly the
instant after the failure; i.e. later investigation may see a lock that
was not an issue when the original conflict resulted in the timeout.
Thus a request to WRKOBJLCK OUTPUT(*PRINT) in response to the timeout
can log only the lock holder(s) that pose a potential conflict at that
instant, and not necessarily those that were in conflict an instant
prior. So any attempt to understand what lock was in conflict must take
into consideration that locks are transient.

The better option is usually to make the process wait longer to obtain
a lock [e.g. a longer DFTWAIT(), WAIT(), or WAITFILE(), depending on the
type of lock issue], and/or to make the process more robust by retrying
the failed operation several times before any notification is sent about
the issue. Simply extending the wait times is often sufficient to
decrease the number of failures from locking conflicts.
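
As a rough illustration of the retry idea only, here is a minimal sketch
in Python rather than CL. Every name in it (LockTimeout, run_with_retry,
snapshot) is hypothetical and not part of any IBM i API; the snapshot
callback stands in for whatever lock logging is chosen, such as a
WRKOBJLCK OUTPUT(*PRINT) request. In CL the same shape would be a MONMSG
on the failing request with a GOTO back to a retry label.

```python
import time


class LockTimeout(Exception):
    """Hypothetical stand-in for a lock-wait timeout on the failing request."""


def run_with_retry(operation, attempts=3, delay_seconds=30.0,
                   snapshot=None, sleep=time.sleep):
    """Call operation(); on LockTimeout, log a snapshot and try again.

    Returns the result of operation(). If every attempt times out, the
    last LockTimeout is re-raised so the caller decides whether to notify.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except LockTimeout:
            # Record whatever lock information is visible right now; it may
            # already differ from the lock that caused the timeout.
            if snapshot is not None:
                snapshot(attempt)
            if attempt == attempts:
                raise
            # Give the holder time to release before the next attempt.
            sleep(delay_seconds)
```

Only when run_with_retry finally raises would the page be sent, so a
short-lived conflict that clears within a few attempts wakes nobody.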

FWiW some attempts to locate locking conflicts are proactive reviews
[of a subset] of jobs, looking for any in an extended LCKW status; the
list of locks retrieved for such a job includes those in WAIT status as
well as those HELD. A retry option could be coded to review reactively
instead: between the failure and the retry, submit a background job to
/watch my job for LCKW/ and catch the actual locks and holders that are
in conflict. If the timeout occurs again, the failing job or a reviewer
can obtain the extracted lock information from the background job.

Note: Some messages were updated to list one job which was holding a
conflicting lock when the lock timeout occurred. IIRC that information
was added to MCH5802 and MCH5804. As such, for those lock requests that
either leave the MCH message in the joblog or have that data extracted
into the function-specific message, a possible origin is logged.

Doug Palme wrote:
We have several processes that spawn during the course of the night
from a CL called nightjob. If a process needs a file that is locked,
the program will send a page to whoever is on call so they can
determine if the program that has the lock errored out or if it can
This is where we have a problem......a lot of times this lock is at
a file level, which is not logged in the job log....And by the time
the programmer on call gets up and checks, the lock is gone......
We thought about adding WRKOBJLCK to the CL in order to trap it and
thus write it to the job log......but I wanted to ask the group for
any other ideas on how to figure this out. The problem is, it wakes
someone up and we are hoping to determine which process has it locked
(there are in the range of 300,000 processes that fire throughout
the night) and fix it so it will stop setting off blackberry.