That the conflicting lock which caused the failure may no longer be held by the time a human investigates is only one possible outcome. The lock that conflicted in the past may differ in both lock type and holder from any lock seen in the present, even if /present/ is the moment almost immediately after the failure; i.e. a later investigation may see a lock that was not an issue when the original conflict resulted in the timeout. Thus a request to WRKOBJLCK OUTPUT(*PRINT) in response to the timeout can log only the lock holder(s) that pose a potential conflict at that instant, not necessarily the conflicts that existed an instant prior. So any attempt to understand which lock was in conflict must take into consideration that locks are transient.

The better option is usually to make a process wait longer to obtain a lock [e.g. longer DFTWAIT(), WAIT(), or WAITFILE() depending on the type of lock issue], and/or make the process more robust by retrying the failed operation several times before any notification is sent about the issue. Simply extending the wait times is often sufficient to decrease the number of failures from locking conflicts.
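
As a rough sketch of the wait-longer-and-retry idea, assuming the conflict surfaces as a failed ALCOBJ (CPF1002); an open that fails on WAITFILE() would need a different message monitored. The object MYLIB/MYFILE, the *EXCL lock state, the 120 second wait, the five attempts, and the 30 second delay are all placeholders, not values from the original process:

```
             PGM
             DCL        VAR(&TRIES) TYPE(*INT) LEN(4) VALUE(0)

 /* Hypothetical object and lock state; wait up to 120 seconds */
 RETRY:      ALCOBJ     OBJ((MYLIB/MYFILE *FILE *EXCL)) WAIT(120)
             MONMSG     MSGID(CPF1002) EXEC(DO)
                CHGVAR     VAR(&TRIES) VALUE(&TRIES + 1)
                /* Spool whatever locks exist right now; these may */
                /* not be the locks that caused the timeout        */
                WRKOBJLCK  OBJ(MYLIB/MYFILE) OBJTYPE(*FILE) +
                             OUTPUT(*PRINT)
                IF         COND(&TRIES *LT 5) THEN(DO)
                   DLYJOB     DLY(30)
                   GOTO       CMDLBL(RETRY)
                ENDDO
                /* Retries exhausted; only now escalate */
                SNDPGMMSG  MSGID(CPF9898) MSGF(QCPFMSG) +
                             MSGDTA('MYLIB/MYFILE not allocated +
                             after 5 attempts') MSGTYPE(*ESCAPE)
             ENDDO

             /* ... work that needs the file goes here ... */

             DLCOBJ     OBJ((MYLIB/MYFILE *FILE *EXCL))
             ENDPGM
```

The page to the on-call programmer would then hang off the final escape message rather than the first failure, and each failed attempt leaves a spooled WRKOBJLCK listing to review in the morning.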

FWiW some attempts to locate locking conflicts are proactive reviews [of a subset] of jobs looking for extended LCKW status, from which the retrieved list of locks distinguishes those which are WAITing from those HELD. A retry option could instead be coded to review reactively: between the failure and the retry, submit a background job to /watch my job for LCKW/ to catch the actual locks and holders that are in conflict; if the timeout occurs on the retry, the failing job or reviewer can obtain the extracted lock information from the background job.
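
A minimal sketch of that reactive approach, with loud assumptions: WATCHLCK in MYLIB is a made-up program name, the 12 samples at 10 second intervals are arbitrary, and the watcher simply spools the job's lock list each time instead of testing for LCKW status [which would take an API call rather than a CL command]. In the failing job, between the failure and the retry:

```
             DCL        VAR(&JOB)  TYPE(*CHAR) LEN(10)
             DCL        VAR(&USER) TYPE(*CHAR) LEN(10)
             DCL        VAR(&NBR)  TYPE(*CHAR) LEN(6)

             /* Identify this job and hand that to the watcher */
             RTVJOBA    JOB(&JOB) USER(&USER) NBR(&NBR)
             SBMJOB     CMD(CALL PGM(MYLIB/WATCHLCK) +
                          PARM(&JOB &USER &NBR)) JOB(WATCHLCK)
             /* ... retry the failed operation here ... */
```

And the hypothetical watcher program:

```
             PGM        PARM(&JOB &USER &NBR)
             DCL        VAR(&JOB)  TYPE(*CHAR) LEN(10)
             DCL        VAR(&USER) TYPE(*CHAR) LEN(10)
             DCL        VAR(&NBR)  TYPE(*CHAR) LEN(6)
             DCL        VAR(&I)    TYPE(*INT)  LEN(4)

             DOFOR      VAR(&I) FROM(1) TO(12)
                /* Spool the watched job's locks, HELD and WAIT */
                DSPJOB     JOB(&NBR/&USER/&JOB) OUTPUT(*PRINT) +
                             OPTION(*JOBLCK)
                /* Stop if the watched job can no longer be shown */
                MONMSG     MSGID(CPF0000) EXEC(LEAVE)
                DLYJOB     DLY(10)
             ENDDO
             ENDPGM
```

The spooled lists show which object the job was waiting on; a WRKOBJLCK OUTPUT(*PRINT) against that object, issued from the same loop, would be one way to also capture the holders at that moment.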

Note: Some messages were updated to list one job that was holding a conflicting lock when the lock timeout occurred. IIRC that information was added to MCH5802 and MCH5804. As such, a possible origin is logged for those lock requests that either leave the MCH message in the joblog or have that data extracted into the function-specific message.

Regards, Chuck

Doug Palme wrote:
We have several processes that spawn during the course of the night from a CL called nightjob. If a process needs a file that is locked,
the program will send a page to whoever is on call so they can
determine if the program that has the lock errored out or if it can
be killed.....

This is where we have a problem......a lot of times this lock is at a file level, which is not logged in the job log....And by the time
the programmer on call gets up and checks, the lock is gone......

We thought about adding WRKOBJLCK to the CL in order to trap it and thus write it to the job log......but I wanted to ask the group for any other ideas on how to figure this out. The problem is, it wakes someone up and we are hoping to determine which process has it locked
(there are in the range of 300,000 processes that fire throughout
the night) and fix it so it will stop setting off blackberry.
