On 11 Apr 2013 09:02, dale janus wrote:
You probably explained it. I may have orphaned the lock if I tried
to cancel the CHGPF job.
If indeed a lock was orphaned, one that is not visible via WRKOBJLCK,
then it was almost surely due to a defect in the OS code.
I seem to recall that it was deemed acceptable for the non-commit DB
recovery code path to leave its normal object locks held, pending a
recovery being initiated for the interrupted request; only an invocation
termination that also ended the job would implicitly drop those locks.
IIRC that effect was due to there being no /invocation exit/
established for that code path. Without such a /cancel handler/, only a
request that either ran successfully to completion or failed due to
handled exceptions would back out those locks. Thus ENDRQS, or an
unmonitored exception that terminated the request, could acceptably
leave behind any locks that had been obtained. However any non-standard
types of locks, i.e. locks other than the object and data locks, such as
SLLs, were supposed to be /protected/ from EndRqs, specifically to
ensure that they could not be orphaned by a user request to end the
invocation.
I really don't remember if I let it time out or not.
But if the CHGPF request had instead failed by timing out while
trying to obtain all of the necessary locks, then, as a /normal/ and
monitored failure, the code should have dropped any locks the processing
had obtained before backing out its attempt at forward progress.
Another job should not encounter any conflicting locks for its requests
against the file if the job requesting the CHGPF had failed solely due
to its inability to allocate the file; i.e. failed with CPF3202 or
CPF3203 as the error, per a timeout on the CHGPF request obtaining the
necessary locks to proceed with its work.
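For example, a caller can monitor explicitly for those allocation
failures; a minimal CL sketch follows, assuming the allocation-timeout
scenario described above [the file, library, and message text are
placeholders, not taken from the actual environment]:

```
/* Attempt the change; if the file cannot be allocated within   */
/* the WAITFILE time, CHGPF fails with CPF3202 or CPF3203, and  */
/* any locks obtained so far should already have been released. */
CHGPF      FILE(MYLIB/MYFILE) SRCFILE(MYLIB/QDDSSRC)
MONMSG     MSGID(CPF3202 CPF3203) EXEC(DO)
  SNDPGMMSG  MSG('CHGPF could not allocate the file; retry later.')
ENDDO
```

In that monitored-failure path, per the design intent described above,
no conflicting locks should remain visible to other jobs.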
Probably if I had signed off from the session that ran the CHGPF,
the lock would have been released.
Yes. Although if the situation could be recreated on a test file,
such that the conflicting lock is not visible via WRKOBJLCK, then that
is probably a defect that can be reported.
Or if I would have changed the heading using SQL or the database part
of ops navigator, but green screen commands die hard.
I think only performing the change operation under commitment control
would have changed the outcome; the SQL, requested with no isolation,
would effectively perform the same request as the CHGPF SRCFILE(named)
when that source changes only column headings. That is presumed solely
because of the different implementations for how locks are registered
and removed in the commit vs non-commit code paths for database
recovery. That leaves only LABEL ON to effect the request [under
commitment control], because an SQL ALTER request does not offer the
option to change the column labels the way the request to CHGPF
SRCFILE(specified) does.
When a termination occurs and the work has been registered under
commitment control, the locks that had been obtained are dropped as
part of the explicit or implicit rollback [ROLLBACK] of an interrupted
request, or when the successfully completed operation is eventually
committed [COMMIT].
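Concretely, the change under commitment control might look like the
following SQL; this is a sketch only, with placeholder library, file,
and column names, assuming an isolation level other than *NONE is in
effect [e.g. COMMIT(*CHG)]:

```
-- Change the column heading as a committable unit of work; if the
-- job is interrupted before the COMMIT, the implicit ROLLBACK
-- should release the locks rather than orphan them.
LABEL ON COLUMN MYLIB.MYFILE.MYCOL IS 'New Column Heading';
COMMIT;
```

That is, the commit code path registers the locks with the commitment
definition, so an interrupted request is backed out rather than left
pending recovery.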
We are running V7R1 and applied latest cum a few weeks ago.
The condition may be easy to recreate in a test environment, using
jobs that need not involve the web interface but merely mimic what the
web interface did. Such a recreate scenario could be submitted as a
defect report to the service provider, in expectation of an APAR and
PTF from IBM. Getting a PTF as a preventive sure beats encountering
the problem again, and could save others the same hassle.
I am still concerned that WRKOBJLCK did not show the problem,
As would I be... and likely indicative of a defect with the OS.
If the origin was a conflict with a held/orphaned SLL, the nature of
SLLs, as I recall, would not allow presentation via WRKOBJLCK very
easily, nor especially at all efficiently. A Space Location Lock is on
any space, and is not specific to an /object/ as an allocated resource.
And as I recall they are most easily obtained from the job, which is why
they are available via the Retrieve Job Locks (QWCRJBLK) API [and
similar], based on MATPRLK, but not via the List Object Locks (QWCLOBJL)
API, based on MATOBJLK. As noted in my earlier reply, I believe iNav has
an interface to show SLLs that are held, perhaps also showing waiters,
though most likely in an interface requesting information about one or
more /jobs/ vs one requesting information about an /object/; i.e. a
/job/ interface vs an /object/ interface. Anyhow...
An option to materialize a list of SLLs using the base address of any
space object type as input would be nice. As it is, the specific
address with offset [the specific location] must be given to inquire
for a list of any active holders. Otherwise all processes would have to
be materialized for all of their held SLLs, and the list of addresses
then pared down to those that share the base addresses of interest.
If that were available via the LIC, then the database could inquire on
all of the base addresses of the various space objects that make up the
composite object of the database *FILE [for a request from the Work
Control feature (WC) via WRKOBJLCK], to present the effects on an
object basis.
but I can understand it now due to the odd nature of my problem.
Odd, as in, likely a defect. Not as in /understand/ that something
was done wrong; just that what was done, if EndRqs, might validly leave
locks, but would not /validly/ leave locks that are not visible from
WRKOBJLCK [based on my recollection of the design intent for the OS
database feature (DB)].
Were there any errors preceding the -913 in the jobs getting the
SQL0913? Any such messages could assist in finding the origin; e.g.
MCH5804 "Lock space location operation not satisfied ..." vs MCH5802
"Lock operation for object &1 not satisfied" clearly diagnoses what
type of lock was the origin of the conflict. The failing instructions
identify exactly the code that requested the lock, and the code path in
which that lock request appears could make the reason the lock was
requested very conspicuous; e.g. a preceding test in the OS code that
says "if the mutex-like indicator is set, then request a read-SLL to
ensure not to proceed until the SLL can be obtained" could be very
revealing as to origin.
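To look for those preceding messages, the job log of an affected job
can be spooled and searched; a minimal sketch, with a placeholder job
name [the qualified job would be whichever job received the SQL0913]:

```
/* Spool the job log of an affected job, then scan the spooled  */
/* output for MCH5802 or MCH5804 entries preceding the SQL0913. */
DSPJOBLOG  JOB(123456/WEBUSER/QZDASOINIT) OUTPUT(*PRINT)
```

The second-level text of whichever MCHxxxx appears would then point at
the space location, or the object, that was the origin of the conflict.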
While I had suggested in my earlier reply that OPEN is unaffected by
pending recovery, I seem to recall that an open by the SQL might have a
protocol for delaying an open pending completion of certain
identified-as-/exclusive/ work, for which a member or data lock might
not be held to prevent the open, but for which the SQL should probably
await completion. And I suppose that exclusivity might have been
implemented via a flag in the file [as a space, it can be changed
irrespective of locking], which acts as an effective mutex informing
the SQL that it must await completion of some changes; and perhaps that
was implemented via Space Location Locks [SLL], i.e. that location
would have been locked by the CHGPF requester, an SLL obtained, and
then the SQL open would await a lock on that location if the
exclusive-work flag was set. I seem to recall that some easy action
would reset the flag in situations where the flag was improperly left
on... perhaps something like DSPFD?
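The hypothesized flag-plus-SLL protocol can be caricatured in ordinary
code. The sketch below is purely illustrative, assuming the mechanism
speculated above; it models no actual IBM i interface, and every name
in it is invented. A writer holds a lock on a known location and sets
an "exclusive work" flag; an opener that sees the flag set must wait on
that same location before proceeding:

```python
import threading

class FileHeader:
    """Toy stand-in for the in-space flag plus a lockable location."""
    def __init__(self):
        self.exclusive_work = False            # the mutex-like indicator
        self.location_lock = threading.Lock()  # stand-in for the SLL

def begin_exclusive_change(hdr):
    # A CHGPF-like writer locks the location, then sets the flag.
    hdr.location_lock.acquire()
    hdr.exclusive_work = True

def end_exclusive_change(hdr):
    # Normal completion clears the flag and releases the location.
    hdr.exclusive_work = False
    hdr.location_lock.release()

def open_file(hdr):
    # An opener that sees the flag set waits on the location before
    # proceeding; if the flag is clear, the open proceeds immediately.
    if hdr.exclusive_work:
        with hdr.location_lock:  # blocks until the writer releases it
            pass
    return "opened"
```

Note what happens if the writer is interrupted and end_exclusive_change
never runs: every later opener blocks on the held location, which is
analogous to the orphaned-lock symptom, and nothing at the /object/
level shows a conflict.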
FWiW here is a v6 issue describing the /change file/ interface as an
example of the OS DB leaving an orphan lock. That example involved
referential integrity, where the orphaned lock was left on the parent
file vs the child file with the dependent data; no mention of the type
of lock that was orphaned:
http://www.ibm.com/support/docview.wss?uid=nas3bd0dcd2b5f3164e28625772a0073bbb3
For reference only:
http://www.google.com/search?q=%22space+location+lock%22+sql0913+OR+%22-913%22+OR+%22msgsql0913