On 23-Mar-2016 08:36 -0500, Steinmetz, Paul wrote:
What would cause a MCH5802
"Lock operation for object TRACE TRACE not satisfied."
Specifically for the symptom msgMCH5802 F/RmslLockMiObject T/QDBOPEN
TP/QDBOPEN STMT/19483 [obtained\constructed from joblog info included
later], that would be a failing MI LOCK instruction [per timeout] for
either the member, data[space], or access path with that name; i.e.
essentially, for the member TRACE of database file TRACE. The OS
msgCPF4126 RC1 F/QDBSIGEX clarifies that the object unable to be
allocated was the keyed Access Path of the data for that member.
Notably, from the help text for the Access Path Recovery (RECOVER)
keyword for the Create Physical File (CRTPF), "While the access path is
being rebuilt, the file cannot be used by any job."; more appropriately,
the keyed access *open* of the *member* of the file can not be performed
by any *other* [thread or] job than the job performing the rebuild.
From the details in the remainder of the topic, in which I further
comment inline, I infer that is why the open did not transpire in a
second\the-failing job.
I had two occurrences last night.
Both were following RSTLIBBRM(s) of test environment.
One job 811697 opened the same file 63 seconds earlier with no
issue.
According to the message details, the first job, 811697, which was
successful, was the lock holder.
Checking the joblog for the first job, 811697, I see it took 4
minutes, 32 seconds for the program to open its 4 files.
Normally this is 1 second.
Date sent . . . : 03/23/16 Time sent . . . : 02:29:56
Date sent . . . : 03/23/16 Time sent . . . : 02:34:28
I'm thinking it could be related to an access path being rebuilt
following the RSTLIBBRM.
What is puzzling is I don't see a CPF3145 Access path built for
member for the file in question.
I do see many other CPF3145.
The other msgs CPF3145 were seen where? In the joblog of the
restore, in QSYSOPR, or in QHST via DSPLOG?
Presumably, the job 811697 that opened the member first, effected the
AccPth rebuild activity. I expect possibly that the msg CPF3145
T/QSYSOPR is not an expected effect; perhaps that the message is sent,
but only T/QHST. Either way, I am confident the message is not sent to
the joblog of the opener. The CPF3145 messages seen, I suspect likely
were sent from a QDBSRV## runpty-52 job, and were sent To *SYSOPR, for
which an implicit copy also would have been sent To QHST. And depending
on the outcome for the opener, the msg CPF3146 may have been the effect
instead; albeit unlikely, as the program as opener; i.e. the first job
likely would have failed also, but with a different open error.?
Anyhow, the presumed-successful opener of the member [with the keyed
access open method] apparently won the race to effect the rebuild; in a
race first against the Database Server jobs, to which the rebuild-work I
expect would have been added asynchronously via an event as a side
effect of the restore\load-data recognizing the need to rebuild, and
then eventually that rebuild expected to be started as work pulled from
a queue [that is the expected effect for a known-to-be required rebuild,
for which Edit Rebuild Of Access Paths (EDTRBDAP) is the UI for
showing\managing the enqueued work] and a race against the failing job
that was also attempting to open the member.
The TRACE file in question only has 1 access path.
The file is quite large.
Member List
… Creation Last Change Deleted
Member Size … Date Date Time Records Records
TRACE 7329742848 … 02/05/05 03/23/16 08:33:24 5246938 18248817
Hmm. A physical file [member] rather than a logical file. The
Access Paths of a physical file are saved [unless invalid] and restored
irrespective the Save Access Paths (ACCPTH) specification on the save,
so that the access path is apparently being rebuilt seems problematic;
seems the saved file must have an invalid access path. As an LF Member
(LFM), the Change Logical File (CHGLF) has a Force Rebuild Of Access
Path (FRCRBDAP) parameter that could assist, but for a PF Member the
options are to ALTER [or Change Physical File (CHGPF) with Source File
(SRCFILE) other than *NONE] or Reorganize Physical File Member (RGZPFM)
using Allow Cancel (ALWCANCEL) of *NO and Rebuild Access Paths
(RBDACCPTH) of *YES [i.e. offline reorg].
If there is no CPF3145, perhaps the oddity\rarity of a physical
access path is something that went unnoticed, as not transpiring; is
perhaps, only functional for logical access paths.?
Hmm. 3X as many deleted, as there are active records. Having been
created in 2005, if the file is established for Reuse Of Deleted Records
per REUSEDLT(*YES), then the Deleted Entry BitMap may be broken [an old
defect whereby recovery is either toggle the REUSEDLT attribute setting
or issue an offline Reorganize Physical File Member (RGZPFM)] with that
many deleted records; though that should have no impact on the AccPth,
given deleted records are not tracked.
Subsystem class - Default wait time in seconds: 30
Access path maintenance . . .: MAINT *IMMED
Access path recovery . . . .: RECOVER *NO
Maximum file wait time . . .: WAITFILE *IMMED
Maximum record wait time . .: WAITRCD *NOMAX
Missing but possibly pertinent values are:
• Access Path Valid [also, for the file that was saved; see first of
two aforementioned "Hmm."]
• Access Path Size (ACCPTHSIZ) (*MAX1TB might have been required to
enable a valid access path being maintained against the larger
[pre-delete] amount of data [to accommodate all key values] for a given
Maximum Key Length)
I have a few unanswered questions.
1) What changed, we've never seen this MCH5802 error following a
RSTLIBBRM?
• The conflicting job is new or faster\different-timing than in the
past, many more invalid\unsaved access paths are on the media delaying
implicit\async rebuild, or there was not previously two concurrent jobs
started both doing the keyed open [so soon] after the restore.
• The saved file no longer has a valid access path and thus must be
rebuilt upon restore? The file being saved would need to have that
remedied so as to allow the save to include the access path; see RGZPFM
mentioned previously.
• Note(a) as FYI only: If the issue had been with an LF vs a PF, then
given the restore is not a scratch restore of the member data [such that
instead, is overlay\restore-over existing data] then perhaps a mismatch
was noted between the level identifiers of the saved AccPth and the
existing AccPth [for which a scratch restore should be performed to
prevent the condition]; or, irrespective scratch\restore-over, if the
LFM was not restored in the same request as the PFM.
• Note(b) as FYI only: If the issue had been with an LF vs a PF, then
the save may no longer be saving access paths by choice or by default;
i.e. the Save Access Paths (ACCPTH) parameter specification? All PF
should have all valid access paths saved, both those of each PFM and
each LFM [for why a LFM must be saved with the PFM(s) in the same save
request and why a LFM must be restored with the PFM(s) in the same
restore request to enable the logical keyed access path to restore
rather than having to be rebuilt]
2) Why was the first job successful and 2nd job failed?
A side effect of a race to open with a required AccPth rebuild is
that the loser may experience the timeout in attempting to allocate the
object(s) required to effect the rebuild or effect the open, and thus
fail to open the member. The length of time to wait for the LOCK
instruction could be set to ~60 rather than the Default Wait (DFTWAIT),
and despite being an Open request, the specified length of time for
timeout may be either in-addition-to or otherwise distinct-from the
Maximum File Wait Time (WAITFILE).
3) If TRACE01 access path was rebuilt, why did the system not log a
CPF3145?
Presumably that is file TRACE; as in the file named in the error.?
That is, presumably the correct name was being researched when looking
for a corresponding CPF3145.?
That may be normal for a user-initiated open; i.e. for an access path
rebuild as the effect of Open Database File (OPNDBF) or via the
equivalent Keyed Database Member Open method by another means, the
CPF3145 [is not issued to the joblog and] may only be issued to the
history log [thus not visible in QSYSOPR, if that is where the messages
were being viewed]. Be sure to review the Display Log (DSPLOG) for QHST
to locate any CPF31## messages, instead of using only the Display
Message (DSPMSG) for *SYSOPR.
I read many threads related to MCH5802, discuss increasing the 30
second class default wait time.
The msg MCH5802 is generic; any condition is specific to the code
that is requesting the lock, and what processing that code is doing.
The Class (CLS) object may not be germane for any particular instance.
The Default Wait Time (DFTWAIT) is used only on LOCK instructions
that do not specify an explicit TimeOut value. When the Open processing
attempts to perform the open, the determination that the AccPth needs
rebuilding likely results in an attempt to obtain an exclusive lock
before preceding; that time is likely not considered for open-timeout
and thus likely given a hard-coded timeout irrespective the DftWait
timeout. If that lock can be obtained, the rebuild can be attempted
before the real work for Open proceeds. Had there been no AccPth
rebuild requirement, the effective Create Duplicate Object (CRTDUPOBJ)
of the member object used to create the temporary Open Data Path (ODP)
probably would have been performed, and that is the phase of processing
for which the WaitFile timeout value has real meaning.
MCH5802 Escape 20 03/23/16 02:30:59.863601
< ckMiObject 001368 QDBOPEN QSYS *STMT
From Program : RmslLockMiObject
To module . .: QDBOPEN
To procedure : QDBOPEN
Statement . .: 19483
Message: Lock operation for object TRACE TRACE not satisfied.
Cause . . : The lock operation was not satisfied for object
TRACE TRACE in the specified time interval of 63 seconds.
The lock holder type is 1.
The lock holder name is CURBSTONE MISSLICE 811697.
The lock holder thread identifier is X'000000000000007F'. The
lock holder type has the following meaning: [...]
1 - The lock holder is a job. [...]
CPF4126 Escape 50 03/23/16 02:30:59.864023
QDBSIGEX QSYS 05A4 AICURB IC08XXCHGP *STMT
To module . .: AICURB
To procedure : AICURB
Statement . .: 1000001
Message: Object not available for file TRACE.
Cause . . : Cannot allocate object for file TRACE in library
MSPENWORK because object not available. The object is specified
in reason code 1 as follows: 1 -- Access path for Member TRACE
<<SNIP>>
RNX1216 Escape 99 03/23/16 02:31:00.712231
QRNXIE QSYS *STMT AICURB IC08XXCHGP *STMT
From module .: QRNXMSG
From procedure: SignalException
Statement . .: 26
To module . .: AICURB
To procedure : AICURB
Statement . .: 1000001
Message: Error message CPF4126 appeared during OPEN for file TRACE.
Cause . . : RPG procedure AICURB in program IC08XXCHGP/AICURB
received the message CPF4126 while performing an implicit OPEN
operation on file TRACE. The actual file is TRACE. <<SNIP>>
RNQ1216 Sender copy 99 03/23/16 02:31:01.045936
QRNXIE QSYS *STMT QRNXIE QSYS *STMT
From module .: QRNXMSG
From procedure: InqMsg
Statement . .: 15
To module . .: QRNXMSG
To procedure : InqMsg
Statement . .: 15
Message: Error message CPF4126 appeared during OPEN for
file TRACE (C S D F).
Cause . . : RPG procedure AICURB in program IC08XXCHGP/AICURB
received
If the timeout [seen in the failure] is additive to the WAITFILE for
the complete Open processing, then increasing that value [shown as
*IMMED, meaning no-wait-timeout] on the saved file [or on restored-to
file, if using restore-over vs scratch-restore of the file] could assist
to prevent the timeout.? I expect more likely, the condition(s) for
which the access path must be rebuilt are best resolved; e.g. if not
saving access paths, then changing to do so [along with any other
required corrections such as ensuring that the AccPth of the saved file
is valid] would be the ideal resolution. Ensuring the AccPths are
saved\restored prevents need for any rebuilds and thus should avoid any
potential conflicts for attempts to rebuild; excepting conditions
whereby the access path is marked invalid after restore due either to
defect or defect-corrective-action, but that seems highly improbable as
possible origin in the described scenario. But as noted earlier, being
a PF, an issue for a PFM AccPth being rebuilt is apparently due to the
AccPth being invalid on the saved-from member [and thus the AccPth was
omitted from the media]; ensuring the PFM AccPth gets saved\restored,
requires correcting the condition of the invalid access path whence the
file\member is being saved; if the AccPth of the saved file claims to be
valid, then something else\weirder is going on [like alluded
"conditions", just above].
As an Amazon Associate we earn from qualifying purchases.