Hi All,
I'm not completely sure if this belongs on the RPG list or the general
midrange one, but my best guess is this list, since it relates to embedded
SQL.
I've been seeing a bunch of SQL return codes -901 - SQL system error lately.
They are coming out of a couple of database trigger programs.
The trigger programs are both doing the same simple thing - looking at
before and after images and building a JSON Message to insert into a CLOB.
There is nothing wrong with the SQL that is built. It should probably be
built using VALUES rather than building against SysIbm/SysDummy1, but that
doesn't mean it is invalid.
A typical SQL string might look like this:
Select JSON_OBJECT( 'msgType' value 'orderLineFeedback', 'orderNumber'
value '01'241575' ,
'lineNumber' value 3 , 'status' value '2' , 'warehouse' value 'KA'
, 'location' value 'R040' )
From SysIbm/SysDummy1'
This is fed into an insert statement and these triggers run thousands of
times a day, mostly without issue.
Looking at the dumps that are giving the occasional -901 error, they are
really coming from two main application programs.
Here is a typical stack dump:
PGM0001 LIBRARY001 0000000577 PGM0001 PGM0001
*NEW 0000000000000012 N
PGM002 LIBRARY001 PGM002 _QRNP_PEP_PGM002
*NEW 0000000000000012 N
PGM002 LIBRARY001 0000004034 PGM002 PGM002
*NEW 0000000000000012 N
QRNXIO QSYS 0000000049 QRNXDBIO _QRNX_DB_UPDATE
*NEW 0000000000000012 N
QDBUDR QSYS 0000009721 QDBUDR QDBUDR
*DFTACTGRP 0000000000000001 Y
QDBUDR QSYS 0000019759 QDBUDR CALL_TRIGGER_PGM
*DFTACTGRP 0000000000000001 N
TRGCTL TRGLB TRGCTL _QRNP_PEP_TRGCTL
*NEW 0000000000000012 Y
TRGCTL TRGLB 0000000820 TRGCTL TRGCTL
*NEW 0000000000000012 N
PGM003 LIBRARY02 PGM003 _QRNP_PEP_PGM003
*NEW 0000000000000012 N
PGM003 LIBRARY02 0000028700 PGM003 PGM003
*NEW 0000000000000012 N
QSQROUTE QSYS 0000020176 QSQROUTE QSQROUTE
*DFTACTGRP 0000000000000001 Y
QSQRUN3 QSYS 0000015359 QSQOPEN SQL_OPEN
*DFTACTGRP 0000000000000001 N
QSQRUN3 QSYS 0000021563 QSQOPEN OPQRYDBM
*DFTACTGRP 0000000000000001 N
QSQRUN3 QSYS 0000041264 QSQOPEN FULL_OPEN
*DFTACTGRP 0000000000000001 N
QSQRUN3 QSYS 0000031550 QSQOPEN MAINERR
*DFTACTGRP 0000000000000001 N
QSQRUN3 QSYS 0000033004 QSQOPEN DUMPQDT
*DFTACTGRP 0000000000000001 N
QSQSBAS2 QSYS 0000003090 QSQSVCDMP SQSVCDMP
*DFTACTGRP 0000000000000001 N
I've changed the names to protect the innocent, but in this example PGM002
is the application program.
It's not shown here, but the caller of PGM0001 runs in *NEW, and PGM0001,
PGM002, TRGCTL and PGM003 are all running in *CALLER,
so running in the NEW activation group set up by the (not shown) program
which in this case is activation group 12.
TRGCTL is a generic trigger shell and PGM003 is the real trigger trying to
build the JSON message and getting the SQL system error.
Commit control is not being used.
PGM003 sets on LR and returns after every call.
Although it is PGM003 that is getting the bad SQL return code, I'm
suspicious of the caller - PGM002.
My reasons are:
PGM003 runs many times a day without error, and is pretty simple - build a
string from journal images and insert it into a table.
Almost every time PGM003 has a failure, PGM002 is the program calling it.
PGM002 is coded very badly, with quite a few SQL select intos and updates
and not a single check of SQLCode.
PGM002 I believe is running from an RF device - although nothing in the
joblog shows me any kind of a disconnection error.
In PGM003 it is the Exec SQL open cursor statement that is getting the -901
SQLCode.
So, my question is - is it likely or even conceivable that some fatal SQL
error is occurring unmonitored in PGM002, which is having a knock on effect
in PGM003?
I can't make too much sense of the info logged to /TMP/SQE/ERRORS/
And the joblog isn't very instructive either - Sev 60 MCH3203:
21/10/19 06:18:21.916818 DbpmTrap 000958 QQQOOODBOP
QSYS
To module . . . . . . . . . : QQQOOOINV
To procedure . . . . . . . : CALLDBMAINTFOROPENOROPTIMIZE
Statement . . . . . . . . . : 5209
Message . . . . : Function error X'1720' in machine instruction. Internal
dump identifier (ID) 01010E22.
Cause . . . . . : The machine instruction that was running failed. Time
stamp is 21/10/19 06:18:21, error code is X'1720', error class is 0,
device
number is X'0000'. The error class indicates how the damage was detected:
0000-unspecified abnormal condition; 0002-logically invalid device
sector;
0003-device failure; 0004-invalid operation performed. For error class
0003,
the device number identifies the failing device, or contains zero if main
storage failed. For error class 0004, an unsupported machine interface
instruction operation code was used. Recovery . . . : For error class
0004, remove the unsupported machine interface instruction operation code
from the program. For all other error classes, start problem analysis
(ANZPRB command).
Sorry this has been rather a long one, but if anyone has seen something
like this before or has an advice on how else to diagnose the error.
thanks kindly,
Craig
As an Amazon Associate we earn from qualifying purchases.