With the assistance of a comm trace and TRCINT, IBM development has identified and fixed this issue.
For 20 years, development would intermittently see this issue, but never had traces to pinpoint it.
I've been running a SAVRSTOBJ test job every 15 min since 2/18.
No errors since applying the test PTF on 3/11.
Good job IBM.
This was also a good experience for me.
Using AJS, I was easily able to automate the test SAVRSTOBJ job and both traces.
A one-minute trace was half a million pages, or about 50 million pages per day.
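As a rough check on that volume, here is a small Python sketch. It assumes a one-minute trace ran with each 15-minute test job; the post does not state that explicitly, so treat the cadence as an assumption.

```python
# Rough check of the daily trace volume quoted in the post.
PAGES_PER_TRACE = 500_000   # one-minute trace size, from the post
RUN_INTERVAL_MIN = 15       # test job cadence, from the post
                            # (assumed to apply to the traces as well)

runs_per_day = (24 * 60) // RUN_INTERVAL_MIN
pages_per_day = runs_per_day * PAGES_PER_TRACE

print(f"{runs_per_day} runs/day, {pages_per_day:,} pages/day")
# prints: 96 runs/day, 48,000,000 pages/day
```

Under that assumption the total comes to 48 million pages, which is in line with the figure of roughly 50 million pages per day.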
Two test PTFs as of now, V7R1.
MF61702 - APAR MA45452
MF61265 - APAR MA45188
DESCRIPTION OF PROBLEM FIXED FOR APAR 'MA45452' :
-------------------------------------------------
SNA EE (Enterprise Extender) sessions or applications such
as SAVRSTLIB randomly hang for several seconds to several
minutes or more.
CORRECTION FOR APAR 'MA45452' :
-------------------------------
The problem was traced to an intermittent failure of
a function to return time remaining of a LIC time-out
request. The failure is the result of a race between
the time remaining function and the program responsible
for merging new requests into the time-out request
queue. The race condition has been eliminated by
ensuring adequate synchronization between the two
programs.
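To illustrate the kind of race the APAR text describes, here is a minimal Python sketch. It is not IBM's LIC code: the class `TimeoutQueue` and its method names are hypothetical, and a single lock stands in for whatever synchronization the PTF actually uses.

```python
import heapq
import threading


class TimeoutQueue:
    """Toy time-out request queue (hypothetical, for illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._heap = []       # (deadline, request_id), earliest deadline first
        self._deadline = {}   # request_id -> deadline

    def merge(self, request_id, deadline):
        # Both structures are updated under one lock, so a reader never
        # sees a request that is in one structure but not the other.
        with self._lock:
            heapq.heappush(self._heap, (deadline, request_id))
            self._deadline[request_id] = deadline

    def time_remaining(self, request_id, now):
        # Takes the same lock as merge(), so it cannot run mid-merge.
        with self._lock:
            deadline = self._deadline.get(request_id)
            if deadline is None:
                return None
            return max(0.0, deadline - now)

    def next_deadline(self):
        with self._lock:
            return self._heap[0][0] if self._heap else None


queue = TimeoutQueue()
workers = [
    threading.Thread(target=queue.merge, args=(f"req{i}", 100.0 + i))
    for i in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(queue.time_remaining("req3", now=90.0))  # prints: 13.0
```

Without the shared lock, `time_remaining` could run while `merge` had updated only one of the two structures, which is the general shape of the failure the APAR describes.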
Paul
-----Original Message-----
From: Steinmetz, Paul
Sent: Monday, February 08, 2016 4:17 PM
To: 'Midrange Systems Technical Discussion'
Subject: EE connections including STRPASTHR, SNADS, SAVRSTLIB and DDM will hang for 20 seconds to 4 minutes
Is anyone having intermittent issues with EE connections, where STRPASTHR, SNADS, SAVRSTLIB and DDM hang for 20 seconds to 4 minutes?
Sessions may also time out and disconnect.
CORRECTION FOR APAR MA45188 :
-----------------------------
Several conditions can hang the RTP connection when the burst
timer is not restarted. One condition is where a data or burst
window is filled at the same time the burst timer fails to pop.
Another condition is burst timer can be checked and restarted if
needed when the short request timer pops. Also detection for a
missing burst timer window has been reduced from two seconds to
10 milliseconds.
DESCRIPTION OF PROBLEM FIXED FOR APAR MA45260 :
-----------------------------------------------
Intermittently, the EE connection can hang for a few minutes and
then be dropped suddenly. This is happening about three times a
week. SNADS sender job got CPF5107, error code E015, and SNA
sense code 8002. E015 means UNBIND received, and sense code 8002
means data link failure. To recover, the customer must vary both
CTLD and DEVD OFF and ON.
CORRECTION FOR APAR MA45260 :
-----------------------------
The RTP alive timer on the target side system of the user's RTP
data connection on port 12003 timed out due to inactivity over
the previous three minute interval. The target side system then
initiated a new SYNC echo pending status exchange retry sequence
which also timed out due to the network dropping all packets
sent from target system on port 12003. This initiated a failed
HPR path switch resulting in a data link failure. Keep-alive
TEST packets on port 12000 do continue normal bidirectional
flow. A code change was made to add more status exchange
requests sent from the target side to detect network recovery
sooner, thus preventing a retry timeout and data link failure
notification for this scenario.
I've applied MF61265, still having issues.
Open PMR since July 2015.
Problem is very intermittent, sometimes only once every two months.
2/6/16 failure started with a STRRMTCMD that just hung, over 24 hours, had to cancel.
STRPASTHR then also failed; I had to vary the EE controllers off and on.
SAVRSTOBJ resulted in damaged objects.
Per IBM support request, I'm setting up an hourly test job, with traces, hopefully to recreate the error quicker.
TRCINT SET(*ON) TRCTBL(TRCEE) SIZE(700 *MB) TRCFULL(*STOPTRC) TRCTY...
STRCMNTRC CFGOBJ(ETHVIRT00) CFGTYPE(*LIN) MAXSTG(32M) USRDTA(*MAX)
Paul
-----Original Message-----
From: Steinmetz, Paul
Sent: Tuesday, August 04, 2015 4:54 PM
To: 'Midrange Systems Technical Discussion'
Subject: RE: savrstobj and strpasthr both hanging for 2 to 4 minutes
Jim,
1) Utilizing AJS, I have two jobs on each LPAR.
I have both traces starting 1 minute before the SAVRSTOBJ runs.
DLTCMNTRC CFGOBJ(ETHVIRT00) CFGTYPE(*LIN)
TRCINT SET(*END) TRCTBL('TRCEE')
TRCINT SET(*ON) TRCTBL(TRCEE) SIZE(700 *MB) TRCFULL(*STOPTRC) TRCTY...
STRCMNTRC CFGOBJ(ETHVIRT00) CFGTYPE(*LIN) MAXSTG(32M) USRDTA(*MAX) ...
I have both traces stopping 6 minutes later, which is slightly longer than the SAVRSTOBJ hangs.
TRCINT SET(*OFF) TRCTBL('TRCEE') OUTPUT(*PRINT)
ENDCMNTRC CFGOBJ(ETHVIRT00) CFGTYPE(*LIN)
PRTCMNTRC CFGOBJ(ETHVIRT00) CFGTYPE(*LIN) CODE(*ASCII) FMTTCP(*YES)...
The TRCINT is 420,000 pages for the 6 minutes.
The PRTCMNTRC is 23,000 pages for the same window.
2) I ran a query over the AJS job history file (QAIJSHST), selecting the SAVRSTOBJ job when the run time was greater than 3 minutes, which probably indicates a failure or hang.
A normal run is 2 minutes or less.
Below are the results.
This reveals the SAVRSTOBJ hang is occurring more often than I thought, not always reported.
Management requested that I move the SAVRSTOBJ job outside the 8 to 5 window, not to impact production.
Job now runs at 6:25.
08/04/15  10:48:24    AJS OBJCLNL SAVRSTOBJ failures    PAGE 1

Job Name   Last Run   Last Start   Last End   Elapsed
           Date       Time         Time       Time
OBJCLNL    14/03/05   8:00         8:05             6
OBJCLNL    14/03/12   6:30         6:35             6
OBJCLNL    14/03/28   6:30         6:36             6
OBJCLNL    14/03/30   6:30         6:36             6
OBJCLNL    14/04/05   6:30         6:36             7
OBJCLNL    14/05/19   6:30         8:19           109
OBJCLNL    14/10/03   8:30         8:37             8
OBJCLNL    14/11/21   8:30         8:34             5
OBJCLNL    15/01/22   8:30         8:35             6
OBJCLNL    15/02/02   8:30         8:37             8
OBJCLNL    15/02/13   8:30         8:38             8
OBJCLNL    15/02/17   8:30         8:35             6
OBJCLNL    15/02/28   8:30         8:35             6
OBJCLNL    15/03/10   8:30         8:34             4
OBJCLNL    15/03/15   8:30         8:34             5
OBJCLNL    15/07/21   8:30         8:35             6
OBJCLNL    15/07/29   8:30         8:37             7
OBJCLNL    15/08/03   8:30         8:35             5

* * *  E N D   O F   R E P O R T  * * *
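
The selection described in item 2 can be sketched roughly as follows. This is a Python illustration over a few rows copied from the report, not the actual query against QAIJSHST; the record layout, the name `suspected_hangs`, and the one "normal" row are hypothetical.

```python
from typing import NamedTuple


class JobRun(NamedTuple):
    job: str
    run_date: str      # yy/mm/dd, as printed in the report
    elapsed_min: int   # elapsed run time in minutes


THRESHOLD_MIN = 3      # runs longer than this are treated as probable hangs


def suspected_hangs(runs, threshold=THRESHOLD_MIN):
    """Return the runs whose elapsed time exceeds the threshold."""
    return [r for r in runs if r.elapsed_min > threshold]


runs = [
    JobRun("OBJCLNL", "15/07/20", 2),   # hypothetical normal run, not in the report
    JobRun("OBJCLNL", "15/07/21", 6),   # from the report
    JobRun("OBJCLNL", "15/07/29", 7),   # from the report
    JobRun("OBJCLNL", "15/08/03", 5),   # from the report
]

for run in suspected_hangs(runs):
    print(run.job, run.run_date, run.elapsed_min)
```

A threshold of 3 minutes sits just above the normal run time of 2 minutes or less, so ordinary runs are excluded while the 4-minute and longer runs are reported.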
Paul
-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Jim Oberholtzer
Sent: Tuesday, August 04, 2015 8:01 AM
To: 'Midrange Systems Technical Discussion'
Subject: RE: savrstobj and strpasthr both hanging for 2 to 4 minutes
Paul:
The only thing I'm seeing is an occasional B6007101 which is an APPN session failure, and I know why those happened. A comm line went down.
I think a comm trace and PMR are in your near future unfortunately.
--
Jim Oberholtzer
Chief Technical Architect
Agile Technology Architects
-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Steinmetz, Paul
Sent: Monday, August 03, 2015 7:55 AM
To: 'Midrange Systems Technical Discussion'
Subject: RE: savrstobj and strpasthr both hanging for 2 to 4 minutes
Jim,
Did you ever get to check your LIC log entries?
We had a reoccurrence this morning.
The same job containing a SAVRSTOBJ job triggered the issue at 8:31.
Same LIC log entry
08002376 Source/Sink information 0701 0C00 08/03/15 08:31:11 6
At least two users who were already passed through on LPAR B (Pencor06) were hung.
Once the SAVRSTOBJ completed, they were fine.
The SAVRSTOBJ normally takes 50 seconds.
When the STRPASTHR sessions hang, the SAVRSTOBJ takes about 4 minutes.
Paul
-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Jim Oberholtzer
Sent: Friday, July 24, 2015 2:41 PM
To: 'Midrange Systems Technical Discussion'
Subject: RE: savrstobj and strpasthr both hanging for 2 to 4 minutes
I will but it will be later since I'm not in the office until Wednesday.
--
Jim Oberholtzer
Chief Technical Architect
Agile Technology Architects
-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Steinmetz, Paul
Sent: Friday, July 24, 2015 10:38 AM
To: 'Midrange Systems Technical Discussion'
Subject: RE: savrstobj and strpasthr both hanging for 2 to 4 minutes
Jim,
Please check if you have this same error in your lic log at the time of the failure.
08000F46 Source/Sink information 0701 0C00 07/21/15 08:32:17 6
Paul
-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Steinmetz, Paul
Sent: Friday, July 24, 2015 10:38 AM
To: 'Midrange Systems Technical Discussion'
Subject: RE: savrstobj and strpasthr both hanging for 2 to 4 minutes
Jim,
On both your V5R4 and V7R1 systems?
1) Do you have EE configured?
2) Do you use SAVRST*** cmds?
3) Do you use STRPASTHR?
I also only see this once every several months.
We SAVRST*** both directions, but the issue only occurs A to B.
STRPASTHR is only A to B.
Paul
-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Jim Oberholtzer
Sent: Wednesday, July 22, 2015 4:53 PM
To: 'Midrange Systems Technical Discussion'
Subject: RE: savrstobj and strpasthr both hanging for 2 to 4 minutes
On that system about every 90 days or so.
On my V7Rx systems I've not had a problem.
I like the thought of the AJS to set the comm trace.
--
Jim Oberholtzer
Chief Technical Architect
Agile Technology Architects
-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Steinmetz, Paul
Sent: Wednesday, July 22, 2015 3:48 PM
To: 'Midrange Systems Technical Discussion'
Subject: RE: savrstobj and strpasthr both hanging for 2 to 4 minutes
Jim,
How often do you see this?
I was not successful in gathering comm traces; they need to be running before the error occurs.
I'm considering setting up AJS scheduled jobs to start and end the comm traces, during the same time we run the SAVRST*** jobs.
Paul
-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Jim Oberholtzer
Sent: Wednesday, July 22, 2015 4:36 PM
To: 'Midrange Systems Technical Discussion'
Subject: RE: savrstobj and strpasthr both hanging for 2 to 4 minutes
Paul,
I've seen this behavior but it's on a V5R4 system so I'm not sure it's relevant. When that happens we force the vary off of the enterprise controller and the controller named for the target. Same thing on the target side. Then restart the target then the local system. There must be a TCP timeout somewhere because we have to leave the controllers varied off for at least 3 minutes.
My own guess, and it's purely a guess, is that the HPR transport code has some
sort of problem. IBM's gonna want a comm trace to diagnose it.
--
Jim Oberholtzer
Chief Technical Architect
Agile Technology Architects
-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Steinmetz, Paul
Sent: Wednesday, July 22, 2015 3:16 PM
To: 'Midrange Systems Technical Discussion'
Subject: savrstobj and strpasthr both hanging for 2 to 4 minutes
I had a repeat of an old issue, savrstobj and strpasthr both hanging for 2 to 4 minutes.
Had multiple previous posts on similar issue.
We use both SAVRSTOBJ and SAVRSTLIB to move objects between LPARS.
We also use STRPASTHR to connect from LPAR to LPAR.
Previously, Enterprise Extender controllers needed resets, not this time.
This time, no error messages, no resets, just a hang.
Could this be a lock, seize, other issue?
Is there any reason why multiple processes (STRPASTHR/SAVRSTOBJ) cannot be started/initiated concurrently?
Any thoughts?
Thank You
_____
Paul Steinmetz
IBM i Systems Administrator
Pencor Services, Inc.
462 Delaware Ave
Palmerton Pa 18071
610-826-9117 work
610-826-9188 fax
610-349-0913 cell
610-377-6012 home
psteinmetz@xxxxxxxxxx
http://www.pencor.com/
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list To post a message email: MIDRANGE-L@xxxxxxxxxxxx To subscribe, unsubscribe, or change list options,
visit:
http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx Before posting, please take a moment to review the archives at
http://archive.midrange.com/midrange-l.