Would lowering the timeout value to a few minutes be an option?
I am trying to run down recent problems with Java applications. These
programs have been running trouble free for a long time; this problem has
appeared only in the last few weeks. I hope someone has seen something like
this before or has some ideas of where to look for the cause.
These are client / server Java applications which exchange data with Data
Queues. The AS400 side is usually an RPG program. The Windows side is a
Java6 program using JTOpen on a desktop PC or a Windows server. In some
applications the AS400 is the client and in others the AS400 is the server.
This pseudo code shows the logic where the problem is:
DataQueue dataQueueObject; // same problem with KeyedDataQueue
boolean runPgm;
int waitTime = 60*60; // seconds, i.e. one hour
while( runPgm ) {
    try {
        DataQueueEntry entry = dataQueueObject.read(waitTime);
        if ( entry != null ) {
            // do something with the entry
            // entry may carry an end flag
        } else {
            // maybe look for semaphore or whatever which would
            // signal to end program
        }
    } catch ( xxxxxException e ) {
    } catch ( yyyyyException e ) {
    } catch ( zzzzzException e ) {
    } catch ( Exception e ) {
    } finally {
        System.out.println("finally");
    }
} // while( runPgm )
waitTime has a largish value so every once in a while the program will re-validate
the connection to the AS400. In the past I've had problems with read(-1) ... like the
AS400 was IPL'd and the partner Windows application didn't notice.
The problem:
When the time since the last data queue entry is longer than SOMETHING and less than
waitTime, and an entry is written into the data queue
- the DTAQ entry written to the data queue is removed from the queue as if
it was read.
- the Java program is still executing dataQueueObject.read(), waiting waitTime.
No exception was thrown and finally wasn't executed.
- additional entries written to the data queue remain in the data queue.
SOMETHING seems to vary with the application. In one job I was troubleshooting (Java6,
Windows server) SOMETHING appeared to be around 45 minutes and in another it was around 20 minutes.
I ran some code interactively on a laptop (Java6, Windows7, JTOpen 7.5.1) to get the
error and see what was going on. SOMETHING was 3 to 5 minutes.
When the data queue is inactive long enough to cause the problem,
DataQueue.write(...)
throws an IOException. getMessage() returns "connection reset", which appears to be caused by
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read()
at com.ibm.as400.access.DataStream.readFromStream()
at com.ibm.as400.access.ClientAccessDataStream.construct()
at com.ibm.as400.access.AS400ThreadedServer.run()
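A minimal sketch of how a program could recognize this failure and decide to reconnect, assuming the "connection reset" text shows up either on the IOException itself or on one of its causes, as in the trace above. The class and method names (ResetDetector, isConnectionReset) are hypothetical and not part of JTOpen; only standard Java classes are used.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical helper, not part of JTOpen.
final class ResetDetector {
    // Upper bound on how far the cause chain is followed, in case it loops.
    private static final int MAX_DEPTH = 32;

    /**
     * Returns true if the throwable, or any of its causes, is an IOException
     * (SocketException is a subclass) whose message contains "connection reset",
     * ignoring case.
     */
    static boolean isConnectionReset(Throwable t) {
        for (int depth = 0; t != null && depth < MAX_DEPTH; depth++) {
            if (t instanceof IOException) {
                String msg = t.getMessage();
                if (msg != null
                        && msg.toLowerCase(Locale.ROOT).contains("connection reset")) {
                    return true;
                }
            }
            t = t.getCause();
        }
        return false;
    }
}
```

In the read/write loop, a catch block could call ResetDetector.isConnectionReset(e) and, when it returns true, discard the data queue object and build a new connection instead of retrying on the dead one.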
There isn't anything in the QZHQSSRV job log until the Java program terminates, then:
Message ID . . . . . . : CPE3426 Severity . . . . . . . : 10
Message type . . . . . : Diagnostic
Date sent . . . . . . : 11/29/11 Time sent . . . . . . : 18:55:09
Message . . . . : A connection with a remote socket was reset by that
socket.
Message ID . . . . . . : CPIAD08 Severity . . . . . . . : 40
Message type . . . . . : Diagnostic
Date sent . . . . . . : 11/29/11 Time sent . . . . . . : 18:55:09
Message . . . . : Host server communications error occurred on recv() -
length.
Cause . . . . . : Error code 3426 was received while processing the recv() -
length function for the host server communications.
Recovery . . . : See any previously listed message(s) to determine the
cause of the error; if necessary, correct the error and issue the request
again.
It seems like the connection to the AS400 is timing out, or being timed out, when it
is inactive.
The problem occurs when the time between data queue entries is longer than something,
say 20 minutes for discussion, but shorter than waitTime.
In my interactive test it seems that the problem is eliminated by setting waitTime
to a value lower than the default wait time of the QZHQSSRV job.
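One way to keep a long overall wait while never leaving the connection idle for long is to split the wait into short slices. The sketch below shows only that slicing logic. TimedReader is a hypothetical stand-in for a blocking call such as dataQueueObject.read(seconds), and SlicedRead, readInSlices and the 120-second slice in the usage note are illustrative assumptions, not JTOpen API. It also assumes that each timed-out read and the following new read produce traffic on the connection, which is what the interactive test suggests but has not been confirmed.

```java
import java.util.function.BooleanSupplier;

// Hypothetical abstraction over a blocking read such as DataQueue.read(seconds).
interface TimedReader<T> {
    // Returns an entry, or null if nothing arrived within waitSeconds.
    T read(int waitSeconds) throws Exception;
}

// Hypothetical helper, not part of JTOpen.
final class SlicedRead {
    /**
     * Waits up to totalSeconds for an entry, but never blocks longer than
     * sliceSeconds in a single read. Returns the first non-null entry, or
     * null if the total time runs out or keepRunning reports false.
     */
    static <T> T readInSlices(TimedReader<T> reader, int totalSeconds,
                              int sliceSeconds, BooleanSupplier keepRunning)
            throws Exception {
        if (sliceSeconds <= 0) {
            throw new IllegalArgumentException("sliceSeconds must be positive");
        }
        int remaining = totalSeconds;
        while (remaining > 0 && keepRunning.getAsBoolean()) {
            // The last slice may be shorter than sliceSeconds.
            int wait = Math.min(sliceSeconds, remaining);
            T entry = reader.read(wait);
            if (entry != null) {
                return entry;
            }
            remaining -= wait;
        }
        return null;
    }
}
```

With the pseudo code above, the call would look roughly like SlicedRead.readInSlices(s -> dataQueueObject.read(s), waitTime, 120, () -> runPgm), where the slice length is chosen to stay below whatever idle limit is cutting the connection.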
Up until 3 weeks ago everything was working normally. Then long-running client /
server jobs began to shut down in unexpected ways. Some of these programs have been
in place for years.
I think something has been changed in the network or on the AS400 but I don't know
what.
This problem seems to be happening with all my AS400<-data queue-> Java on Windows
applications that use the design pattern in the pseudo code, when the timing is right (or wrong).
Does this sound familiar to anyone?
Thanks everyone for your time.
Bill Blalock