When I had this problem we were using JTOpen 5.0 (elderly, but reliable). After digging into the source code I found that the code creating the actual socket was not designed to be reachable by application programmers.

I also found that the DataQueue.read() timeout is implemented on the server side, so if the TCP/IP connection is silently torn down (which I believe is the underlying cause here), the client will still hang.

We decided on the black-box approach: we do the API call with a timeout value and detect if it doesn't return. The detection requires a separate thread, which is most easily done with the Executor framework in Java 5+.
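A minimal sketch of that black-box detection, assuming nothing about JTOpen itself: the blocking call is handed to an executor as a Callable and the caller waits on the Future with its own deadline. The class name TimedCall and its call() method are made up for illustration, and the lambda syntax needs Java 8 or later (on Java 5/6 the thread factory would be an anonymous class).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of JTOpen.
final class TimedCall {
    // Daemon threads, so a call that never returns cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-call");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs the task on a worker thread and waits at most the given time for it.
     * Throws TimeoutException if the task has not returned by then.
     */
    static <T> T call(Callable<T> task, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Best effort only: a thread blocked in socket I/O may ignore the interrupt.
            future.cancel(true);
            throw e;
        }
    }
}
```

With JTOpen the task would presumably be something like () -> dataQueue.read(120), called with a client-side limit somewhat longer than the 120 seconds given to read(), so a normal empty read returns before the watchdog fires.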

/Thorbjørn



On 30/12/11 18.27, Blalock, Bill wrote:
Would lowering the timeout value to a few minutes be an option?
Yes, setting the timeout to 2 minutes or less works. Longer values can allow the problem but not consistently.
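One way to keep a long logical wait while never blocking for more than a couple of minutes is to split it into short reads. This is only a sketch: TimedRead is a hypothetical stand-in for a blocking call such as DataQueue.read(seconds), and the 120-second slice in the usage note is simply the value reported to work above.

```java
// Hypothetical helper, not part of JTOpen.
final class ShortWaitPoller {
    /** Stand-in for a blocking read that returns null when the wait expires. */
    interface TimedRead<T> {
        T read(int waitSeconds) throws Exception;
    }

    /**
     * Waits up to totalSeconds for an entry, but never asks the reader
     * to block for longer than sliceSeconds at a time.
     */
    static <T> T readInSlices(TimedRead<T> reader, int totalSeconds, int sliceSeconds)
            throws Exception {
        if (sliceSeconds <= 0) {
            throw new IllegalArgumentException("sliceSeconds must be positive");
        }
        int remaining = totalSeconds;
        while (remaining > 0) {
            int wait = Math.min(sliceSeconds, remaining);
            T entry = reader.read(wait);
            if (entry != null) {
                return entry;       // got an entry before the overall wait ran out
            }
            remaining -= wait;      // this slice expired empty, try the next one
        }
        return null;                // overall wait expired with no entry
    }
}
```

For the loop in the original post this would look roughly like readInSlices(w -> dataQueueObject.read(w), 60*60, 120), which also puts some traffic on the connection every two minutes.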

I agree with "network devices in between" as the cause. Our IT guys could not help. Security and the network hardware are managed off site.

I wrote a test program to try to isolate the problem.

I found that SocketProperties.setKeepAlive(true) affected the problem in my test program. The AS400 uses the default keep-alive value. I have set the keep-alive interval on my laptop to a shorter value.

Have you seen SocketProperties.setSoTimeout(milliseconds) used to detect the connection being lost? I am trying to get a DataQueue.read() that is timing out to throw an exception that I can handle earlier.
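For reference, this is what the two options do on a plain java.net.Socket. It is a sketch using only the JDK; I am assuming that JTOpen passes the SocketProperties values (set via AS400.setSocketProperties()) through to its underlying socket, which I have not verified. The class name SoTimeoutDemo is made up.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class using only the JDK.
final class SoTimeoutDemo {
    /**
     * Connects to a local server that never sends data and reports whether
     * a blocking read gave up with SocketTimeoutException.
     */
    static boolean readTimesOut(int soTimeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            client.setKeepAlive(true);            // ask the OS to probe an idle connection
            client.setSoTimeout(soTimeoutMillis); // cap how long a single read() may block
            try {
                client.getInputStream().read();   // no data ever arrives
                return false;
            } catch (SocketTimeoutException e) {
                return true;                      // read gave up after soTimeoutMillis
            }
        }
    }
}
```

Note that SO_TIMEOUT applies to every blocking read, not only to dead connections. If it behaves the same way inside JTOpen, a value shorter than the wait passed to DataQueue.read() would presumably also interrupt perfectly healthy waits. Keep-alive probing, on the other hand, is driven by the operating system's interval, which is commonly two hours by default.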

Thanks for the reply. Happy new year. I hope 2012 will be a good one.

-----Original Message-----
From: java400-l-bounces@xxxxxxxxxxxx [mailto:java400-l-bounces@xxxxxxxxxxxx] On Behalf Of Thorbjoern Ravn Andersen
Sent: Friday, December 30, 2011 5:31 AM
To: Java Programming on and around the IBM i
Subject: Re: Strange timeout like behavior with JTOpen DataQueue read()

I have seen similar things with idle data queues, most likely because of
network devices in between.

Would lowering the timeout value to a few minutes be an option?

We have also wrapped the call in a Callable which can be timed out by an
Executor, so we can detect the timeouts. You need to call
disconnectAllServices on the current AS400 object and create a new one, to
ensure correct operation after a timeout has been detected.
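A rough sketch of that recovery step, written against a hypothetical Connection interface so that it runs without JTOpen. In real code read() would presumably delegate to DataQueue.read() and disconnect() to AS400.disconnectAllServices(), and the factory would build a fresh AS400 and DataQueue pair. ReconnectingReader and graceMillis are made-up names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical wrapper, not part of JTOpen.
final class ReconnectingReader<T> {
    /** Stand-in for an AS400 object plus its data queue. */
    interface Connection<T> {
        T read(int waitSeconds) throws Exception; // e.g. DataQueue.read(waitSeconds)
        void disconnect();                        // e.g. AS400.disconnectAllServices()
    }

    private final Supplier<Connection<T>> factory;
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "dq-read");
        t.setDaemon(true); // a hung read must not keep the JVM alive
        return t;
    });
    private Connection<T> current;

    ReconnectingReader(Supplier<Connection<T>> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    /** Returns an entry, or null if the wait expired or the connection had to be replaced. */
    T read(int waitSeconds, long graceMillis) throws Exception {
        final Connection<T> conn = current;
        Callable<T> task = () -> conn.read(waitSeconds);
        Future<T> future = pool.submit(task);
        try {
            // Allow the server-side wait plus a grace period before declaring a hang.
            return future.get(waitSeconds * 1000L + graceMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException hung) {
            future.cancel(true);      // best effort; the worker may stay blocked
            conn.disconnect();        // drop the suspect connection
            current = factory.get();  // and start over with a new one
            return null;
        }
    }
}
```

The old connection is never reused after a detected hang, which matches the advice above: once the read has failed to return, nothing about that connection's state can be trusted.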



On 30/11/11 03.10, Blalock, Bill wrote:
I am trying to run down recent problems with Java applications. These
programs have been running trouble free for a long time; this problem has
appeared in the last few weeks. I hope someone has seen something like
this before or has some ideas of where to look for the cause.

These are client / server Java applications which exchange data with Data
Queues. The AS400 side is usually an RPG program. The Windows side is a
Java6 program using JTOpen on a desktop PC or a Windows server. In some
applications the AS400 is the client and in others the AS400 is the server.

This pseudo code shows the logic where the problem is:

DataQueue dataQueueObject; // same problem with KeyedDataQueue
boolean runPgm;
int waitTime = 60*60; // seconds, i.e. one hour

while( runPgm ) {
    try {
        DataQueueEntry entry = dataQueueObject.read(waitTime);
        if ( entry != null ) {
            // do something with the entry
            // entry may carry an end flag
        } else {
            // maybe look for semaphore or whatever which would
            // signal to end program
        }
    } catch ( xxxxxException e ) {
    } catch ( yyyyyException e ) {
    } catch ( zzzzzException e ) {
    } catch ( Exception e ) {
    } finally {
        System.out.println("finally");
    }
} // while( runPgm )

waitTime has a largish value so every once in a while the program will re-validate
the connection to the AS400. In the past I've had problems with read(-1) ... like the
AS400 was IPL'd and the partner Windows application didn't notice.

The problem:

When the time since the last data queue entry is longer than SOMETHING and less than
waitTime, and an entry is written into the data queue

- the DTAQ entry written to the data queue is removed from the queue as if
  it was read.

- the Java program is still executing dataQueueObject.read(), waiting waitTime.
  No exception was thrown and finally wasn't executed.

- additional entries written to the data queue will remain in the data queue

SOMETHING seems to vary with the application. In one job I was troubleshooting (Java6,
Windows server) SOMETHING appeared to be around 45 minutes and in another it was around 20 minutes.

I ran some code interactively on a laptop (Java6, Windows7, JTOpen 7.5.1) to get the
error and see what was going on. SOMETHING was 3 to 5 minutes.

When the data queue is inactive long enough to cause the problem,
DataQueue.write(...) throws an IOException. getMessage() returns
"connection reset", which appears to be caused by:

java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read()
    at com.ibm.as400.access.DataStream.readFromStream()
    at com.ibm.as400.access.ClientAccessDataStream.construct()
    at com.ibm.as400.access.AS400ThreadedServer.run()

There isn't anything in the QZHQSSRV job log until the Java program terminates, then:

Message ID . . . . . . : CPE3426 Severity . . . . . . . : 10
Message type . . . . . : Diagnostic
Date sent . . . . . . : 11/29/11 Time sent . . . . . . : 18:55:09

Message . . . . : A connection with a remote socket was reset by that
socket.

Message ID . . . . . . : CPIAD08 Severity . . . . . . . : 40
Message type . . . . . : Diagnostic
Date sent . . . . . . : 11/29/11 Time sent . . . . . . : 18:55:09

Message . . . . : Host server communications error occurred on recv() -
length.
Cause . . . . . : Error code 3426 was received while processing the recv() -
length function for the host server communications.
Recovery . . . : See any previously listed message(s) to determine the
cause of the error; if necessary, correct the error and issue the request
again.


It seems like the connection to the AS400 is timing out, or being timed out, when it
is inactive.

The problem occurs when the time between data queue entries is longer than something,
say 20 minutes for discussion, but shorter than waitTime.

In my interactive test it seems that the problem is eliminated by setting waitTime
to a value less than the default wait time of the QZHQSSRV job.

Up until 3 weeks ago everything was working normally. Then long running client
server jobs began to shut down in unexpected ways. Some of these programs have been
in place for years.

I think something has been changed in the network or on the AS400 but I don't know
what.

This problem seems to be happening with all my AS400 <-data queue-> Java on Windows
applications that use the design pattern in the pseudo code, when the timing is right (or wrong).

Does this sound familiar to anyone?

Thanks everyone for your time.

Bill Blalock

