I think I found the answer to this. Well, I hope so, anyway. The job is
running in a subsystem that only allows 10 jobs to run at a time, which
would account for the "couldn't fork" error. Found some old conversations
about similar issues. I've moved the job to the same subsystem where I
run our tomcat server, which should easily be able to accommodate 50 or
so threads. I'll find out for sure tomorrow, but it looks promising.

On Sun, 23 Sep 2012 20:39:57 +0000, Pete Hall wrote:

I have a shell script process that executes a java upload client. It is
executed from an RPGLE program via the qp2runpase API. The
QIBM_USE_DESCRIPTOR_STDIO environment variable is set to Y. The shell
script is the target of the qp2runpase API. It is executable and
contains #!/bin/sh in the first line. Of course, it works fine in the
test LPAR.

The script process breaks down like this:

1. Main script is executed by qp2runpase, and redirects stdout and stderr
   to files.
2. Main script starts a background process to monitor the stderr file.
   a. The monitor script enters a sleep loop.
   b. It checks the stderr file for messages every so often.
   c. If anything bad appears in the stderr file:
      i.  It signals the main script.
      ii. It exits normally.
3. Main script starts a background process to run the java upload.
   a. The upload script executes the java client in-line.
4. Main script enters a sleep loop and waits for a signal from either of
   the background processes.
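As a self-contained sketch of the four steps (the file name, the two-second delay, and the "Exception" pattern are illustrative assumptions, and a sleep stands in for the java client):

```shell
#!/bin/sh
# Step 1 stand-in: the stderr file the main script redirects into.
ERRLOG="${TMPDIR:-/tmp}/upload_stderr.$$"
: > "$ERRLOG"
MAIN_PID=$$

# Step 2: background monitor - polls the stderr file every second
# and signals the main script if anything bad shows up.
( while :; do
      sleep 1
      if grep -q 'Exception' "$ERRLOG" 2>/dev/null; then
          kill -USR1 "$MAIN_PID"
          exit 0
      fi
  done ) &

# Step 3: background stand-in for the java upload; it "fails" after 2s.
( sleep 2; echo 'java.io.IOException: simulated' >> "$ERRLOG" ) &

# Step 4: main waits for a signal; the trap records the verdict.
STATUS=ok
trap 'STATUS=failed' USR1
wait
echo "upload status: $STATUS"
rm -f "$ERRLOG"
```

The point of the USR1 trap is that the trapped signal interrupts the main script's wait, so the main script doesn't have to poll its children.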

In production it never gets far enough to start java. I have a "verbose"
mode, which writes messages every time any of the scripts do something,
so I can tell what it did.

I see 2 messages in the stderr file.

The first is from the main script, at the point where it attempts to load
a small library file (possibly no file handle was available?). The
message says:

<scriptname>[51]: cannot fork: too many processes

Line 51 in that script contains:

. <library name>

The library file contains two small functions that don't do anything at
all when loaded, but one of the functions is used to write time-stamped
messages to the stdout file, and there are a couple of timestamped
messages in it, so it looks as though the library script did get loaded.
I can tell the messages came from the main script, because of their
content.
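For what it's worth, that <scriptname>[51]: prefix is just the shell's normal diagnostic format: it names the file and line it was executing when the failure happened (the exact format varies by shell; the PASE ksh-style sh prints it as name[line]). A quick illustration, where the temp file name and the failing command are made up:

```shell
#!/bin/sh
# Write a tiny "library", source it, and capture the diagnostic the
# shell emits when a command inside it fails. The file name and the
# nonexistent command are made-up examples.
LIB="${TMPDIR:-/tmp}/demo_lib.$$"
cat > "$LIB" <<'EOF'
# line 1 of the sourced library
no_such_command_demo
EOF
MSG=`. "$LIB" 2>&1`
echo "$MSG"
rm -f "$LIB"
```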

The 2nd message is from the monitor script. It's the same message,
regarding the line where it loads the same library.

Here's the library script. Is there something wrong with it?
#!/bin/sh
# -------------------------------------------------------------------
# Set required environment variables for execution via the Qp2RunPase API
# -------------------------------------------------------------------
initEnv() {
    if [ -z "$TZ" ] ; then
        export TZ='<CST>6<CDT>,M3.2.0,M11.1.0'
    fi
    if [ -z "$PATH" ] ; then
        export PATH='/QOpenSys/usr/bin:/usr/ccs/bin:/QOpenSys/usr/'
    fi
}

# -------------------------------------------------------------------
# Send a timestamped message to STDOUT or STDERR
# $1 is the message text
# If $2 is present, send to STDERR. Otherwise send to STDOUT
# -------------------------------------------------------------------
printMsg () {
    TS=`date +'%F %T %z'`
    if [ -n "$2" ] ; then
        printf "${TS}: $1" >&2
    else
        printf "${TS}: $1"
    fi
}

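For reference, here is how the two routing paths of printMsg behave once the lost else/fi are restored (a self-contained copy; the message text is just an example):

```shell
#!/bin/sh
# Self-contained copy of printMsg with the else/fi restored,
# exercising both the stdout and the stderr route.
printMsg () {
    TS=`date +'%F %T %z'`
    if [ -n "$2" ] ; then
        printf "${TS}: $1" >&2
    else
        printf "${TS}: $1"
    fi
}

# No second argument: message goes to stdout.
OUT=`printMsg 'to stdout\n' 2>/dev/null`
# Any second argument: message goes to stderr instead.
ERR=`printMsg 'to stderr\n' ERR 2>&1 >/dev/null`
echo "$OUT"
echo "$ERR"
```

Each call forks at least once for the backquoted date, which is worth remembering when the job is already near a process limit.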
I've tried adding the ALWMLTTHD(*YES) parameter to the SBMJOB command
for the RPGLE program that runs this mess. Somewhere I saw a hint that
it may make a difference, but I'm not very hopeful, as it works in test
without that. I'll find out Monday night at zero dark 30. I also changed
the timestamp assignment to explicitly specify the location of date, but
it's located in the first directory in $PATH, and the initEnv() method
call is the first executable line in the main script.

There are supposed to be 200 file handles available by default. There's
no way it's using more than that.

Anyone have a clue why this is happening?

Things I'm thinking of trying: running DosSetRelMaxFH(),
pthread_setcanceltype() and pthread_setcancelstate() before executing
qp2runpase(). I'm not sure what the cancel type and state could have to
do with this, but the docs hint that they might do something beneficial.


Pete Hall
