A Tale of Two Cities: We're having difficulty with object contention on very active journals and receivers.

We have an RPGLE history program writing hundreds of thousands of records to a history file under commitment control. The journals are set up strictly for commitment control. The file is journaled to a journal whose receivers are managed and deleted by the system. Originally a single job stream generated all the history; we recently split the job into multiple concurrent streams, each writing records to the file at the same time.

All this bulk writing to the database burns through journal receivers at an incredible rate. Every 1-2 minutes the system attaches a new receiver. Receivers are deleted at about the same average rate, but usually 5 or 6 at a time as a COMMIT occurs (every 10,000 records, per job).

At least once during the process we'll get an error. It starts with:

   CPI70E5 Journal or journal receiver not available.

One of the QDBSRVxx jobs gets this message; we get these all the time. The second-level text says that due to object locks the system couldn't determine whether the journal has system-managed receivers, and that it will try again in 10 minutes.

The problem is that the journaled bulk writes fill up the attached receiver before the system has created the next one. So we get:

   CPA7090 Entry not journaled to journal IPBLK in MRCARC. (C R)

One or more of the RPGLE job streams gets this message because it's trying to write records and can't journal the write. Usually within 10 minutes of the CPI70E5 we can retry the CPA7090 and the stream continues, because the database server was eventually able to attach a new receiver.

We've tried enlarging the receivers so that they aren't created and deleted as frequently, in order to reduce object contention, but it hasn't helped. Frankly, I think the CPI70E5 has got to be some sort of legacy fix from IBM. It seems a bit half-assed for system-managed journals to say "I can't figure out whether I'm supposed to add a receiver, so I'll check back in 10 minutes." Since the database server error is not synchronous with the batch program error, it's difficult to make our code anticipate or react to the situation.

I'm wondering if anyone has had similar problems with journaling on very busy databases. Before I try reducing (or increasing) the number of records per COMMIT, I'd like to know whether anyone has a strong feel for whether it would help.

Also, our journals are not in their own ASP. Splitting them out is a big job, and I've heard conflicting stories as to the benefits. That strategy is supposed to reduce contention between the data and the journals/receivers, but it seems to me that my problem is contention among the journals and receivers themselves.

Any suggestions or anecdotes? Much thanks in advance for your help.

-Jim

James Damato
Manager - Technical Administration
Dollar General Corporation
<mailto:jdamato@dollargeneral.com>
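P.S. For anyone who wants the concrete picture, our journal environment looks roughly like this. IPBLK and MRCARC are the real journal and library from the message text; the receiver name, threshold, and file name are placeholders:

   CRTJRNRCV JRNRCV(MRCARC/IPBLK0001) THRESHOLD(100000)  /* threshold in KB */
   CRTJRN    JRN(MRCARC/IPBLK) JRNRCV(MRCARC/IPBLK0001) +
             MNGRCV(*SYSTEM) DLTRCV(*YES)  /* system attaches and deletes receivers */
   STRJRNPF  FILE(MRCARC/HISTPF) JRN(MRCARC/IPBLK)  /* journal the history file */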
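The receiver enlargement we tried amounted to swapping in a receiver with a bigger threshold; with MNGRCV(*SYSTEM) the system models the receivers it generates on the one currently attached. The threshold value here is only an example:

   CRTJRNRCV JRNRCV(MRCARC/IPBLKBIG) THRESHOLD(500000)  /* roughly 500 MB */
   CHGJRN    JRN(MRCARC/IPBLK) JRNRCV(MRCARC/IPBLKBIG)  /* attach the larger receiver */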
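And in case anyone asks how we retry the CPA7090: so far it's by hand. Something like the following system reply list entry would automate it, though the sequence number and compare data below are guesses on my part, and a blanket R reply could spin if the new receiver still isn't attached when the retry runs:

   ADDRPYLE  SEQNBR(4090) MSGID(CPA7090) CMPDTA(IPBLK 1) RPY(R)  /* auto-answer Retry */
   CHGJOB    INQMSGRPY(*SYSRPYL)  /* each batch stream must use the system reply list */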