I am combining this answer with a reply to Peter Dow's earlier message about READE being more efficient. It is not; in fact, it is extremely costly. See below. Thanks.

As an aside, does anybody understand the difference between SEQONLY and NBRRCDS on the OVRDBF command? I have gotten conflicting statements from IBM about this, and the manuals don't seem clear. I thought I understood SEQONLY and NBRRCDS, but then I was told it was the opposite of what I thought, so I am not sure what to believe. Thanks.

>> When you open a record for update on a blocked
>> file, wouldn't it tend to lock the whole block instead of just that one record?

No. When you open a file for update, all blocking is turned off and the AS/400 database does a record-by-record read. That is why updates can be so expensive. Instead of being able to read in dozens or hundreds of records at once, the AS/400 must read a single record, put a lock on it, return the data to you, receive it back, update the database, and remove the lock, for every record.

Blocking can be extremely beneficial, and the AS/400 does it automatically in every situation it can. As far as I know, the automatic blocking works like this:

1. Your program starts, and the runtime creates an I/O buffer equal to the size of the largest record in your program. The I/O buffer is what moves one record at a time from the database manager to your program. I will explain below how this can really affect program performance.

2. The runtime takes a magic number (the manuals say 4K). The magic number represents what the IBM system engineers have determined is the best default size for the data management buffer (what the Database Programming manual calls the job's internal data main storage area) to hold these records. They don't want to size it too large, or the buffer starts getting swapped out to disk, but they don't want to make it too small either, or you won't get enough records into it. Dividing that buffer size by your record size gives the number of records that can be put into the data management buffer.

3. The runtime makes a read request. The AS/400 reads a block of records from the database into the machine buffer, then selects the records out one by one and moves them into the data management buffer (let's say you have space for 25) until it reaches its fill or runs out of records to process. As I understand it, IBM does the record selection at the raw machine-buffer level, so the only records that get loaded into the buffer are the ones that have been selected. I don't know if that has changed, but it seems to me it would still be true, because all of this is happening at the database level.

4. Control returns to the runtime. The runtime then goes to the data management buffer, retrieves the first record, loads it into the I/O buffer, and returns to the program.

5. The runtime just keeps going until it runs out of records or, and this is the important part, you issue a CHAIN or READE. Either of these operations causes the buffer to be flushed and reloaded. IBM calls this logical I/O (making a logical call to the database manager), and according to the IBM engineers it is the most expensive operation in the system, taking up a vast majority of system resources, and you can see why. You CHAIN to a record, which flushes the buffer and loads up a fresh block of records; then you do a READE, which flushes it again and reloads with another logical I/O. That is why you always want to do a READ with a manual compare instead.
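To make that concrete, here is a minimal sketch in modern free-format ILE RPG (the file CUSTMAST, its key field CUSTNO, and the key value are hypothetical names for illustration, not code from the session). Both loops return the same records for one key value; per the explanation above, the READE version pays a logical I/O on every call, while the READ version keeps pulling records out of the already-loaded block and simply stops when the key changes.

**FREE
// Sketch only; hypothetical file and field names throughout.
dcl-f CUSTMAST keyed;                      // externally described, keyed, input-only

dcl-s searchKey packed(7: 0) inz(1234567); // assumed key type, for illustration

// Version 1: READE: each call is its own logical call to the database manager.
setll searchKey CUSTMAST;
reade searchKey CUSTMAST;
dow not %eof(CUSTMAST);
  // ... process the record ...
  reade searchKey CUSTMAST;
enddo;

// Version 2: READ with a manual key compare: reads keep coming out of the block.
setll searchKey CUSTMAST;
read CUSTMAST;
dow not %eof(CUSTMAST) and CUSTNO = searchKey;
  // ... process the record ...
  read CUSTMAST;
enddo;

*inlr = *on;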
When you do a READ, the AS/400 runtime just gets the next record from the buffer and returns it in the I/O buffer. A flush only occurs when all the records in the buffer have been read. IBM said the only version of OS/400 that ever did a READE by reading from the I/O buffer instead of reloading was Version 1, and that was changed the next release because of the problems it created. This was all news to me when I first heard it at a COMMON session put on by IBM, but this is what they claim is true. The purpose of the session was to encourage people not to use READE where possible and to do everything possible to reduce logical I/O.

That is the default behavior. If you issue an OVRDBF and give it a different number of records, it will make that buffer bigger or smaller, because it assumes you know what you are doing. Where you can really get burned is when you make the data management buffer too big. Say you set it to 10,000 records (I used SEQONLY 10,000). If that buffer is too big and your program cannot process all the records in a reasonable time (really a few seconds), the AS/400 just starts to swap the buffer to disk (memory is memory), and then when you hit one of those records it has to reload the buffer, and this happens over and over again. I made this mistake back when I first started on the System/38. I thought the more records in the buffer, the better. I started the program, it ran for hours and hours, and the disk drive was going nuts. When I figured out the problem and reset it to a reasonable number, the process finished in minutes.

The other issue, which I alluded to above, concerns the size of the I/O buffer being based on the largest record in your program. Let's say you read in a whole table, say the Customer Master, which is 1,000 bytes. Assuming that is the largest record in your program, the AS/400 just divides the magic number by 1,000, resulting in 4 records (based on the magic number being 4K) that can be read in at a time. As an alternative, let's say you define a field-select logical (my term) and bring in only the customer number and the balance that you need. Your record size is now, say, 15 bytes. Take the magic number and divide by 15, and suddenly you have space for roughly 270 records in the buffer. This does not work with updates, but if you want to see a program fly, try bringing only what you need into the program and doing some sort of sequential read (keyed or not).

This reduces the buffer size as well as the size of the PAG. IBM recommends that we do what we can to reduce the PAG size: the bigger it is, the slower the program runs, because the runtime constantly has to manage the PAG to try to keep its size down. They say this is especially a problem with ILE RPG, because it is more of a "C"-type language, and all "C"-type languages are extremely efficient at handling automatic storage (storage allocated and destroyed within a procedure) and extremely inefficient at handling static storage (global variables and file data storage, i.e., the PAG). I was astounded at the speed difference you could get reading a file for input and bringing in just what you needed.

I did all this testing several years ago, and did most of my talking to the IBM engineers then, so something might have changed, and I suspect the algorithms are a lot more sophisticated than I know, but this gives you a flavor of how blocking works. It is an extremely powerful tool, and one that is pretty easy to use if you understand how it works.
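As a rough illustration of the "bring in only what you need" point, here is another sketch in free-format ILE RPG. CUSTBAL1 is a hypothetical field-select logical over the Customer Master containing only CUSTNO and BALDUE (the logical itself would be defined separately in DDS or SQL), and the BLOCK(*YES) keyword simply asks the compiler to request record blocking; the actual block size still comes from the system default or from an override such as OVRDBF FILE(CUSTBAL1) SEQONLY(*YES 100).

**FREE
// Sketch only; hypothetical file and field names throughout.
// CUSTBAL1: a field-select logical carrying just CUSTNO and BALDUE,
// so each record is a few dozen bytes instead of the full Customer
// Master record, and far more records fit into each block.

dcl-f CUSTBAL1 keyed block(*yes);   // input-only disk file, blocking requested

dcl-s totalDue packed(15: 2) inz(0);

read CUSTBAL1;
dow not %eof(CUSTBAL1);
  totalDue += BALDUE;               // BALDUE comes from the logical's record format
  read CUSTBAL1;
enddo;

// ... use totalDue ...
*inlr = *on;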
Thanks for the opportunity to talk about this.

-----Original Message-----
From: Jim Langston [mailto:jlangston@conexfreight.com]
Sent: Tuesday, October 19, 1999 12:46 PM
To: RPG400-L@midrange.com
Subject: Re: Blocking I/O (was: Expensive op codes)

I don't come from the S/36 days, but from what I understand about blocking, this can be a bad thing.

As I understand it, when you open a record for update on the AS/400, it locks that record and nothing more. When you open a record for update on a blocked file, wouldn't it tend to lock the whole block instead of just that one record?

Regards,

Jim Langston

Dan Bale wrote:
> I don't spend a lot of time reminiscing of my days on the S/36, but the one
> thing I miss and can't believe that the AS/400 never took advantage of was the
> ability of the S/36 to block records on update files. Not only that, but the
> S/36 also took advantage of the double buffer option on the F-spec. I
> benchmarked all this way back when. If I remember correctly, an updated record
> in the buffer not yet written out to disk was still available to other jobs;
> can't remember if the OS forced the write to disk or not. Was this what was
> referred to as "single level store"?
>
> Your method of using an input file and an output file (with blocking on both)
> instead of a single update file is a design that I've used before when faced
> with a mass update of a very large file. It's just a shame that, for whatever
> reason, IBM couldn't port that S/36 feature over to the AS/400.
>
> - Dan Bale
> [Snippedy Snip Snip snip]

+---
| This is the RPG/400 Mailing List!
| To submit a new message, send your mail to RPG400-L@midrange.com.
| To subscribe to this list send email to RPG400-L-SUB@midrange.com.
| To unsubscribe from this list send email to RPG400-L-UNSUB@midrange.com.
| Questions should be directed to the list owner/operator: david@midrange.com
+---