On Fri, 13 Jun 2003, Richard B Baird wrote:
>
> 1. what are the performance ramifications of using these string and IFS
> functions in a service program rather than, say, creating modules and
> binding them together into one program, or by simply defining all my
> procs in the driver program, eliminating the binding altogether?

Assuming your service program isn't ACTGRP(*NEW) (which nobody should
ever use for a service program!!) the differences between direct binding
and calling a service program routine should be fairly nominal.

Though, if you think about it, it would be pretty easy to make a test
copy of your programs, pull the code for the procedures in, and just see
how much faster it is.  I really doubt it would outweigh the advantages
of having them separate, though.

> 2. what is the best way of passing potentially large strings to and
> from a sub-proc?

By reference.  Never pass long strings by value...  Don't use long
strings as fixed-length CONST, either.  (Though, variable-length CONST
is fine.)

> (see my examples for how I did it - mostly varying strings.
> I don't like the 256a return values.

I *really* don't like the 32k return values.  That's 32k of data that it
has to make a copy of every time your procedure returns.  Instead, pass
the variable by reference.  Then it doesn't have to allocate new memory
for it, it doesn't have to make a copy of it, etc.  That should make a
difference in performance.

> 3. I've put all my IFS open/read/close stuff in a subproc for ease of
> use, but I'm not saving as much in ease as i'm losing in performance.

I have no idea what you're losing in performance.  I doubt the
open/close has any impact at all on performance... a few milliseconds
for the total run time, maybe.

The read procedure has a huge problem in that it reads one byte at a
time from the file.  You REALLY need to read things into a larger buffer
if you want it to perform well.  This will make a HUGE difference.  But,
more on that later, when I reach the part where I pick apart your
code :)

> 4. my csv parsers do 1 variable at a time - this may be where I'm
> wasting a lot of time - but i was hoping to reuse it later - it
> receives the entire record, a delimiter, and a 'field index' and
> returns the nth variable in the record.  can anyone improve upon this
> design without making it 'file specific'?

Hmmm... there are a LOT of ways to approach this.  However, if it's well
designed, a field-at-a-time approach can be a good way of doing things.

> below are some of my subprocs - cut them to pieces please - show no
> mercy.

Thank you, I will.

> P ReadStrmF       B                   export
> D ReadStrmF       PI         32765A   varying
> D  peFD                         10I 0
> D  peEOF                          N

Okay, here's that 32k return value that I was complaining about before.
(Hmmm... "peFD", "peEOF"??  That's the same naming convention I use!!)
Instead, I'd pass the parms like this:

     D ReadStrmF       PR             1N
     D  peFD                         10I 0 value
     D  peRecord                  32765A   varying options(*varsize)
     D  peMaxLen                     10I 0 value

I made the EOF indicator the return value.  Since that's only 1 byte,
the time it takes to make a copy of it is totally insignificant.  This
also gives you another advantage: you can use it in a DOW loop or IF
statement.  For example:

      /free
         fd = my_open(etc: etc);

         dow not ReadStrmF(fd: rec: %size(rec));
            // do something to the record here
         enddo;

         my_close(fd);
      /end-free

(Note the "not" -- since the return value is an EOF indicator, you want
to loop while it's *OFF.)

For peFD, I passed it by value.  That guarantees, when you call it, that
the procedure will not modify the value, and it allows you to use
expressions for the parameter.  And, best of all, it only needs to pass
32 bits.  When you pass by reference, you're passing 128 bits (the size
of a pointer on the iSeries), so passing by value should be marginally
faster.

For peRecord, I passed it by reference, and I also added
options(*varsize).  Before, you did it as a return value.  Return values
are passed by VALUE, so instead of passing a 128-bit pointer, it had to
actually copy the entire 32k variable.  32,767 bytes copied instead of
16.  That'll make a difference.

Furthermore, because it's options(*varsize), you don't always have to
declare your caller's variable as 32k long.  If you wanted it to be only
100 bytes for some records, it would still work.

That variable size is also why I added the peMaxLen parameter.  Since
we're allowing various sizes to be passed, we can use peMaxLen to make
sure we don't overflow the ACTUAL size of the variable.
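Just to make that concrete, here's a rough sketch of how ReadStrmF's
body might look with those parms.  This is off the top of my head and
untested, it assumes your existing prototype for the read() API, and it
still reads one byte at a time -- which is the next thing I'm going to
complain about:

     P ReadStrmF       B                   export
     D ReadStrmF       PI             1N
     D  peFD                         10I 0 value
     D  peRecord                  32765A   varying options(*varsize)
     D  peMaxLen                     10I 0 value

     D ch              S              1A
      /free
         peRecord = '';

         dou ch = x'0A';
            if read(peFD: %addr(ch): 1) < 1;
               return *on;          // EOF (or error): no more records
            endif;

            // keep the byte unless it's part of the line ending, and
            // never grow the string past the caller's ACTUAL size:
            if ch <> x'0D' and ch <> x'0A'
               and %len(peRecord) < peMaxLen;
               peRecord = peRecord + ch;
            endif;
         enddo;

         return *off;               // got a complete line
      /end-free
     P                 E

(A real version would also handle a final line that has no trailing
x'0A'.)  One nit on the calling example above: %size of a varying field
includes its 2-byte length prefix, so you may really want to pass
%size(rec) - 2 for peMaxLen.  Anyway, back to your code: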
> C                   dou       Char = x'0A' or Char = x'0a'

Hmm... this is probably a bug, but... why are you checking for x'0A'
twice in the same statement?!  This makes no sense, they're the same
thing.

> C                   eval      Eof = Read(peFD: %addr(Char): 1)

Here's the big one.  This is where your performance is falling flat.
Never read one byte at a time where performance is an issue.

You might try using the ILE C routines for "buffered stream I/O".  Just
replace the open(), read() and close() APIs with _C_IFS_fopen(),
_C_IFS_fgets(), _C_IFS_fclose(), etc.  The really nice thing about
_C_IFS_fgets() is that it reads up until the end-of-line delimiter for
you, so you don't have to search for the x'0A'.  The downside to the
_C_IFS_fopen() family of functions is that they're not as versatile
(IMHO) as the open() family.
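Here's roughly what the prototypes look like, if you want to experiment
with that route.  I'm writing these from memory, so double-check them
against the C include members -- and the file name and buffer sizes are
just made-up examples:

     D fopen           PR              *   extproc('_C_IFS_fopen')
     D  filename                       *   value options(*string)
     D  mode                           *   value options(*string)

     D fgets           PR              *   extproc('_C_IFS_fgets')
     D  line                           *   value
     D  maxlen                       10I 0 value
     D  stream                         *   value

     D fclose          PR            10I 0 extproc('_C_IFS_fclose')
     D  stream                         *   value

     D file            S               *
     D buf             S           1000A
     D rec             S           1000A   varying
      /free
         file = fopen('/tmp/myfile.txt': 'r');
         if file = *null;
            // couldn't open the file
         endif;

         dow fgets(%addr(buf): %size(buf): file) <> *null;
            rec = %str(%addr(buf));
            // fgets() leaves the end-of-line character in the string,
            // so you'll probably want to strip it off of rec here
         enddo;

         fclose(file);
      /end-free

You may still need options on the mode string to get the EBCDIC
translation you want; check the ILE C runtime documentation for the
details.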
If you do continue to use open(), read() and close(), you can improve
performance dramatically by simply reading the data in chunks into a
buffer, and then reading it from the buffer byte-by-byte instead of
reading the file byte-by-byte.

Keep in mind that every time you read a byte, it has to transfer at
least one disk sector into memory.  If a sector is 512 bytes, then
you're reading one sector 200 times to get a 200-byte string.  But if
you read 1024 bytes at a time, it loads only two sectors to get 5 full
strings.  Think about that... it takes 1/100th the time to get 5 times
as much data!!

Okay, that's not entirely true.  Operating systems do various things to
optimize file reads, including loading things into memory ahead of time,
etc, etc.  However, that example does illustrate (even if it's an
extreme case) why reading larger buffers performs better.

The fopen/fread/fgets/fclose style APIs are optimized by loading things
into static buffers behind the scenes, and searching those buffers to
find the end-of-line characters, etc.  You can accomplish the same type
of thing with the open/read/close style APIs simply by loading things in
larger chunks.  Like, if you load 1024 bytes at a time from the file,
then just loop through those 1024 bytes to find the x'0A', things will
perform better.

I've got an example of a (very simple) buffered read routine on my web
site here:

     http://www.scottklement.com/rpg/ifs_ebook/textreading.html

Of course, you'd have to convert it to use a varying string... but I
think you can probably handle that.

> C                   if        Length > 0
> C                   call      'QDCXLATE'
> C                   parm                    Length
> C                   parm                    String
> C                   parm      'QEBCDIC'     Table
> C                   end

If you open the file with O_TEXTDATA specified, you shouldn't need to
manually translate it to EBCDIC.  That should perform a bit better.  If
nothing else, using iconv() instead of QDCXLATE will perform better,
because you'd be calling a service program instead of loading/running a
*PGM object.

> * Return that line
> C                   eval      StringOut = %trimr(String)
> C                   return    StringOut

I don't understand why you're trimming StringOut.  Wasn't that a varying
field?!  Why would it have trailing blanks?  And if it does have
trailing blanks, you probably wanted to keep them!

As for your field parsing routines, I'm going to have to pick those
apart tomorrow or something, since it's now 2:00am, and I'm suddenly
REALLY tired...  But, I think I've probably given you some food for
thought.
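P.S. In case you haven't used O_TEXTDATA before, here's a quick sketch.
The path is just an example, and the constant values are the ones from
my copy of the header member, so verify them against QSYSINC before
trusting me:

     D open            PR            10I 0 extproc('open')
     D  path                           *   value options(*string)
     D  oflag                        10I 0 value
     D  mode                         10U 0 value options(*nopass)
     D  ccsid                        10U 0 value options(*nopass)

     D O_RDONLY        C                   1
     D O_TEXTDATA      C                   16777216

     D fd              S             10I 0
      /free
         // with O_TEXTDATA, the system translates between the file's
         // CCSID and your job's CCSID as the data is read:
         fd = open('/tmp/myfile.txt': O_RDONLY + O_TEXTDATA);
         if fd < 0;
            // open() failed -- check errno
         endif;
      /end-free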