Re: recommendations for faster IFS and string handling sub-procs. -- RPG400-L

(continuing to pick apart code where I left off last night)


On Fri, 13 Jun 2003, Richard B Baird wrote:
>
> P ParseDelimAlph  B                   EXPORT
> D ParseDelimAlph  PI           256a   varying
> D   peString                 32765a   varying
> D   peDelim                      1a   const
> D   peIndx                       5i 0 const
> D   peError                       n
>
> D str             S            256a   varying
>
> C                   eval      str = ParseDelim(peString:
> C                                       peDelim:peIndx:peError)
> C                   eval      peError = *off
> C                   return    str
>
> P                 E

As mentioned previously, I wouldn't return the string, I'd instead make it
be passed by reference.

I don't see the point behind this subproc at all.  It appears that it's
only purpose is to shut off the error indicator?!

I mean, why even pass an error indicator if you're going to always set it
to *off?


> P ParseDelimNumb  B                   EXPORT
> D ParseDelimNumb  PI            30p 9
> D   peString                 32765a   varying
> D   peDelim                      1a   const
> D   peIndx                       5i 0 const
> D   peError                       n

What release are you at?  In V5R2, you can use %dec() to convert from
character to numeric.  This routine could simply be:
      peError = ParseDelim(peString:peDelim:peIndx: str)
      return %dec(str: 30: 9)

>
>  *+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Oh hey, I just noticed that you're using '*+++++' to indicate the
start of your subprocedures.


> P ParseDelim      B
> D ParseDelim      PI           256a   varying
> D   peString                 32765a   varying
> D   peDelim                      1a   const
> D   peIndx                       5i 0 const
> D   peError                       n

As before, I probably wouldn't pass the string as a return value.  At
least, not if performance is important.  Though, I'd be MUCH more likely
to do that with 256 bytes than with 32k!!

You might also consider making peString (the record) a CONST parameter
if you can.   I like to make things that I don't change in my
subprocedures CONST values simply because it makes the prototypes a
little more self-documenting.

As for the code..   there's a number of problems with it.

> C                   eval      peError = *off
>
> C                   if        peIndx < 1
> C                   eval      peError = *on
> C                   return    *blanks
> C                   end
>
> C                   eval      peError = *off
> C                   eval      $beg = 0
> C                   eval      $len = 0
> C                   eval      str = *blanks
> C                   eval      $y = 0

Why are you setting peError off twice?  It can't possibly be *ON the
second time.  If it were on, the subproc would've already returned.

>
>
> c                   if        peIndx = 1
> c                   eval      $beg = 1
> c                   end
>
> C     1             do        peIndx        $x
> C                   eval      $y   = %scan(peDelim:peString:$y+1)
>
> C                   if        $x   = peIndx - 1
> C                   eval      $beg = $y + 1
> C                   end
>
> C                   if        $x   = peIndx
>
> C                   if        $y   = 0
> C                   eval      $len = (%len(peString) + 1) - $beg
> C                   else
> C                   eval      $len = $y - $beg
> C                   end
>
> C                   end
>
> C                   enddo



    1) The way you're parsing things, you can't have a comma inside a
         field.

         Take this example, which is very common in CSV:

         "Scott Klement","123 Sesame St","Milwaukee, WI","53201"

         See the problem?  I intended there to be 4 fields in tis
         record, but because of the comma between Milwaukee & WI,
         it will think there's 5.

         Maybe you should use a finite state machine to parse this,
         instead of simple scanning for the next peDelim?

    2) For each call to ParseDelim, we start back at the beginning of
         the record, and search forward until we get to the appropriate
         delimiter.   So, if you have 200 fields in the record, you're
         going to waste a lot of time searching the record over and
         over again for the peIndx field.

         Instead, why not just pick-up where you left off on the last
         call?  You'd always have to read the fields in order across
         the record, but it would perform better.
>
> C                   if        $beg = 0
> C                   eval      peError = *on
> C                   end
>
> C                   if        $len = 0 or $beg = 0
> C                   return    *blanks
> C                   end
>
> C                   eval      str = %trim(%subst(peString:$beg:$le

    3) I wouldn't %trim() the field after substringing it out.  What
         if those blanks were significant to the application?  If
         the program that calls your service program wants them stripped,
         let it do the %trim.  That way, a program that doesn't want them
         stripped has a choice.
>
> c                   dow       %scan('"':str:1) > 0
> C                   eval      str = %trim(%replace(
> C                                          ' ':
> C                                          str:
> C                                          %scan('"':str:1)))
> c                   enddo

    4) I don't like the way you're removing quotes from strings.  If
         I have a record like this:

           "James "Crazy Jim" Fisher","123 Sesame St","New York, NY"

         Do you really want to replace all those quotes with blanks?
         Really, the only quotes you want to strip are the ones that are
         immediately before or after a delimiter.   And instead of
         replacing with ' ' and then %trim(), why not just replace with
         a nothing?

That's all I can think of right now.