


On 19-May-2015 19:08 -0500, Singh, Ashwani (Lake Mary) wrote:
On 19-May-2015 16:51 -0500, CRPence wrote:

We have to build one CIF record from these incoming records; this
is also just an example, and the actual elements can go up to 1600.

Is there an opportunity to ask for revisions to the way they export the data? Could they not perform a more /standard/ pipe-delimited text export of their data instead of the alternative they offer [whereby one record is represented as many each prefixed by control data]? Conceivably, with some specific alterations, the opportunity could exist to send the data directly into a database file [rather than a stream file] and thus make the scenario more directly conducive to manipulation directly via SQL SELECT, and thus possibly to ease the coding requirements of a program to parse; even possibly to present the data directly to the database import feature by encapsulating the SELECT in a VIEW.


The options that I was thinking of were: directly read the data from
the IFS and reformat it in RPGLE.

English does such a poor job of explaining thoughts about coding something. Such thoughts are best expressed in pseudo-code, or at least much more explicitly as step-by-step, because the above text leaves out too much detail. At first I thought the meaning was, in RPGLE to effect:

create table target_PF ...
do forever
  read a STMF sub-record
  if primary record or eof then do
    place prior row-data into insert-row staging-area
    if staging-area is full, insert-multiple into target_PF
  enddo
  if eof then leave
  parse new sub-record
  append new parsed data onto current row
  if missing pipes, append new NULL or DEFAULT to current row
enddo
if staging-area not empty, insert-multiple into target_PF
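The parsing half of that pseudo-code can be sketched more concretely. Below is a minimal illustration in Python (purely for clarity; the function name, the regex, and the EXPECTED_ELEMS table are my assumptions, not anything from the actual export definition):

```python
import re

# Hypothetical expected element counts per component number;
# the real counts would come from the CIF export documentation.
EXPECTED_ELEMS = {1: 5, 2: 2, 3: 4}

SUBREC = re.compile(r'\^CIF\((\d+),(\d+)\)="([^"]*)"')

def build_records(lines):
    """Group ^CIF(entity,component) sub-records into one
    pipe-delimited record per entity, padding each component
    with empty fields (the 'missing pipes') up to its
    expected element count."""
    rows = {}
    for line in lines:
        m = SUBREC.match(line.strip())
        if not m:
            continue                      # skip non-data lines
        entity, comp, data = int(m.group(1)), int(m.group(2)), m.group(3)
        elems = data.split('|')
        want = EXPECTED_ELEMS.get(comp, len(elems))
        elems += [''] * (want - len(elems))   # add the missing pipes
        rows.setdefault(entity, []).extend(elems)
    return {e: '|'.join(elems) for e, elems in rows.items()}

sample = [
    '^CIF(1000001,2)="1XX-50-0XXX"',
    '^CIF(1000001,3)="XXXXXXX|||XXXX"',
]
print(build_records(sample)[1000001])  # 1XX-50-0XXX||XXXXXXX|||XXXX
```

The staging-area / insert-multiple part of the pseudo-code would wrap around this, flushing blocks of completed rows into the target file.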

But reading the later options, the DDS file is mentioned in both; I would have thought that Option1 was doing output to the DDS-created file, so why is that not mentioned here? Or is Option1 just reformatting the stream data, to both add the missing pipes and remake the multiple-record data for each entity as one contiguous pipe-delimited record, with the intention then to use Copy From Import File (CPYFRMIMPF) to do all the parsing, populating, and parallelism?


Option2: Copy from IFS --> Read all 39 million records to get down
to the ~720K records that we are expecting, but that seems too
inefficient with the current approach --> Generate the required DDS
file data.

I am sure it was 69mil before! :-(

By /Copy/ does that mean effectively [or actually] Copy From Stream File (CPYFRMSTMF) to make a 39m-row database flat file that is then regenerated into ~720k rows of described data... by unstated means? Perhaps not, but hopefully my point can be inferred: from the above description, the reader has little idea how, or even if, progress was made from the origin as a STMF into the target DDS-created PF.

And BTW, why DDS? Why not SQL TABLE?


Option3: Copy from IFS ---> Output directly by reading the file into
the DDS file - the COPY from IFS takes a long time.


Again, two references to the word _copy_, which often alludes to the use of the various /Copy/ utilities. A program usually /reads/ the record data and /writes/ [or inserts] the row-data. I am thinking maybe this option is what I thought Option1 was supposed to be?

Again, English is quite vague for describing algorithmic processing; English formed as pseudo-code can be much clearer.

But I guess all is moot [including my later inline comments], because now that we know for sure the file record data originates as a STMF, the fastest approach surely is to open and read the stream records, generate the row-data, and write\insert the row-data using either blocked writes or a multi-row insert into the described database file [be that defined by DDL or DDS]. One note: the BLOCK(*YES) mentioned is desirable, but even more desirable is ensuring the block-size is maximized; see the Override with Database File (OVRDBF) command for the Sequential Only (SEQONLY) parameter and the number of records [there was an enhancement in one of the newer releases to avoid the user having to calculate the number of records].
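The blocked-write / multi-row-insert idea reads roughly like this. A sketch using Python and an in-memory SQLite table as a stand-in for the IBM i database file (the batch size, table name, and column layout are arbitrary assumptions; on IBM i the same effect comes from SEQONLY blocking or a multi-row INSERT):

```python
import sqlite3

BATCH = 1000  # stand-in for a maximized block size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_pf (entity INTEGER, rec TEXT)")

def insert_blocked(conn, rows):
    """Stage rows and flush them in blocks, mimicking a
    multi-row insert rather than one INSERT per record."""
    staging = []
    for row in rows:
        staging.append(row)
        if len(staging) >= BATCH:
            conn.executemany("INSERT INTO target_pf VALUES (?, ?)", staging)
            staging.clear()
    if staging:  # flush the final, partial block
        conn.executemany("INSERT INTO target_pf VALUES (?, ?)", staging)
    conn.commit()

insert_blocked(conn, ((i, f"row-{i}") for i in range(2500)))
print(conn.execute("SELECT COUNT(*) FROM target_pf").fetchone()[0])  # 2500
```

The point of the sketch is only the shape of the loop: the per-row cost is paid once per block, not once per record, which is where the performance difference comes from at 39 million input records.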


* Are the components consistently numbered and always one through
sixteen?

No they can vary.


That makes them less conducive to use by the database. As does the varying length of the non-data prefix.


[...]
^CIF(1000001,2)="1XX-50-0XXX"
^CIF(1000001,3)="XXXXXXX|||XXXX"
[...]

* So component-2 has one element having the value of '1XX-50-0XXX'
as an apostrophe-delimited string, and similarly the component-3
has four elements with elem-1 having the value 'XXXXXXX', elem-2
having the value '', elem-3 having the value '', and elem-4 having
the value 'XXXX'?

No that's a single record for us.


I realized those two /sub-records/, the records I referred to as defining component-2 and component-3 of the apparent entity called 1000001, were all sub-components of what should be just one record [but for whatever reason, instead were exported as separate records; as the so-called sub-records]. I was just interrogating\verifying my understanding, from what was visibly presented in the sample data.

Thus if component-2 had just that one visible element, but that component is expected to have two elements [thus missing one element], and if component-3 had those four elements and expected to have only those four elements, then the record would look like ...|1XX-50-0XXX||XXXXXXX|||XXXX|... where the ellipses represent the preceding or following data in the fully-formed pipe-delimited record.
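That padding rule can be stated directly as a tiny function (Python, illustrative only; the expected element counts per component are hypothetical):

```python
def pad_component(data, expected):
    """Split a component's raw data on '|' and pad with empty
    fields until it has the expected number of elements."""
    elems = data.split('|')
    return elems + [''] * (expected - len(elems))

# component-2: one visible element, two expected (one missing pipe)
# component-3: four elements, four expected (nothing to add)
record = '|'.join(pad_component('1XX-50-0XXX', 2) +
                  pad_component('XXXXXXX|||XXXX', 4))
print(record)  # 1XX-50-0XXX||XXXXXXX|||XXXX
```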


* Each component-# has a known\expected number of elements; in the
OP, one [the primary] was expected to have five elements but the
original example showed three-of-five whereas the posted\attached
data file shows only one element for component-1. So is the
problem general, in that any component-# might have included fewer
than the expected number of elements?

Correct, that was just an example.


And thus, in my interpretation above, whereby the second record with the ^CIF(1000001,2) data provides one element in a scenario where two are expected, adding the extra pipe and appending to the parsed results of ^CIF(1000001,1) is what should happen in the progression towards a reformatted\fully-formed record?



This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].