Okay folks,

I've had a bit of a breakthrough, so I thought I'd share it with you all.  The 
task is to migrate data from an earlier version of a file to a newer version of 
the same file.  A "database upgrade".

Here is some additional detail for context.
Our machine is a 720 2-way at V5R1.

record count = 3,550,309
record length = 1,650 bytes in the "from" definition, 1,862 bytes in the "to" 
definition - a relatively large record size
Unique key on PF = 1st 4 fields, 53 total bytes, all alpha
No logical files attached, so the only access path is the UNIQUE one on the PF.

Here are my run times (clock) for each method I tried:

CPYF FMTOPT(*MAP *DROP) - 2 hours, 57 minutes
CHGPF - 2 hours, 33 minutes
CPYF FMTOPT(*MAP *DROP) with record ranges running in parallel - 1 hour, 21 
minutes
new approach described below - 23.5 minutes


Here's the "CPYF parallel" approach.  I retrieve the number of records in the 
file, and divide that by the number of parallel jobs I want to submit, giving 
me the "increment".  I then submit CPYF FMTOPT(*MAP *DROP) with FROMRCD(1) 
TORCD(increment), increasing by the increment till end of file.  (I actually 
work backwards from *END to 1, but the idea is the same.)  Since our files are 
REUSEDLT(*YES) I don't worry about large RRN gaps due to deleted records.  
Apparently the (*MAP *DROP) processing is rough on the CPU, because there was 
no difference in clock time between 4-jobs-wide and 50-jobs-wide, but CPU 
utilization was nearly 100%.  Several different widths from 4 to 50 all ran 
within 2 minutes of each other (clock time).
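
For anyone who wants to try the parallel route, here is a stripped-down sketch 
of how the slicing could be scripted.  This is not the exact program I ran: the 
library, file, and job names are placeholders, it walks forward from record 1 
instead of backward from *END, and adding every slice to the same target member 
with MBROPT(*ADD) is just one way to handle the output.

pgm
             DCL        VAR(&NBRRCDS) TYPE(*DEC) LEN(10 0)
             DCL        VAR(&NBRJOBS) TYPE(*DEC) LEN(3 0) VALUE(4)
             DCL        VAR(&INCR)    TYPE(*DEC) LEN(10 0)
             DCL        VAR(&FROMRCD) TYPE(*DEC) LEN(10 0) VALUE(1)
             DCL        VAR(&TORCD)   TYPE(*DEC) LEN(10 0)

             /* Current record count of the source member           */
             RTVMBRD    FILE(TESTLIB1/ORIGFILE) NBRCURRCD(&NBRRCDS)

             /* Records per slice - round up so nothing is missed   */
             CHGVAR     VAR(&INCR) VALUE(&NBRRCDS / &NBRJOBS + 1)

             /* Submit one CPYF per slice                           */
 LOOP:       CHGVAR     VAR(&TORCD) VALUE(&FROMRCD + &INCR - 1)
             IF         COND(&TORCD *GT &NBRRCDS) +
                          THEN(CHGVAR VAR(&TORCD) VALUE(&NBRRCDS))
             SBMJOB     CMD(CPYF FROMFILE(TESTLIB1/ORIGFILE) +
                          TOFILE(TESTLIB2/NEWFILE) MBROPT(*ADD) +
                          FMTOPT(*MAP *DROP) FROMRCD(&FROMRCD) +
                          TORCD(&TORCD)) JOB(CPYSLICE)
             CHGVAR     VAR(&FROMRCD) VALUE(&TORCD + 1)
             IF         COND(&FROMRCD *LE &NBRRCDS) THEN(GOTO LOOP)
endpgm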

Here's what seems to be the fastest approach.  Below is a CL pgm that performs 
the work.  JOINFILE is an empty file in the same format as the target file 
(updated definition); it supplies the new field definitions to the output 
format.  (One easy way to create JOINFILE is shown right after the program.)

pgm

             /* Join the original file to the empty "gap" file on   */
             /* the four key fields.  JDFTVAL(*YES) keeps every     */
             /* ORIGFILE record even though the gap file has no     */
             /* matching records; JOINFILE supplies the new format. */
             OPNQRYF    FILE((TESTLIB1/ORIGFILE) (GAPFILE)) +
                          FORMAT(QTEMP/JOINFILE) JFLD((1/KEY1 +
                          2/KEY1) (1/KEY2 2/KEY2) (1/KEY3 +
                          2/KEY3) (1/KEY4 2/KEY4)) +
                          JDFTVAL(*YES) OPNID(JOINOPEN) +
                          SEQONLY(*YES 108)

             /* Block the output file as well                       */
             OVRDBF     FILE(NEWFILE) SEQONLY(*YES 108)

             /* Copy the joined result into the real new-format PF  */
             CPYFRMQRYF FROMOPNID(JOINOPEN) +
                          TOFILE(TESTLIB2/NEWFILE) MBROPT(*REPLACE)

             /* Clean up the query ODP and the override             */
             CLOF       OPNID(JOINOPEN)
             DLTOVR     FILE(NEWFILE)

return
endpgm
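
In case you're wondering how JOINFILE gets into QTEMP: any empty file with the 
new record format will do.  One easy way (just one option, not necessarily how 
you'd have to do it) is to duplicate the new-format file with no data:

             /* Empty duplicate - only its record format is used    */
             CRTDUPOBJ  OBJ(NEWFILE) FROMLIB(TESTLIB2) +
                          OBJTYPE(*FILE) TOLIB(QTEMP) +
                          NEWOBJ(JOINFILE) DATA(*NO)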

I created a "gap" file which is an _empty_ PF containing the key fields and the 
"new" fields only.  (In other words, just the fields added to the ORIGFILE 
format.) Joined on the keys, selecting JDFTVAL(*YES) to get every record in the 
original file.  Then simply CPYFRMQRYF from the join to an actual PF in the new 
format.
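
Since the whole point of JDFTVAL(*YES) is making sure every ORIGFILE record 
comes across, a quick record-count comparison afterward is cheap insurance.  A 
trivial sanity check (again with my placeholder names, and not part of the 
timings above) might look like this:

pgm
             DCL        VAR(&FROMRCDS) TYPE(*DEC) LEN(10 0)
             DCL        VAR(&TORCDS) TYPE(*DEC) LEN(10 0)

             /* Current record counts of source and target members  */
             RTVMBRD    FILE(TESTLIB1/ORIGFILE) NBRCURRCD(&FROMRCDS)
             RTVMBRD    FILE(TESTLIB2/NEWFILE) NBRCURRCD(&TORCDS)

             IF         COND(&FROMRCDS *NE &TORCDS) THEN(DO)
                SNDPGMMSG  MSG('Record counts do not match') +
                             MSGTYPE(*DIAG)
             ENDDO
endpgm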

While I can't really explain why this is so much faster, or why the parallel 
stuff didn't give the return I wanted, I'm thrilled with the result.

Regards,
Michael Polutta
Atlanta, GA

