>
> It is more than 2 bytes per record length, we are talking 5 bytes out of a
> total of 14, again may not sound like much, but it will add up when you
> start adding 700-800 million records


OK, I guess this goes back to the concept of zip-code-to-zip-code road
mileage distances, so I can see why you are content to deal with 5-digit zip
codes and "know" you won't deal with Canadian or other alphameric postal
codes.  This case is very much the exception, but I can see where you are
coming from.

If space is that much of a concern, you could fit it all in 7 bytes, a 50%
savings in the data portion of the physical file -- which of course excludes
the index needed.

But the third field (the mileage) could very easily be made a 2-byte (5
digit) integer, saving one byte over its packed counterpart.  And for the
zip codes, you could use the last 5 bytes of an 8-byte (20 digit) integer,
using a DS.  Those 5 bytes give you 40 bits, or 20 bits per zip code, and
you really only need 17 per zip code (99999 fits in 17 bits).  Then to
create the "key" with the composite zip codes, take the first zip code,
bit-shift it left by 20 bits (or at least 17), then add the second zip code
into a 20U 0 unsigned integer field in bytes 1-8 of a DS.  Define a 5-byte
alpha field in bytes 4-8, and that field will contain both zip codes
compressed down to 5 bytes.
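To make that concrete, here is a rough C sketch of the packing (the names
are mine, not from any existing code; on the i the integer is big-endian,
which is why the low 40 bits land in bytes 4-8):

    #include <stdint.h>

    /* Pack two 5-digit zip codes into the low 40 bits of a 64-bit
       integer: zip1 shifted left 20 bits, zip2 in the low 20 bits. */
    static uint64_t pack_zips(uint32_t zip1, uint32_t zip2)
    {
        return ((uint64_t)zip1 << 20) | (uint64_t)zip2;
    }

    /* Extract the last 5 bytes of the big-endian value -- the same
       thing the 5-byte alpha field overlaying bytes 4-8 of the DS
       gives you in RPG. */
    static void zip_key(uint32_t zip1, uint32_t zip2, unsigned char key[5])
    {
        uint64_t packed = pack_zips(zip1, zip2);
        for (int i = 4; i >= 0; i--) {
            key[i] = (unsigned char)(packed & 0xFF);
            packed >>= 8;
        }
    }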

Then save the index overhead by building the file in zip code sequence and
binary searching the file instead of using an index (a maximum of 30 probes
finds any entry among 1 billion possibilities, since 2^30 is just over a
billion).  And now that you are doing a binary search instead of using an
index, you can actually compress the data into 6 bytes each:  17 bits for
zipcode1, 17 bits for zipcode2, and 14 bits for the mileage.  That makes 48
bits, or exactly 6 bytes.  This would be a piece of cake in C, where you can
subdivide a UInt64 value by bits in a structure, but it isn't rocket science
in RPG either.
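For example, in C it might look something like this (a sketch under my
assumptions about the field widths; note the compiler will typically pad a
bitfield struct out to 8 bytes, so for a true 6-byte disk record you pack
the bytes yourself):

    #include <stdint.h>

    /* The bitfield version -- easy to read, but usually padded
       to 8 bytes by the compiler: */
    struct zip_rec {
        uint64_t zip1  : 17;
        uint64_t zip2  : 17;
        uint64_t miles : 14;
    };

    /* Manual packing into exactly 6 bytes, big-endian:
       bits 47-31 = zip1, bits 30-14 = zip2, bits 13-0 = mileage. */
    static void pack_rec(uint32_t zip1, uint32_t zip2, uint32_t miles,
                         unsigned char rec[6])
    {
        uint64_t bits = ((uint64_t)zip1 << 31)
                      | ((uint64_t)zip2 << 14)
                      | (uint64_t)miles;
        for (int i = 5; i >= 0; i--) {
            rec[i] = (unsigned char)(bits & 0xFF);
            bits >>= 8;
        }
    }

    static void unpack_rec(const unsigned char rec[6],
                           uint32_t *zip1, uint32_t *zip2, uint32_t *miles)
    {
        uint64_t bits = 0;
        for (int i = 0; i < 6; i++)
            bits = (bits << 8) | rec[i];
        *zip1  = (uint32_t)((bits >> 31) & 0x1FFFF);  /* 17 bits */
        *zip2  = (uint32_t)((bits >> 14) & 0x1FFFF);  /* 17 bits */
        *miles = (uint32_t)(bits & 0x3FFF);           /* 14 bits */
    }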

That's the kind of thing I'd do if dealing with a very limited storage
capacity device, such as when I program handhelds like Palm OS, where space
is at a premium and you don't have indexed files anyway.  Using this
technique, all 800 million records would fit in under 5GB (800 million x 6
bytes = 4.8GB) compared to your current 7.5GB for the data plus ??GB for the
index area.  (What does DSPFD show as the total size of your current file
including the index?)   I'm guessing this would be under half the DASD of
your current approach, which as you say already has a 35% smaller data area
than using alpha fields for the zip codes and mileage.

Is saving another 5+GB worth it when dealing with a machine like the i5?  In
general I'd say no.  But since you seem concerned about the DASD
consumption, this is one way you can probably cut it to under half your
current size.  And the coding isn't very complex -- but encapsulate it
inside a subprocedure which accepts two zip codes and returns the mileage
(and put it in a service program if you need it in more than one spot).
Then you can change the subprocedure and data model without impacting the
rest of the application architecture.
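As a rough illustration of that boundary (function and variable names are
mine; the sketch assumes the 6-byte records above, sorted by the packed key
and already loaded or mapped into memory):

    #include <stdint.h>

    #define REC_LEN 6

    /* Callers see only "two zips in, mileage out"; the packed format
       and the binary search can change behind this interface without
       touching them.  Returns -1 if the pair isn't on file.  At most
       ~30 probes cover a billion records. */
    static long get_mileage(const unsigned char *recs, long n,
                            uint32_t zip1, uint32_t zip2)
    {
        uint64_t want = ((uint64_t)zip1 << 17) | zip2;  /* 34-bit key */
        long lo = 0, hi = n - 1;
        while (lo <= hi) {
            long mid = lo + (hi - lo) / 2;
            const unsigned char *r = recs + mid * REC_LEN;
            uint64_t bits = 0;
            for (int i = 0; i < 6; i++)
                bits = (bits << 8) | r[i];
            uint64_t key = bits >> 14;    /* drop the 14 mileage bits */
            if      (key < want) lo = mid + 1;
            else if (key > want) hi = mid - 1;
            else return (long)(bits & 0x3FFF);
        }
        return -1;
    }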

Doug
