> It is more than 2 bytes per record length, we are talking 5 bytes out of a
> total of 14, again may not sound like much, but it will add up when you
> start adding 700-800 million records

OK, I guess this goes back to the concept of zip code to zip code road mileage distances, so I can see why you are content to deal with 5-digit zip codes and "know" you won't deal with Canadian or other alphameric postal codes. This case is very much the exception, but I can see where you are coming from.

If space is that much of a concern, you could fit it all in 7 bytes for a 50% savings in the data portion of the physical file, which of course excludes the index needed. The third field could very easily be made a 2-byte (5 digit) integer, saving one byte over the packed counterpart. And for the zip codes, you could use the last 5 bytes of an 8-byte (20 digit) integer, using a DS. Using 5 bytes gives you 40 bits, or 20 bits per zip code, and you really only need 17 per zip code (2^17 = 131,072, which covers every value up to 99999).

To create the "key" with the composite zip codes, you could take the first zip code, bit shift it left by 20 bits (or at least 17), then add the second zip code into a 20U 0 unsigned integer field which is in bytes 1-8 of a DS. Define a 5-byte alpha field in bytes 4-8, and that would contain both zip codes compressed down to 5 bytes.

Then save the index overhead by building the file in zip code sequence, and binary search the file instead of using an index. It would need a maximum of 30 attempts to find any entry from among 1 billion possibilities, since 2^30 is just over 1 billion.

And now that you are doing a binary search instead of using an index, you can actually compress the data into 6 bytes each: 17 bits for zipcode1, 17 bits for zipcode2, and 14 bits for the mileage. That makes 48 bits, or exactly 6 bytes. This would be a piece of cake in C, where you can subdivide a UInt64 value by bits in a structure, but it isn't rocket science in RPG either.
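To make the 6-byte layout concrete, here is a rough sketch in C of the packing and the binary search lookup. It is only an illustration under my own assumptions, not tested production code: the names (REC_SIZE, pack_record, load_record, lookup_miles) are made up, it uses shifts and masks rather than a bit-field structure so the byte layout is explicit, the table is assumed to be in memory and sorted by zipcode1 then zipcode2, and the bytes are stored high-order first so the stored order matches the numeric order of the key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REC_SIZE 6            /* 17 + 17 + 14 bits = 48 bits = 6 bytes */
#define ZIP_MAX  99999u       /* largest 5-digit zip code, fits in 17 bits */
#define MILES_MAX 0x3FFFu     /* 14 bits: mileage up to 16383 */

/* Pack zipcode1, zipcode2 and mileage into 6 bytes, high-order byte first. */
static void pack_record(uint8_t out[REC_SIZE],
                        uint32_t zip1, uint32_t zip2, uint32_t miles)
{
    assert(zip1 <= ZIP_MAX && zip2 <= ZIP_MAX && miles <= MILES_MAX);

    uint64_t v = ((uint64_t)zip1 << 31)   /* bits 47..31 */
               | ((uint64_t)zip2 << 14)   /* bits 30..14 */
               |  (uint64_t)miles;        /* bits 13..0  */

    for (int i = 0; i < REC_SIZE; i++)
        out[i] = (uint8_t)(v >> (8 * (REC_SIZE - 1 - i)));
}

/* Read one 6-byte record back into a 48-bit value. */
static uint64_t load_record(const uint8_t in[REC_SIZE])
{
    uint64_t v = 0;
    for (int i = 0; i < REC_SIZE; i++)
        v = (v << 8) | in[i];
    return v;
}

/*
 * Binary search a table of `count` packed records, sorted ascending by
 * (zipcode1, zipcode2). Returns the mileage, or -1 if the pair is absent.
 */
static int lookup_miles(const uint8_t *table, size_t count,
                        uint32_t zip1, uint32_t zip2)
{
    if (zip1 > ZIP_MAX || zip2 > ZIP_MAX)
        return -1;

    uint64_t key = ((uint64_t)zip1 << 17) | zip2;   /* 34-bit composite key */
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        uint64_t rec = load_record(table + mid * REC_SIZE);
        uint64_t rec_key = rec >> 14;               /* drop the mileage bits */

        if (rec_key == key)
            return (int)(rec & MILES_MAX);
        if (rec_key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

Each pass halves the search range, which is where the 30-attempt ceiling for a billion records comes from. Against a real file you would read record number `mid` from disk instead of indexing into an in-memory table, but the logic is the same.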
That's the kind of thing I'd do if dealing with a very limited storage capacity device, such as when I program for handhelds such as Palm OS, where space is at a premium and you don't have indexed files anyway.

Using this technique, all 800 million records would fit in under 5GB, compared to your current 7.5GB for the data plus ??GB for the index area. (What does DSPFD show as the total size of your current file including the index?) I'm guessing this would be under half the DASD of your current approach, which as you say is already a 35% smaller data area than using alpha fields for the zip codes and mileage.

Is saving another 5+GB worth it when dealing with a machine like the i5? In general I'd say no. But since you seem concerned about the DASD consumption, this is one way you can probably cut it to under half your current size. And the coding isn't very complex -- but encapsulate it inside a subprocedure which accepts two zip codes and returns the mileage (and put it in a service program if you need this in more than one spot). Then you can change the subprocedure and data model without impacting the rest of the application architecture.

Doug
This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].