Re: %lookup on a Data Structure Array -- RPG400-L

On 10-Nov-2011 11:26 , Kurt Anderson wrote:

I'm aware of SETOBJACC, however I don't want this entire file in
memory, only two fields. Also, each of our clients has their own
version of the file, so keeping it in memory doesn't seem like a
viable option regardless.

I do need to retrieve a value, so I can't simply check for the
existence of the record.

FWiW: SETOBJACC MBRDATA(*ACCPTH) against an SQL INDEX previously created to include just those two fields "brings" both the key and the data into memory; the first column in the INDEX being the key and the second column having the value to be retrieved. Having such an access path available enables the possibility of index-only query access to get the key and the [non-key] value with just one I/O for each read.

I am aware of a potential performance hit of loading the array all
at once. These are all batch jobs, and at most there would be 33k
records (I'd define the array to be 50k to allow for growth), which
is going to load in seconds - so from a batch perspective, extra
seconds once is ok.

Using the INDEX, whose Record Format has just the two fields, to obtain the data to pre-cache in an array for the %lookup activity, via either SQL or a query ODP [OPNQRYF], enables the index-only possibility for data retrieval, which can make that batch application start-up "hit" significantly smaller.

The INDEX would preferably already exist and be maintained rather than created at run-time, obviously, although delayed maintenance may be good for the given scenario. And when SMP is available, the index can be rebuilt at open, or created new, quickly and aggressively; I do not recall whether that is also available for delayed maintenance. Even without SMP, the database index build over the physical data can perform much quicker at loading the subset of those two fields than either reading a keyed two-field LF [w/out index-only] or sorting the data of a two-field LF; if either of those would even be considered.

However this discussion has given me the idea (or maybe someone
actually mentioned this and I took it the wrong way, yet led me to
the same conclusion) that I could check the array. If the customer
isn't there, then go to the file and get the value I need plus add
the customer/value to the array. So in the case of having 33k
customers, maybe my job of running 30 million records only uses 15k
of those customers, then I've made the array smaller so the lookups
would be quicker.
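
The load-on-demand idea in the quoted paragraph above can be sketched roughly as follows. This is only an illustration in Python, not RPG: a dict stands in for the array searched with %lookup, and `customer_file`, `LazyLookupCache` and `fetch_from_file` are hypothetical names standing in for the real file and a keyed read against it. The sample keys and values are made up.

```python
class LazyLookupCache:
    """Cache customer values on first use instead of preloading them all."""

    def __init__(self, fetch_from_file):
        # fetch_from_file: hypothetical stand-in for a keyed read of the file
        self._fetch = fetch_from_file
        self._values = {}      # customer -> value, filled only as needed
        self.file_reads = 0    # counts how often the "file" is actually read

    def get(self, customer):
        if customer not in self._values:
            # Not cached yet: go to the file once, then remember the result.
            self.file_reads += 1
            self._values[customer] = self._fetch(customer)
        return self._values[customer]


# Hypothetical stand-in for the customer file: key -> value to retrieve.
customer_file = {"C001": "RATE-A", "C002": "RATE-B", "C003": "RATE-C"}

cache = LazyLookupCache(customer_file.get)
transactions = ["C001", "C002", "C001", "C001", "C002"]
results = [cache.get(c) for c in transactions]
```

In this sketch only the customers actually referenced by the transactions are ever read or cached, so C003 never enters the cache. A customer missing from the file is cached as "not found" as well, which may or may not be what the real job wants.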

Has consideration been given to an entirely different means to effect whatever is being done, rather than only investigating quicker lookups or minimizing the number of lookups? Might the entire activity be offloaded to an effective join via SQL; i.e. eliminating the update via RLA completely?
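
As a rough illustration of what offloading the work to a join might look like, here is a sketch using Python's sqlite3 module purely as a stand-in database; it is not DB2 for i, and the actual job may do something quite different. The table and column names (`customer`, `trans`, `cust_id`, `cust_value`) and the sample rows are assumptions made up for the example.

```python
import sqlite3

# In-memory stand-in database; all names and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id TEXT PRIMARY KEY, cust_value TEXT);
    CREATE TABLE trans (trans_id INTEGER PRIMARY KEY,
                        cust_id TEXT, cust_value TEXT);
    INSERT INTO customer VALUES ('C001', 'RATE-A'), ('C002', 'RATE-B');
    INSERT INTO trans (cust_id) VALUES ('C001'), ('C002'), ('C001'), ('C999');
""")

# One set-based statement replaces the per-record lookup and update:
# each transaction row picks up the value of its matching customer.
conn.execute("""
    UPDATE trans
       SET cust_value = (SELECT c.cust_value
                           FROM customer c
                          WHERE c.cust_id = trans.cust_id)
     WHERE EXISTS (SELECT 1
                     FROM customer c
                    WHERE c.cust_id = trans.cust_id)
""")
conn.commit()

rows = conn.execute(
    "SELECT cust_id, cust_value FROM trans ORDER BY trans_id"
).fetchall()
```

The program here never performs an explicit lookup; the database resolves the customer value for every row in one statement, and rows with no matching customer (C999 in this sample) are left untouched.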

I wonder, but have not investigated, whether a DETERMINISTIC scalar function could be used to get the cached values, which would reduce the I/O; i.e. the lookups deferred to a UDF [especially if implemented with index-only]?

Regards, Chuck