Kurt,

Hmm, can you get exclusive update access to the previous and
input files?

It sounds to me like a case for matching records <grin />.

In particular, how long does it take to do
CPYF PREVIOUS QTEMP/DISCARDME CRTFILE(*YES)
with a full-sized PREVIOUS? That copy is more work than simply reading
the whole file in key sequence, so for mere tens of millions of
records I suspect a sequential pass would be a lot faster than the
nineteen hours you mention in your first posting. Of course, you would
have to arrange to process the new records in the same key sequence.
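
Roughly, in free-form RPG the merge might look like this (file, field,
and record-format names are only placeholders, and a single comparable
key field is assumed; a composite key would need a combined comparison):

    // PREVF    = previously-processed keys, keyed on the duplicate-check key
    // NEWTRANS = incoming transactions, read in the same key sequence
    // CLEANF   = de-duplicated output (populating CLEANREC is omitted here)
    dcl-f PREVF    disk keyed;
    dcl-f NEWTRANS disk keyed;
    dcl-f CLEANF   disk usage(*output);

    read PREVF;
    read NEWTRANS;

    dow not %eof(NEWTRANS);
       // advance PREVF until its key catches up with the current input key
       dow not %eof(PREVF) and PRVKEY < NEWKEY;
          read PREVF;
       enddo;

       if %eof(PREVF) or PRVKEY <> NEWKEY;
          write CLEANREC;              // no match in PREVF: not a duplicate
       endif;                          // equal keys: duplicate, skip it

       read NEWTRANS;
    enddo;

    *inlr = *on;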

Of course, this gives a run time at least proportional to the
size of PREVIOUS, and at some size that will exceed the time for
a direct-access method. Do you know how big PREVIOUS might become?

This simple-minded scheme does not give a direct way to exploit
more processors or more memory, something which comes for free
if you use SQL to select the incoming transactions. Darn, I have
been trying so hard to find an excuse to use matching records.
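
One way the SQL route could look is an exception join pushed into the
database (table and column names below are placeholders, and duplicates
within the input file itself would still need separate handling):

    exec sql
       insert into CLEANF
       select i.*
         from NEWTRANS i
         exception join PREVF p
           on  i.KEYFLD1 = p.KEYFLD1
           and i.KEYFLD2 = p.KEYFLD2;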

HTH,
Terry.


On Tue, 2010-06-15 at 13:51 -0500, Kurt Anderson wrote:
A couple answers at once:

The previous file doesn't always start at 0 records; it contains
months' worth of data. A previously processed record may have come from
the same input file, or it may not.

The previous file only contains the key fields to determine a
duplicate.

True, the majority of the time a duplicate is an exception to the
rule. However, we do have clients that send us 40% duplicates (yes, I
know, sigh). If I were to do the Write and catch the duplicate-key error,
I'd have to also remove the job log entry (maybe not "have" to, but
100,000 job log messages about duplicate writes would be annoying).
This is where I wish Alison Butterill would recognize the need to
suppress a job log message when monitoring for an error. Sure, I can
do it with an API, I know (and have).
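
One way the write-and-trap version might look in free-form RPG (file and
record-format names are placeholders; MONITOR would work as well as the
(e) extender):

    dcl-f PREVF disk usage(*output);       // Previous file, unique-keyed in DDS
    dcl-s dupCount packed(10: 0);

    write(e) PREVREC;
    if %error and %status(PREVF) = 01021;  // 01021 = duplicate record key
       dupCount += 1;                      // already processed: count it and move on
       // the CPF diagnostic still lands in the job log unless it is
       // removed afterwards, e.g. with the QMHRMVPM API
    endif;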

The Previous file, in this case, is GBs in size. We don't currently
have enough main storage to accommodate it, much less to allow multiple
jobs (using different files) to process at once. We are looking at
getting more memory.

Select Distinct may be a good idea for eliminating duplicates within a
single input file. I'll definitely consider that.
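
Something along these lines, perhaps (file names are placeholders and
system naming is assumed; if non-key fields can differ between otherwise
duplicate rows, a GROUP BY on the key columns would be needed instead of
DISTINCT *):

    exec sql
       create table QTEMP/INPUT_DEDUP as
          (select distinct * from NEWTRANS)
          with data;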

We are blocking the read on the input file. However, the size of the
Previous file is really the issue here. Empty the Previous file, and
the program flies. Fill up the Previous file, and the program grinds.

The Input & Previous files will never have deleted records in them.

Thanks for all of the responses,
Kurt


