


Thanks everyone for the input. Here are my constraints (which I should have
put down before).

The data (about 75 GB) is on a production box, where the only thing I can
do is pull it to a separate development box (I have no access to the
command line, SQL, or the IFS on the production box).

On the development box, I have access to SQL, but I am not sure about the
IFS. The data is in EBCDIC, so I am unsure what it will translate to in
ASCII (by the way, the file also contains another control character,
x'1E'. I did not mention it before, as I thought it was not relevant to my
problem).
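
For illustration only, here is a minimal Python sketch of one way to
handle the EBCDIC data and the x'1E' byte once the file is reachable as a
stream of bytes. It assumes code page 37 (Python's "cp037" codec), since
the actual CCSID is not stated here, and it assumes x'1E' acts as a
separator inside a record; both are guesses. Splitting on the raw byte
before decoding sidesteps the question of what x'1E' translates to.

```python
# Assumption: the data uses EBCDIC code page 37 (Python codec "cp037").
# The real CCSID of the file is not stated in this thread.
ASSUMED_CODEC = "cp037"

# Assumption: x'1E' separates parts of a record. Its real role is unknown.
CONTROL_BYTE = b"\x1e"


def decode_record(raw):
    """Split one raw EBCDIC record on x'1E', then decode each part to text."""
    return [part.decode(ASSUMED_CODEC) for part in raw.split(CONTROL_BYTE)]


if __name__ == "__main__":
    # Build a sample record by encoding text to EBCDIC around the control byte.
    sample = ("JOHN SMITH".encode(ASSUMED_CODEC)
              + CONTROL_BYTE
              + "NY".encode(ASSUMED_CODEC))
    print(decode_record(sample))
```

If the control byte turns out to be plain data rather than a separator,
decoding the whole record in one call would be the simpler choice.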

Once I have the sorted file, I have to chop it up into files of 500,000
records each, so that they can be further processed in parallel. These
files will have to be copied back to the production machine by another
team (I can pull the data off production, but not put it back).
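
The chopping step could be sketched in Python roughly as follows. This
assumes the sorted data is a stream file with one record per line, which
may not match the real layout; the function name split_file and the
output naming scheme are hypothetical.

```python
RECORDS_PER_FILE = 500_000  # chunk size mentioned in the post


def split_file(sorted_path, out_prefix, records_per_file=RECORDS_PER_FILE):
    """Copy sorted_path into numbered chunk files, one record per line.

    Records are streamed one at a time, so memory use stays small.
    Returns the chunk file names in order.
    """
    names = []
    dst = None
    try:
        with open(sorted_path, "rb") as src:
            for index, line in enumerate(src):
                # Start a new chunk file every records_per_file records.
                if index % records_per_file == 0:
                    if dst is not None:
                        dst.close()
                    name = f"{out_prefix}{len(names) + 1:04d}.txt"
                    dst = open(name, "wb")
                    names.append(name)
                dst.write(line)
    finally:
        if dst is not None:
            dst.close()
    return names
```

With 18 million records this would produce 36 chunk files. The chunks
together take about as much disk space as the sorted file itself.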

I tried Chuck's solution on a test file and it works. I will have to check
into the IFS access.

The machine is already at over 90% disk usage, and a sudden jump of 2-3
percent may be frowned upon (but there is no way around that).

I am sure sorting 18 million records will take some time. Can you say
which approach would use fewer system resources? (There are scores and
scores of other developers on the system who would scream if I hogged the
system for a few hours.)

As John mentions, I could write a simple program to create an intermediate
file with a key and reorganize it to get it sorted, but I think the RGZPFM
will use as much system resources as the SQL would (not sure about the
Unix sort).
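
As a rough illustration of the intermediate-file-with-a-key idea, here is
a Python sketch of an external merge sort: it sorts fixed-size batches in
memory, writes each as a keyed run on disk, then merges the runs. This is
not RGZPFM, the SQL approach, or the Unix sort, just a stand-in to show
the shape of the technique. The key function last_name_key is
hypothetical (it takes the last word of the record), the input is assumed
to be a text file with one record per line, and keys are assumed to
contain no tab characters.

```python
import heapq
import os
import tempfile


def last_name_key(record):
    """Hypothetical key: the last whitespace-separated word of the record."""
    words = record.split()
    return words[-1] if words else ""


def _split_keyed(line):
    """Turn a 'key<TAB>record' line back into a (key, record) tuple."""
    return tuple(line.rstrip("\n").split("\t", 1))


def _write_run(batch, tmp_dir):
    """Sort one batch in memory and write it as 'key<TAB>record' lines."""
    batch.sort()
    fd, path = tempfile.mkstemp(dir=tmp_dir, text=True)
    with os.fdopen(fd, "w") as run:
        for key, record in batch:
            run.write(f"{key}\t{record}\n")
    return path


def sort_by_key(in_path, out_path, batch_size=100_000, key=last_name_key):
    """Sort in_path by key into out_path, holding one batch in memory."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        runs, batch = [], []
        with open(in_path) as src:
            for line in src:
                record = line.rstrip("\n")
                batch.append((key(record), record))
                if len(batch) >= batch_size:
                    runs.append(_write_run(batch, tmp_dir))
                    batch = []
        if batch:
            runs.append(_write_run(batch, tmp_dir))
        files = [open(path) for path in runs]
        try:
            with open(out_path, "w") as dst:
                # Merge the sorted runs; drop the key when writing output.
                for line in heapq.merge(*files, key=_split_keyed):
                    dst.write(line.split("\t", 1)[1])
        finally:
            for run_file in files:
                run_file.close()
```

The batch size caps memory use, but the runs need temporary disk space
somewhat larger than the input, which matters on a machine that is
already over 90% full.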

Vinay

On Sun, Jan 22, 2012 at 10:57 AM, John Yeung <gallium.arsenide@xxxxxxxxx> wrote:

On Sun, Jan 22, 2012 at 7:22 AM, CRPence <CRPbottle@xxxxxxxxx> wrote:
On 21-Jan-2012 21:08 , John Yeung wrote:
It's interesting to see the solutions people come up with, based to a
very large extent on what they are comfortable with. "When all you
have is a hammer, all your problems start looking like nails" and all
that.

Or, the problem might just be a nail.

Absolutely. Plus, I think it's an oversimplification to imply that
modern tools are as simple and narrow-purpose as a hammer. It
actually grates on me whenever I read someone say, in a computer
context, "use the right tool for the job". I think most computer
problems have many "right" tools. It would be more accurate to say
"don't use the wrong tool for the job", but that is longer and doesn't
have the same ring to it, I guess.

(And, as you imply when you say "if ..., then the best tool for the
job would presumably be...", it is more often the case that there are
"better" and "worse" tools, rather than "right" and "wrong" ones.)

Besides that, the OP stated that they wanted to avoid their current
proposed solution that would "create an intermediate file, with the Last
Name as a separate field". And their own solution description, which
they apparently have chosen to avoid, seems little different than the
"Unixy" solution that was proposed?

I agree. My own feeling is that, if the OP has enough disk space (he
seems to mention the size of the file in order to impress upon us that
either memory or disk could be a constraint), then the most
straightforward solution, regardless of specific language/environment,
is to make an explicit intermediate file which is more easily sorted.
I.e. in broad terms, to use OP's own solution.

You brought up another good point, which is: It could make a
difference whether the file is a physical file or a stream file. This
is one reason my favorite "hammer" on the i is iSeries Python. It
handles both with incredible ease. (I didn't already suggest this
route for the OP mostly because he seemed to be looking for solutions
that don't involve writing his own program.)

John Y.
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/midrange-l.



