Hi Sas,

>Suppose we have one million records to be read and
>it takes, for example, 5 minutes.  If we want the job
>to be done in 1 (one) minute, we should run pgm1
>(to read records 1-200), pgm2 (to read records 201-400),
>and so on until pgm5 (to read records 801-1000), and all
>the programs should run at the same time.

I have experience with this, and I understand that your 5 minutes is only an
example.  You need to be very careful how you partition your data.  You have
to be very certain that the final output does not depend on results from any
other records.  For instance, if you are calculating sales by week, and
records 1-300 fall into week one but you partition the data so that pgm1
reads only records 1-200, pgm1 will not see all the records for week one.
If pgm1 prints a report indicating, say, low sales, that report is wrong
because pgm1 never saw all the sales for week one.
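
One quick way to see whether fixed record ranges will split a logical
group is to ask the database where each group's records actually sit.
This is only a sketch under assumptions: INPUT and WEEK are stand-ins
for your file and column, and RRN() returns the relative record number
on DB2 UDB for iSeries.

  select week,
         min(rrn(input)) as first_rrn,
         max(rrn(input)) as last_rrn,
         count(*)        as recs
    from input
   group by week
   order by 1

If any week's first_rrn/last_rrn span straddles a proposed cut point
(200, 400, and so on), a record-range partition will split that week
across two jobs.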

The actual process of running 5 separate jobs is simple, but again you need
to be concerned about the output.  If you expect a report of total sales,
you will have 5 subtotals that you need to add together.  Normally this
means that you will create a summary work file that another program will
print/update from.
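
A minimal sketch of that summary step, assuming a work file called
SUMMARY with one subtotal row per parallel job (all names here are
placeholders):

  -- each parallel job inserts its own subtotal
  insert into summary (week, totsales) values(:week, :totsales)

  -- the final job rolls the subtotals together
  select week, sum(totsales)
    from summary
   group by week
   order by 1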

The actual runtime performance will probably not be 5 times better unless
you have a multiprocessor machine and run the jobs in a pool large enough
for all 5 jobs to run without thrashing.  This depends heavily on the nature
of the tasks you intend to parallelise.  You may be better off running the
separate jobs in separate pools or in one large one.  The overhead of
working out the partitions, then dispatching, starting, and running several
jobs, plus the cost of the final summariser at the end, can't really be
predicted in advance.

The general technique I have used is to run a crude data analyser over the
input file.  For instance, if I am interested in sales by week, I might SQL
the file GROUP BY week to get a list of weeks in the file.  That list is
then fed into a scheduler, which submits the parallel jobs with the
appropriate parameter.  Each of those jobs does its required processing and
also writes to the summary file.  Finally, after each of the parallel jobs
is complete, the summary job runs and accumulates the individual summary
records.  There is no automatic iSeries mechanism that I know about to do
any of these tasks.  You will need to write all the code yourself.

job dispatcher
  sbmjob monitor jobq(a)                  /* start the monitor first */
  select week, count(*) from input group by week order by 1
  begin loop
    fetch week...
    sbmjob parallel parm(week) jobq(b)    /* one job per week key */
  repeat loop until end of records
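
If you want something closer to compilable code, the dispatcher could
be an SQLRPGLE program that submits one job per fetched key through
QCMDEXC.  This is only a sketch under assumptions: INPUT, WEEK,
PARALLEL, and QBATCH2 are placeholder names, and the week key is
assumed to fit in two characters.

  **free
  dcl-pr QCMDEXC extpgm('QCMDEXC');
    cmd    char(200) const;
    cmdLen packed(15:5) const;
  end-pr;

  dcl-s week char(2);
  dcl-s cmd  char(200);

  exec sql declare c1 cursor for
    select week from input group by week order by 1;
  exec sql open c1;
  exec sql fetch c1 into :week;
  dow sqlcode = 0;
    // one parallel job per distinct week key
    cmd = 'SBMJOB CMD(CALL PGM(PARALLEL) PARM(''' + %trim(week)
        + ''')) JOBQ(QBATCH2)';
    QCMDEXC(cmd : %len(%trimr(cmd)));
    exec sql fetch c1 into :week;
  enddo;
  exec sql close c1;
  *inlr = *on;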

job parallel
  call clpgm week
    ovrdbf...                             /* scope the input file for this job */
    call rpgpgm week
      either setll/reade on key week, or do other
        processing that limits this execution
        to the record range of interest
      update summary record
  call tellMonitorThisWeekIsDone week     /* e.g. write to a data queue */
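
The CL wrapper for each parallel job might look like this (a sketch,
assuming placeholder file and program names; the real OVRDBF
parameters depend on how you scope the data):

  PGM PARM(&WEEK)
    DCL VAR(&WEEK) TYPE(*CHAR) LEN(2)
    /* scope any file overrides to this job only */
    OVRDBF FILE(INPUT) SHARE(*YES)
    /* RPGPGM does SETLL/READE on the week key in &WEEK */
    CALL PGM(RPGPGM) PARM(&WEEK)
    DLTOVR FILE(INPUT)
  ENDPGM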

job monitor
  begin loop
    rcvmsg week                           /* blocks until a parallel job reports in */
    update dbMonitor that week is done
    check all weeks for completion
    complete?
      sbmjob summary jobq(a)              /* roll up the per-job subtotals */
      terminate
    not complete?
      repeat loop

This is all from memory and you will definitely need to revise it for your
situation.  I use data queues to send messages between processes.
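
For the data queue plumbing, the send side can be as small as this CL
sketch (DONEQ and MYLIB are placeholders; the queue would be created
once with CRTDTAQ DTAQ(MYLIB/DONEQ) MAXLEN(16)):

  PGM PARM(&WEEK)
    DCL VAR(&WEEK) TYPE(*CHAR) LEN(16)
    DCL VAR(&LEN)  TYPE(*DEC)  LEN(5 0) VALUE(16)
    /* tell the monitor this week is done */
    CALL PGM(QSNDDTAQ) PARM('DONEQ' 'MYLIB' &LEN &WEEK)
  ENDPGM

and the monitor blocks on the matching receive (a wait time of -1
means wait forever):

  PGM
    DCL VAR(&WEEK) TYPE(*CHAR) LEN(16)
    DCL VAR(&LEN)  TYPE(*DEC)  LEN(5 0) VALUE(16)
    DCL VAR(&WAIT) TYPE(*DEC)  LEN(5 0) VALUE(-1)
    CALL PGM(QRCVDTAQ) PARM('DONEQ' 'MYLIB' &LEN &WEEK &WAIT)
    /* ... mark &WEEK complete and check whether all weeks are done */
  ENDPGM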

Be cautious when thinking about this.  There isn't an easy way that I know
of to model the performance behaviour of a system like this in advance.
That means you will probably be forced to actually design and run the
parallel model in order to be able to compare it to the single job process
you already have in place.

Perhaps there is a better way to approach your performance issues.  Run
Performance Tools or PEX (depending on the version of OS/400 you're on) and
collect actual statistics on the current process.  The collected data will
probably show you where most of the time is going, whether it is disk
access or even a calculation loop.  I strongly suggest you do this before
thinking about a parallel run.
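
A minimal PEX statistics collection looks roughly like this (MYSTATS
and MYSESSION are placeholder names, and the exact parameters vary by
release, so check the manuals for yours):

  ADDPEXDFN DFN(MYSTATS) TYPE(*STATS)
  STRPEX    SSNID(MYSESSION) DFN(MYSTATS)
  /* ... run the existing single-job process ... */
  ENDPEX    SSNID(MYSESSION)
  PRTPEXRPT MBR(MYSESSION) TYPE(*STATS)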
  --buck
