Re: performance in parsing xml with Expat -- RPG400-L

One quick & dirty test...

What does your CPU usage look like vs. disk usage?

If your CPU isn't up near 100%, the code is probably waiting on the
disk, so you're going to want to look at optimizing I/O.

Besides the buffering on the IFS file, look at the tables you are writing too...

--Are they journaled? If so, and they really need to be, make sure
your using commitment control with "long" (128KB) commit cycles.
--Make sure FRCRATIO(*NONE) is set on the tables
--Consider specifying SIZE( ) and ALLOCATE(*YES)...
--What indexes are on the tables? If not needed during the build, set
MAINT(*REBLD), or remove the LF member till after the build is done...

Some good resources...
http://www.redbooks.ibm.com/redbooks/pdfs/sg246286.pdf
ftp://service.boulder.ibm.com/as400/web/benchmark/batprf.pdf

HTH,
Charles

On Mon, Aug 27, 2012 at 11:51 PM, jfranz46 <jfranz46@xxxxxxx> wrote:

Thanks for all the input.
To answer a few - yes it's a one time run (12 xml files, each a years worth
of orders) and each order
is approx 10 tables, and many hundreds of records per order. Each file could
be 500 meg to a gig.
In a prev post weeks ago I explained the basic layout, & Barbara Morris
suggested the native RPG opcodes not a good fit. The code around the Expat
all working well, and there is some necessary logic
to get the output of the 10 tables "in sync". but not too much.
I can test the parse versus the other logic for performance, but think the
buffer and the optimize may go long way. v6r1 btw.
Jim Franz

----- Original Message -----
From: "Scott Klement" <rpg400-l@xxxxxxxxxxxxxxxx>
To: "RPG programming on the IBM i / System i" <rpg400-l@xxxxxxxxxxxx>
Sent: Monday, August 27, 2012 6:32 PM
Subject: Re: performance in parsing xml with Expat

Hi Jim,

On 8/27/2012 2:49 PM, James Franz wrote:

Looking for suggestions to speed up parsing xml files with millions of
records.
Using Scott's Expat RPGLE and works great, but taking couple hours in
batch per file.

First of all, I don't deserve any credit for Expat. I had no part in
writing it at all. All I wrote was a few simple prototypes to call it
from RPG.

It's really hard to address a performance problem without first
determining where your program is spending it's time. Logic dictates
that the time would be spent interpreting/parsing the XML, but it could
also be spending time reading from disk, or it might be memory strapped,
causing it to slow down significantly due to that. Or, of course, it
could be in your back-end code (the code that runs in your handler
routines)

Just some general suggestions:

1) Try recompiling Expat with OPTIMIZE(40) and DBGVIEW(*NONE) this can
make a significant difference in C code, and you're unlikely to need to
debug Expat, anyway.

2) Make sure the code that reads the XML data (you didn't say where
you're getting it from... should I assume a stream file?) is reading
optimially. If it's a stream file, make sure the buffer size is a
multiple of the disk block size (available from the statvfs() API). I
would suggest about 20-30 times the disk block size would be a good
starting value.

3) Try using the PEX APIs to insert milestone checkpoints that you can
use to pare down where the performance issues are occurring.

-SK
--
This is the RPG programming on the IBM i / System i (RPG400-L) mailing
list
To post a message email: RPG400-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/rpg400-l
or email: RPG400-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/rpg400-l.

--
This is the RPG programming on the IBM i / System i (RPG400-L) mailing list
To post a message email: RPG400-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/rpg400-l
or email: RPG400-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/rpg400-l.