RE: Large XML file processing via XMLTABLE -- RPG400-L

Thanks for the response, John.
This is supposed to be a recurring process, potentially daily and
potentially across multiple files.
I will look into SAX processing.
But, more importantly, I am going back to the people who generate this file
to ask why it is even an XML document.
There is no requirement for flexibility in the data; all columns are present
in all records. This could easily be a .csv.

Paul

-----Original Message-----
From: RPG400-L <rpg400-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of John Yeung
Sent: Friday, February 22, 2019 4:13 PM
To: RPG programming on the IBM i (AS/400 and iSeries)
<rpg400-l@xxxxxxxxxxxxxxxxxx>
Subject: Re: Large XML file processing via XMLTABLE

I think the midrange list (which this was originally posted to) would have
been a more appropriate venue for this, but since the RPG list is where
people seem to have responded, I guess it's stuck here.

My question is: Is this a one-time thing, or will it be recurring?

The reason I ask is that if it's a one-time thing, then in your shoes, I
would preprocess the huge XML files on my PC. In my experience, for many
tasks, my PC is quite a bit faster than our i. This is not a knock on the i
at all, because the i handles multiple users and heavy disk I/O loads quite
well, whereas my PC only has to handle one user.

So in this case, I would use tools on my PC (such as Python, but also
conceivably other software) to transform the XML into either smaller, more
manageable pieces, or a more manageable transport format, or (even more
likely) just insert the final, parsed data directly into the database.
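
A minimal sketch of that kind of PC-side preprocessing, in Python. It assumes a hypothetical layout in which every row is a `<record>` element whose children are the columns; the element name, the `xml_to_csv` helper, and the sample data are illustrative and not taken from the actual file. It uses the standard library's `iterparse`, so the document is read incrementally rather than loaded whole:

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_source, csv_dest, record_tag="record"):
    """Stream <record_tag> elements from xml_source into CSV rows.

    Returns the number of records written. Column names come from the
    child tags of the first record (assumes every record has the same
    columns in the same order). record_tag is a hypothetical name.
    """
    writer = csv.writer(csv_dest)
    header = None
    count = 0
    # With events=("end",), each element is yielded once its closing
    # tag has been read, so a record's children are complete by then.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != record_tag:
            continue
        if header is None:
            header = [child.tag for child in elem]
            writer.writerow(header)
        writer.writerow([(child.text or "").strip() for child in elem])
        count += 1
        # Free the record's content; empty element shells still
        # accumulate under the root, which is small by comparison.
        elem.clear()
    return count


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real (much larger) file.
    sample = io.BytesIO(
        b"<data>"
        b"<record><id>1</id><name>Ann</name></record>"
        b"<record><id>2</id><name>Bob</name></record>"
        b"</data>"
    )
    out = io.StringIO()
    print(xml_to_csv(sample, out), "records written")
    print(out.getvalue())
```

With real files, `xml_source` can be a path or a file opened in binary mode, and the CSV file should be opened with `newline=""` as the `csv` module documentation recommends.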

Regardless of whether you're sticking to RPG or open to alternatives, one
thing to keep in mind is that SAX-style parsing is generally better suited
to very large files, because it only needs to work with a small chunk of the
file at a time; whereas DOM-style parsing (as exemplified by XML-INTO) needs
to work with the whole file basically as a unit. When DOM parsing works, it
is generally faster than SAX. But maybe you've reached the threshold where
DOM is impractical. At a
minimum, SAX will more easily allow you to monitor your progress. If you try
to load the whole thing at once with DOM, and you quit after 3 hours, you
have no idea whether just waiting 10 minutes more would have allowed it to
complete, or whether it still wouldn't have finished after another 3 hours.
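
To illustrate the progress-monitoring point, here is a rough sketch using Python's standard `xml.sax` module (Python only for illustration; the same event-driven idea applies to SAX parsing in other languages). The `<record>` element name, the `RecordCounter` class, and the `parse_with_progress` helper are assumptions made up for this example:

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Counts <record_tag> elements, reporting every `every` records."""

    def __init__(self, record_tag="record", every=100_000, report=print):
        super().__init__()
        self.record_tag = record_tag  # hypothetical element name
        self.every = every
        self.report = report
        self.count = 0

    def endElement(self, name):
        # The parser calls this each time it reads a closing tag.
        if name == self.record_tag:
            self.count += 1
            # Real per-record work (e.g. a database insert) would go here.
            if self.count % self.every == 0:
                self.report(f"{self.count} records processed")


def parse_with_progress(source, **kwargs):
    """Parse source (path or binary file object); return the record count."""
    handler = RecordCounter(**kwargs)
    xml.sax.parse(source, handler)
    return handler.count


if __name__ == "__main__":
    # Small in-memory document standing in for the huge file.
    doc = b"<data>" + b"<record><id>1</id></record>" * 5 + b"</data>"
    total = parse_with_progress(io.BytesIO(doc), every=2)
    print(total, "records in total")
```

Because the handler sees one element at a time, a progress line can be emitted at any interval, so a long-running job shows how far it has gotten instead of leaving you guessing.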

John Y.