JSON is both hierarchical and stream-oriented...
My first thought would be that it's not really possible for JSON (CSV, for
instance, would be more doable)...but the answers here suggest at least
the possibility of some benefit:
https://stackoverflow.com/questions/30982175/load-a-large-json-file-using-multi-threading
The problem is that unless the first chunk is a fixed size, you're going to
have to read through the first half to figure out where the second half
starts. Disk I/O is disk I/O regardless...
You might be able to get some performance gains by creating a process that
- starts reading the stream file data, storing it in a teraspace memory
buffer (4GB max)
- once it finds the start of the second chunk, kicks off a thread to parse
the first chunk
- continues reading the stream file, storing the data in a new teraspace buffer
- once done reading, kicks off parsing of the second chunk.
You don't say how big each message is, but if each chunk is less than 16MB,
you could stash it in a user space and spin off separate jobs easily enough
(rather than threads); there's a rough sketch of that below.
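For what it's worth, here's a very rough free-form sketch of the user space
idea. All the names are invented (CHUNK1 in QGPL, a PARSECHUNK program),
there's no error handling, and it assumes the first chunk has already been
read from the IFS into a variable:

**free
// Sketch only: stash one chunk of the JSON in a *USRSPC so a submitted
// job can parse it without re-reading the stream file.

dcl-pr QUSCRTUS extpgm('QUSCRTUS');        // Create User Space API
  qualName  char(20) const;                // 10-char name + 10-char library
  extAttr   char(10) const;
  initSize  int(10)  const;
  initValue char(1)  const;
  publicAut char(10) const;
  text      char(50) const;
end-pr;

dcl-pr QUSPTRUS extpgm('QUSPTRUS');        // Get a pointer to the user space
  qualName  char(20) const;
  spacePtr  pointer;
end-pr;

dcl-pr QCMDEXC extpgm('QCMDEXC');          // Run a CL command (SBMJOB)
  cmd    char(3000)   const options(*varsize);
  cmdLen packed(15:5) const;
end-pr;

// Use a library both jobs can see -- QTEMP won't work across jobs
dcl-c SPACE_NAME 'CHUNK1    QGPL      ';

dcl-s spacePtr pointer;
dcl-ds spaceData qualified based(spacePtr);
  dataLen int(10);
  data    char(16000000);
end-ds;

dcl-s chunk varchar(16000000);             // first chunk, already read from the IFS
dcl-s cmd   varchar(3000);

// Create a ~16MB user space and get addressability to it
QUSCRTUS(SPACE_NAME : 'DATA' : %size(spaceData) : x'00'
         : '*ALL' : 'JSON chunk for a parallel parse job');
QUSPTRUS(SPACE_NAME : spacePtr);

// Copy the chunk into the space, length first so the other job knows
// how much of it is real data
spaceData.dataLen = %len(chunk);
spaceData.data    = chunk;

// PARSECHUNK (hypothetical) would do QUSPTRUS on its side, pick up
// spaceData.data for dataLen bytes, and run its own DATA-INTO over it
cmd = 'SBMJOB CMD(CALL PGM(PARSECHUNK) PARM(''CHUNK1'' ''QGPL'')) JOB(PARSE1)';
QCMDEXC(cmd : %len(cmd));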
However, I have to wonder if it's worth the effort.
It seems that YAJL offers an "event-driven" option (see yajl_stmf_parse()).
If most of those 3,000 elements aren't of interest to you, you might find
that parsing method performs much better than YAJLINTO.
If you're interested in most of the data, but it's very repetitive, then
rather than using YAJLINTO to process the entire doc into a single structure,
take a look at the 'path' option and/or the %HANDLER support of DATA-INTO
(rough sketch below). I've never compared the options, but it seems to me
that reducing the amount of data to be returned at once would help
performance.
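I've never actually timed it, but a handler-based DATA-INTO would look
something like this sketch. All the names are invented, and it assumes the
repeating half of the document can be selected with something like
path=detail -- check the DATA-INTO and YAJLINTO docs for the exact path
string your document needs:

**free
ctl-opt dftactgrp(*no);

// Sketch only: have DATA-INTO hand you the repeating "detail" elements
// in batches of 500 instead of loading all 3000+ into one huge structure.

dcl-ds detail_t qualified template;
  itemId   char(20);
  quantity packed(9:0);
  amount   packed(11:2);
end-ds;

dcl-ds commArea_t qualified template;
  totalRows int(10);
end-ds;

dcl-pr handleDetail int(10);
  comm    likeds(commArea_t);
  rows    likeds(detail_t) dim(500) const;
  numRows int(10) value;
end-pr;

dcl-ds comm likeds(commArea_t) inz;
dcl-s jsonFile varchar(200) inz('/tmp/message.json');

// 'path=detail' assumes JSON shaped like {"header":{...},"detail":[...]}
data-into %handler(handleDetail : comm)
          %data(jsonFile : 'doc=file case=any path=detail')
          %parser('YAJLINTO');

*inlr = *on;

// Called each time the 500-element batch fills, and once more for the
// final partial batch
dcl-proc handleDetail;
  dcl-pi *n int(10);
    comm    likeds(commArea_t);
    rows    likeds(detail_t) dim(500) const;
    numRows int(10) value;
  end-pi;

  // e.g. blocked-insert these numRows rows here (see the SQL note below)
  comm.totalRows += numRows;

  return 0;      // 0 = keep parsing; non-zero ends the DATA-INTO early
end-proc;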
Finally, don't overlook disk I/O...
If you're using an SQL INSERT to add records, do a multi-row (blocked)
insert rather than one row at a time, as in the sketch below. (And if
you're not using SQL, consider it :)
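For example, a blocked insert from a data structure array in embedded SQL
looks roughly like this (DETAILPF and its columns are invented names):

**free
// Sketch: insert up to 500 rows in one trip to the database instead of
// issuing 500 separate INSERT statements.

dcl-ds detailRows qualified dim(500);
  itemId   char(20);
  quantity packed(9:0);
  amount   packed(11:2);
end-ds;

dcl-s rowCount int(10);

// ...fill detailRows(1..rowCount) from the parsed JSON, then:

exec sql insert into DETAILPF (ITEM_ID, QUANTITY, AMOUNT)
         :rowCount rows
         values(:detailRows);

*inlr = *on;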
Additionally, if the file is journaled, do the writes under commitment
control. This allows the journal entries to be cached in memory until you
commit, rather than being forced to disk for every record. (Huge, and I
mean huge, improvement.) (There's also an optional licensed program you can
pay for that forces this caching to occur: 5770-SS1 option 42, HA Journal
Performance.)
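In an SQLRPGLE program that's basically just running under commitment
control and committing every so often, e.g. (batch sizes arbitrary, the
insert is the one from the sketch above):

**free
// Sketch: do the writes under commitment control so journal entries can
// be cached in memory, then force them out with a COMMIT every N batches.

exec sql set option commit = *chg;    // run this program with commitment control

dcl-s totalChunks int(10) inz(0);     // however many batches the parse produces
dcl-s i int(10);

for i = 1 to totalChunks;
  // build detailRows/rowCount for this chunk, then the blocked insert
  // from the earlier sketch:
  //   exec sql insert into DETAILPF :rowCount rows values(:detailRows);

  if %rem(i : 10) = 0;                // commit every 10 batches (arbitrary)
    exec sql commit;
  endif;
endfor;

exec sql commit;                      // final commit for whatever is left
*inlr = *on;

The commit frequency is a tuning knob; the point is just that the records
don't get forced out to disk one at a time.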
Charles
On Thu, Aug 20, 2020 at 10:51 AM Stephen Piland <
Stephen@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Does anyone have a real-world example of this? A link or anything?
I'm trying to decode JSON with a very large number of elements using
DATA-INTO with YAJLINTO as the %PARSER. Each file on the IFS is one
transaction/message and can have over 3k individual elements. Each message
has 2 big blocks of elements split about 50/50. Each half of the message
ends up being written out to a separate group of PFs. Can I process each
half simultaneously without having to submit a ton of separate jobs? Just
trying to speed up the process.
Any thoughts would be appreciated! Thanks!