JSON is both hierarchical and stream oriented...

My first thought would be that it's not possible for JSON (CSV, for
instance, would be more doable)...but the answers here suggest at least the
possibility of some benefit, I suppose...
https://stackoverflow.com/questions/30982175/load-a-large-json-file-using-multi-threading


The problem is that unless the first chunk is a fixed size, you're going to
have to read through the first half to figure out where the second half
starts. Disk I/O is disk I/O regardless...

You might be able to get some performance gains by creating a process that
(roughly sketched below)
- starts reading the stream file data, storing it in a teraspace memory
buffer (4GB max)
- once it finds the start of the second chunk, kicks off a thread to parse
the first chunk
- continues reading the stream file, storing the data in a new teraspace
buffer
- once done reading, kicks off parsing of the second chunk.
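Something like this rough, untested sketch (the file path, the buffer size,
and the idea of watching for the boundary of the second block are all
placeholders; the real test depends on your document's layout):

**free
// Sketch only: read the whole stream file into one teraspace buffer,
// noting where the second block starts. Assumes the file fits in the
// buffer; the parse itself would run in a thread or submitted job.
ctl-opt alloc(*teraspace);

dcl-pr ifs_open int(10) extproc('open');
  path  pointer value options(*string);
  oflag int(10) value;
end-pr;
dcl-pr ifs_read int(10) extproc('read');
  fd    int(10) value;
  buf   pointer value;
  bytes uns(10) value;
end-pr;
dcl-pr ifs_close int(10) extproc('close');
  fd int(10) value;
end-pr;

dcl-c O_RDONLY   1;
dcl-c O_TEXTDATA 16777216;        // x'01000000': translate text on read

dcl-s fd        int(10);
dcl-s buf       pointer;
dcl-s bufSize   uns(10) inz(1073741824);  // 1GB; alloc(*teraspace) allows ~4GB
dcl-s totalRead uns(10) inz(0);
dcl-s bytesRead int(10);

fd = ifs_open('/path/to/message.json' : O_RDONLY + O_TEXTDATA);
buf = %alloc(bufSize);

dow totalRead < bufSize;
  bytesRead = ifs_read(fd : buf + totalRead : bufSize - totalRead);
  if bytesRead <= 0;
    leave;                        // end of file (or error)
  endif;
  totalRead += bytesRead;
  // As soon as the data read so far contains the start of the second
  // block, a thread (or submitted job) could begin parsing the first
  // block while this loop keeps reading the rest of the file.
enddo;

ifs_close(fd);
dealloc buf;
*inlr = *on;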

You don't say how big each message is, but if each chunk is less than 16MB,
you could use a user space and spin off separate jobs easily enough (rather
than threads). A rough sketch of that plumbing follows.
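For the separate-jobs route, something like this (untested; CHUNK1/QGPL and
the CHUNKPARSE program are made-up names, and the error-code parameters of
the APIs are omitted):

**free
// Sketch only: put one chunk (< 16MB) into a user space and submit a
// job to parse it. The user space lives in QGPL (pick any real library;
// QTEMP won't work because the submitted job has its own QTEMP).
dcl-pr QUSCRTUS extpgm('QUSCRTUS');       // Create User Space
  qualName  char(20) const;
  extAttr   char(10) const;
  initSize  int(10) const;
  initValue char(1) const;
  authority char(10) const;
  text      char(50) const;
end-pr;
dcl-pr QUSPTRUS extpgm('QUSPTRUS');       // Retrieve Pointer to User Space
  qualName char(20) const;
  usPtr    pointer;
end-pr;
dcl-pr QCMDEXC extpgm('QCMDEXC');         // Execute a CL command
  cmd    char(3000) const;
  cmdLen packed(15:5) const;
end-pr;

dcl-s usPtr pointer;
dcl-s chunk char(16000000) based(usPtr);
dcl-s cmd   varchar(3000);

// 1) Create a user space big enough for one chunk (16MB is the ceiling).
QUSCRTUS('CHUNK1    QGPL' : ' ' : %size(chunk) : x'00' : '*ALL' :
         'JSON chunk for parallel parse');

// 2) Map it and copy the first chunk's text into it.
QUSPTRUS('CHUNK1    QGPL' : usPtr);
// %subst(chunk : 1 : chunkLen) = firstChunkText;

// 3) Submit a job that parses the chunk straight out of the user space.
cmd = 'SBMJOB CMD(CALL PGM(CHUNKPARSE) PARM((''CHUNK1'') (''QGPL'')))';
QCMDEXC(cmd : %len(cmd));
*inlr = *on;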

However, I have to wonder if it's worth the effort.

It seems that YAJL offers an "event driven" option (see yajl_stmf_parse())

If most of those 3,000 elements aren't of interest to you, you might find
that parsing method performs much better than YAJLINTO.

If you're interested in most of the data, but it's very repetitive, then
rather than using YAJLINTO to process the entire document into a single
structure, take a look at the 'path' and/or 'handler' options of DATA-INTO.
I've never compared the options, but it would seem to me that simplifying
the data to be returned would help increase performance.
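I haven't tested this either, but a sketch of what the 'path' + %HANDLER
combination might look like (the field names, file name, and the exact
'path' value are assumptions, and the handler shape shown follows the same
pattern XML-INTO handlers use):

**free
// Sketch only: parse just the repeating "items" array of the document,
// handing it to a handler in batches of 500 elements instead of
// building one huge structure.
ctl-opt dftactgrp(*no);

dcl-ds item_t qualified template;
  id     char(20);
  qty    packed(9:0);
  amount packed(11:2);
end-ds;

dcl-ds comm qualified;            // communication area passed to the handler
  totalItems int(10) inz(0);
end-ds;

dcl-pr handleItems int(10);
  pComm  likeds(comm);
  pItems likeds(item_t) dim(500) const;
  pCount int(10) value;
end-pr;

// The 'path' value depends on how the parser names the elements in your
// document; treat "path=items" as a placeholder.
data-into %handler(handleItems : comm)
          %data('/path/to/message.json' : 'doc=file case=any path=items')
          %parser('YAJLINTO');

dsply ('Processed ' + %char(comm.totalItems) + ' items');
*inlr = *on;

dcl-proc handleItems;
  dcl-pi *n int(10);
    pComm  likeds(comm);
    pItems likeds(item_t) dim(500) const;
    pCount int(10) value;
  end-pi;
  dcl-s i int(10);

  for i = 1 to pCount;
    // write pItems(i) to the physical file(s) here
    pComm.totalItems += 1;
  endfor;

  return 0;                       // zero = keep parsing
end-proc;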

Finally, don't overlook disk I/O...
If you're using an SQL Insert to add records, do a multi-row insert rather
than 1 at a time. (And if not using SQL, consider it :)
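For instance, a blocked insert like this (table and field names made up)
writes a whole batch per statement:

**free
// Sketch: buffer the parsed rows in a host-structure array and insert
// them with one statement. ITEMPF and its columns are hypothetical.
dcl-ds itemRow qualified dim(1000) inz;
  id     char(20);
  qty    packed(9:0);
  amount packed(11:2);
end-ds;
dcl-s rowCount int(10);

// ... fill itemRow(1) through itemRow(rowCount) while parsing ...

exec sql insert into ITEMPF
         :rowCount rows
         values(:itemRow);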

Additionally, if the file is journaled, do the writes under commitment
control. This will allow the journal to cache the journal entries in
memory until you commit, rather than forcing them to disk for every record.
(Huge, and I mean huge, improvement.) (There's also an optional licensed
program you can pay for to force this caching to occur: 5770-SS1 option 42,
HA Journal Performance.)
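In embedded SQL that can be as simple as compiling the source for
commitment control and committing once per batch (or once per document),
e.g.:

**free
// Sketch: run the inserts under commitment control so journal entries
// are cached until COMMIT instead of being forced to disk per row.
exec sql set option commit = *chg;   // precompile option for this source

// ... blocked inserts for the whole message go here ...

exec sql commit;                     // one forced journal write per batch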

Charles






On Thu, Aug 20, 2020 at 10:51 AM Stephen Piland <
Stephen@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Does anyone have a real world example of this? A link or anything?

I'm trying to decode JSON with a very large number of elements using
DATA-INTO with YAJLINTO as the %PARSER. Each file on the IFS is one
transaction/message and can have over 3,000 individual elements. Each
message has 2 big blocks of elements split about 50/50. Each half of the
message ends up writing out to a separate group of PFs. Can I process each
half simultaneously and not have to submit a ton of separate jobs? Just
trying to speed up the process.

Any thoughts would be appreciated! Thanks!