On Sat, Feb 23, 2019 at 10:36 AM Jon Paris <jon.paris@xxxxxxxxxxxxxx> wrote:
> Given the earlier description of the file in question John I don't see memory as an issue
Because it's small enough?
> besides if it were using XML-INTO with a handler would give "row by row" access and a tiny memory footprint.
I'm not familiar with XML-INTO, and I certainly don't know about
handlers. Row-by-row access doesn't make sense to my understanding of
what DOM is. I'm not doubting you at all; I imagine you've done this
yourself already many times.
> Are you sure you mean SAX by the way? Surely processing the DOM is the only way to really be highly selective. XML-INTO can be every bit as selective as XML-SAX since it uses the same path mechanism.
I'm absolutely positive I mean SAX. The way I understand DOM is that a
tree of the whole document is built up front. It doesn't matter if it
happens to read one character at a time of the incoming file;
ultimately the whole document *must* be parsed, and the whole tree is
in memory at once. In contrast, SAX is piecemeal. Only one "event" at
a time is conceptually *required* to be in memory at any given time.
Once the parser feeds you the one event, it's free to throw it away
and look for the next one. Of course, if you (as the caller) are
collecting everything and building your own tree of the whole document
yourself, then sure, there's no savings in either memory or speed over
DOM. But if *you* are also throwing away what you're not interested
in, you could save memory with SAX. And if you find what you need
before you finish parsing the whole document, then you can also save
time by not bothering to parse the rest of the document at all.
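
To make the "stop early" point concrete, here is a minimal sketch using Python's standard xml.sax module. It is not RPG's XML-SAX or XML-INTO, only an illustration of the general event model. The element name "order", the "id" attribute, the sample data, and the helper names are all made up for the example. The handler keeps nothing except the text of the one element it wants, and it abandons the parse by raising an exception as soon as that element closes.

```python
import xml.sax


class Found(Exception):
    """Raised by the handler to abandon parsing once the wanted text is complete."""


class FirstMatchHandler(xml.sax.ContentHandler):
    """Keeps only the text of the first <order> whose id matches (hypothetical schema)."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.in_target = False
        self.chunks = []          # text pieces of the wanted element only
        self.elements_seen = 0    # how many start tags were reported before stopping

    def startElement(self, name, attrs):
        self.elements_seen += 1
        if name == "order" and attrs.get("id") == self.wanted_id:
            self.in_target = True

    def characters(self, content):
        # Text may arrive in several pieces; everything outside the target is dropped.
        if self.in_target:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "order" and self.in_target:
            raise Found("".join(self.chunks).strip())


def find_order(xml_bytes, wanted_id):
    """Return (text, elements_seen); text is None if no order matched."""
    handler = FirstMatchHandler(wanted_id)
    try:
        xml.sax.parseString(xml_bytes, handler)
    except Found as hit:
        return hit.args[0], handler.elements_seen
    return None, handler.elements_seen


SAMPLE = (
    b"<orders>"
    b"<order id='1'>widget</order>"
    b"<order id='2'>gadget</order>"
    b"<order id='3'>gizmo</order>"
    b"</orders>"
)

if __name__ == "__main__":
    print(find_order(SAMPLE, "2"))
```

Asking for id "2" reports only three start tags (orders, order 1, order 2) before the parse is abandoned, so the third order is never handed to the caller. A miss such as id "9" walks the whole document.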
Now, I get that OP sounds like he needs to process the whole document.
But there *could* be such a thing as a document so big that building
the whole tree at once is not feasible, yet processing one or a few
events at a time *is* feasible. In principle, there is no limit to the
size of the document processed by SAX. (Of course, there could be
pathological cases of XML where the entire document consists of one
gigantic payload element, in which case either style of parsing would be
doomed.) So, my thought was ***IF*** (I cannot stress enough, it's a
big IF) the OP is close to the limit with DOM, he can try SAX. Maybe
OP is nowhere near the limit, and the problems are due to something
else entirely. But as I also said, one thing you can do with SAX is
see where you are in the document, *as* the document is being parsed.
In my understanding of DOM, you won't see anything at all until the
whole document is parsed.
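
The difference in visibility can be sketched with Python's standard library, again only as an illustration of the two models and not of anything RPG does internally. The sample input is a made-up document that is deliberately cut off before its closing tag. The SAX handler has already been told about every element up to the break, while the DOM parser (xml.dom.minidom here) hands back no tree at all.

```python
import xml.sax
from xml.dom import minidom
from xml.parsers.expat import ExpatError


class ProgressHandler(xml.sax.ContentHandler):
    """Records each element name at the moment the parser reports it."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


# Hypothetical input: the closing </log> tag is missing, as if the file were cut short.
TRUNCATED = b"<log><entry>a</entry><entry>b</entry>"


def sax_progress(data):
    """Return the element names SAX reported before the parse failed or finished."""
    handler = ProgressHandler()
    try:
        xml.sax.parseString(data, handler)
    except xml.sax.SAXParseException:
        pass  # the events delivered before the error are still in handler.seen
    return handler.seen


def dom_progress(data):
    """Return the DOM document, or None if no complete tree could be built."""
    try:
        return minidom.parseString(data)
    except ExpatError:
        return None


if __name__ == "__main__":
    print("SAX saw:", sax_progress(TRUNCATED))
    print("DOM gave:", dom_progress(TRUNCATED))
```

With SAX the caller knows the parse got as far as the second entry. With DOM the caller gets all or nothing.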
Maybe XML-INTO isn't strictly DOM?