Hi,

I decided to spend a bit of time going through the xml docs in the infocenter over the weekend and I've got to say that it looks very well put together. Here are my thoughts - you are free to contradict me on any point. :-)


XML-SAX

The use of a single handler procedure is extremely useful. I did wonder how that would work, but passing the event type as the second parameter pretty much solves that one. The nice things about having a single point of contact between the parser and the program are (1) simplicity, and (2) extensibility - it will be much easier to add new events without changing the architecture.

The use of XML-SAX and a handler procedure would work fine with our current code base (which uses the xml toolkit). We can simply refactor our code and put the calls to our existing event-handling procedures within this new procedure - probably in a simple select statement based on event type. There would be some adjustments for handling attributes, but nothing too nasty.

Regarding well-formedness checking - I can't see any difference between what we do now and what we'd need to do with this native parsing method. At the moment we always parse the xml fully without any processing occurring - we only register the warning/error/fatal-error event-handling procedures on the first pass. This is to allow us to validate the document in its entirety. If this straight parse does not return any errors, we register the processing event handlers (start of element, process characters, end of element, etc.) and parse again. It is during this second parse that we decompose the xml data into database variables. Of course, the handler procedures could be different in each pass.

Funnily enough, this native method allows a nice way of using the same handler procedure for both passes. The first parameter passed to XML-SAX is custom-built and can be anything. As a result, you could have two almost-identical lines of code for the two passes, using the same handler procedure in both cases. To inform the handler procedure that it is in validation-only mode you could pass a special value in the first parameter. Within the procedure you could then simply return if the event is not an error event. As no SAX parser should assume well-formedness - only programmers do - I would always advocate a two-pass approach to xml decomposition if you cannot guarantee the quality of the document. XML-SAX provides all I need for production-level code.

I'd probably do something like:

dcl-s mode char(10);

// validate the xml...
mode = '*VALIDATE';
xml-sax(e) %handler(myHandler : mode)
           %xml('/home/xped/file.xml' : 'doc=file ccsid=job');

// if no exceptions, decompose the xml...
if not %error;
   mode = '*DECOMPOSE';
   xml-sax(e) %handler(myHandler : mode)
              %xml('/home/xped/file.xml' : 'doc=file ccsid=job');
endif;
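
The handler itself could then look something like this (a rough sketch only - I've guessed at the parameter types from the docs, so they'd need checking against the final reference):

dcl-proc myHandler;
  dcl-pi *n int(10);
    mode        char(10);        // communication area from %handler
    event       int(10) value;   // the *XML_... event
    string      pointer value;   // pointer to the event data
    stringLen   int(20) value;   // length of the event data
    exceptionId int(10) value;   // set for *XML_EXCEPTION events
  end-pi;

  // in validation-only mode, ignore everything except parser errors
  if mode = '*VALIDATE' and event <> *XML_EXCEPTION;
    return 0;                    // 0 = carry on parsing
  endif;

  select;
    when event = *XML_EXCEPTION;
      // log exceptionId and flag the document as bad
    when event = *XML_START_ELEMENT;
      // decompose: route to our existing element-handling code
    // ... other events ...
  endsl;

  return 0;
end-proc;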

This is very neat and tidy - I like it.


XML-INTO

Now I had to have a good think about this one. As I'm used to having event handler procedures to parse complex xml, I wasn't sure how much use this would be. I know XML-INTO can use an event handler procedure, but I can't imagine myself using it this way unless I wrote a service program for doing bulk inserts into database tables using arrays and sql (a very nice idea, though - I've sketched it below).
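
Just to capture that idea: the bulk-insert handler might look roughly like this (a sketch only - the table, field, and element names are invented, and the blocked-insert syntax would need double-checking):

dcl-ds order_t qualified template;
  ordernum packed(7);
  custcode char(5);
end-ds;

dcl-s dummy char(1);

dcl-proc bulkInsert;
  dcl-pi *n int(10);
    commArea char(1);                      // communication area (unused here)
    rows     likeds(order_t) dim(500) const;
    numRows  int(10) value;                // elements filled in this chunk
  end-pi;

  // blocked insert of the chunk the parser just handed us
  exec sql insert into orders :numRows rows values(:rows);
  if sqlcode < 0;
    return -1;                             // non-zero return ends the parse
  endif;
  return 0;
end-proc;

// caller: the parser fills the array and calls the handler in chunks,
// assuming a document of the form <orders><order>...</order>...</orders>
xml-into %handler(bulkInsert : dummy)
         %xml('/home/xped/orders.xml' : 'doc=file path=orders/order case=lower');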

For me, the obvious use is this:

Many of our processes run asynchronously using NEPs sitting on data queues. The idea is to keep our workflow modular and to allow us to isolate functions within the workflow. This gives us the ability to scale up or scale down according to business needs. The data queue entries are 512-byte strings mapped over a data structure. This data structure is used extensively by our programs and is an entry parameter to many functions. The problem is that some functions need some data and other functions need different data, so accommodating the needs of all of the functions was difficult using a fixed-format DS within a data queue entry. We constantly have issues with space within this DS. If we need a new variable we can add it to the DS and recompile the programs - undesirable, but easily achieved. But if we exceed the 512-byte limit we will need to change all of the data queues to use the longer length. This is a real pain in the butt.

Now, this is where XML-INTO will be a dream for us. We could simply describe the structure in xml, and only put the populated elements into the xml doc. This would then be passed to the data queue. The receiving NEP could then use xml-into to get the data from the passed xml and place it into the data structure it uses. As we never use more than 20% of the DS fields in any one application, we would have a lot of room for manoeuvre with this technique.
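
So a queue entry might carry only the handful of elements that actually apply to the receiving function - something like this (element names invented):

<datastructurename>
  <ordernum>123456</ordernum>
  <custcode>ABC01</custcode>
  <status>RELEASED</status>
</datastructurename>

With allowmissing=yes, subfields that don't appear in the document should simply be left untouched.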

I'd probably do something like:

xml-into localDS %xml(dqstring : 'doc=string ccsid=job allowmissing=yes path=datastructurename case=lower');

My only question regarding xml-into is this: how deep can we fill a DS with xml data? If we have a DS with a subfield defined LIKEDS another DS, could I expect a 3-level xml doc segment to decompose into the whole structure, including the fields in the nested DS? Does the process handle this kind of structure?
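
For example, something along these lines (all names invented) is the kind of thing I have in mind:

dcl-ds address_t qualified template;
  street char(30);
  city   char(20);
end-ds;

dcl-ds customer qualified;
  name char(30);
  addr likeds(address_t);   // nested DS - will xml-into fill this level too?
end-ds;

// hoping that <customer><name>...</name>
//   <addr><street>...</street><city>...</city></addr></customer>
// decomposes into the whole structure, nested fields included
xml-into customer %xml(xmlString : 'doc=string case=lower');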


Regarding entity references: I assume that, given the following xml:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<root>
  <var1>That&apos;s life!</var1>
  <var2>Bodgitt &amp; Scarper electricians</var2>
</root>

I would expect the parser to call the handler with the following sequence of calls:

*XML_START_DOCUMENT (no data pointer passed)
*XML_VERSION_INFO (data='1.0')
*XML_ENCODING_DECL (data='ISO-8859-1')
*XML_START_ELEMENT (data='root')
*XML_START_ELEMENT (data='var1')
*XML_CHARS (data='That')
*XML_PREDEF_REF (data='&apos;')
*XML_CHARS (data='s life!')
*XML_END_ELEMENT (data='var1')
*XML_START_ELEMENT (data='var2')
*XML_CHARS (data='Bodgitt ')
*XML_PREDEF_REF (data='&amp;')
*XML_CHARS (data=' Scarper electricians')
*XML_END_ELEMENT (data='var2')
*XML_END_ELEMENT (data='root')
*XML_END_DOCUMENT (no data pointer passed)

(Although I would imagine that *XML_CHARS may be called several times for the same element if the data is very large.)

So it would appear that we will have to resolve escaped entity references within our code. This is not a big thing - and we have been given the ability to handle custom references using *XML_UNKNOWN_REF, which is cool.
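
Going by the event list above, the resolution could be as simple as a small helper called from the *XML_PREDEF_REF leg of the handler's select, after pulling the reference text off the event-data pointer (untested - just the shape of it):

// returns the character a predefined entity reference stands for
dcl-proc resolveRef;
  dcl-pi *n char(1);
    refText varchar(10) const;   // e.g. '&amp;'
  end-pi;

  select;
    when refText = '&amp;';  return '&';
    when refText = '&apos;'; return '''';
    when refText = '&lt;';   return '<';
    when refText = '&gt;';   return '>';
    when refText = '&quot;'; return '"';
  endsl;
  return ' ';
end-proc;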


Anyway, this looks like a pretty comprehensive implementation and I can certainly use it in my production code. I would still need the xml toolkit for parsing with xml schemas, but that's not too bad - I only use schemas in development, so I can treat the toolkit as part of my IDE rather than part of the production code. Also, the way this code handles attributes is excellent. I never really liked the way standard SAX parsers present attribute data to the call-back procedures at the start of an element - even in Java it is messy. This way is MUCH nicer.

Excellent work, Barbara - I will certainly have a good play with this once we get the upgrade!

Cheers

Larry Ducie


