Good evaluation! Here are my comments.

It would be interesting to run some parser performance numbers to see how much time is spent going through a document strictly for the sake of well-formedness. Sure, small 5 KB files will parse fast, but what happens when it gets to 1 MB or larger (which is common when you start using XML to replace existing EDI)?

Passing XML between RPG programs is interesting because it gives RPG programmers a way to natively keep their code loosely coupled. Before the BIFs I wouldn't even have considered it because of the programming overhead, but now it all of a sudden becomes a little easier with the direct mapping to a data structure. So now that the ease of programming is there, all that needs to be considered is the additional memory used and the additional processing cycles to get the data out of the XML string into the DS. Based on those two things, I think the feature of loose coupling would still have to be adequately justified, given that it is RPG on the other end. On the other hand, if you don't know who is going to be consuming the data queue entry, this becomes a viable solution (maybe Java on the other end?).

Going back to passing DSs to data queues... what would be really cool is if the compiler gave us the same ability to "discover" what structure we are dealing with, much like the XML parser does to return the appropriate data into the right place. Something similar to the Java "instanceof" keyword would rock! Of course, that would mean we would have user-defined data types (which would rock even more!).

Another thing that will be interesting is what the mapping to a DS will look like for the following XML: http://mowyourlawn.com/temp/cXML.xml

That is a cXML (http://cxml.org/) derived document. A lot of schemas are developed by domain-relationship purists (which isn't always a bad thing - they sure have taught me a lot in the past couple of years), and that makes for a VERY flexible schema, which inherently means quite the programming experience, because us RPG programmers get to de-normalize all that data to our physical files on the back end. :-)

Aaron Bartell

-----Original Message-----
From: rpg400-l-bounces@xxxxxxxxxxxx [mailto:rpg400-l-bounces@xxxxxxxxxxxx] On Behalf Of Larry Ducie
Sent: Tuesday, February 07, 2006 6:43 AM
To: rpg400-l@xxxxxxxxxxxx
Subject: V5R4 native xml processing - a user's critique

Hi,

I decided to spend a bit of time going through the XML docs in the InfoCenter over the weekend, and I've got to say it looks very well put together. Here are my thoughts - you are free to contradict me on any point. :-)

XML-SAX

The use of a single handler procedure is extremely useful. I did wonder how that would work, but passing the event type as the second parameter pretty much solves that one. The nice things about having a single point of contact between the parser and the program are (1) simplicity, and (2) extensibility: it will be much easier to add new events without changing the architecture.

The use of XML-SAX and a handler procedure would work fine with our current code base (which uses the XML toolkit). We can simply refactor our code and put the calls to our existing event-handling procedures within this new procedure - probably in a simple select statement based on event type, along the lines of the sketch below. There would be some adjustments for handling attributes, but nothing too nasty.

Regarding well-formedness checking, I can't see any difference between what we do now and what we'd need to do with this native parsing method.
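To make that concrete, here is a minimal sketch of such a handler, assuming the handler interface described in the InfoCenter (a communication area, the event type, a pointer to the event data, the data length, and an exception id, returning an integer). The handleXxx procedures it dispatches to are invented purely for illustration:

P myHandler       B
D myHandler       PI            10I 0
D   commArea                    10A
D   event                       10I 0 value
D   data                          *   value
D   dataLen                     20I 0 value
D   exceptionId                 10I 0 value
 /free
   select;
   when event = *XML_START_ELEMENT;
     handleStartElement(data : dataLen);  // illustrative procedure
   when event = *XML_CHARS;
     handleChars(data : dataLen);         // illustrative procedure
   when event = *XML_END_ELEMENT;
     handleEndElement(data : dataLen);    // illustrative procedure
   when event = *XML_EXCEPTION;
     handleParseError(exceptionId);       // illustrative procedure
   endsl;
   return 0;  // return zero so parsing continues
 /end-free
P myHandler       E

Any event the select does not name simply falls through, so new event types can be handled later without touching the architecture.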
At the moment we always parse the XML fully without any processing occurring - we only register the warning/error/fatal-error event-handling procedures on the first pass. This is to allow us to validate the document in its entirety. If this straight parse does not return any errors, we register the processing event handlers (start of element, process characters, end of element, etc.) and parse again. It is during this second parse that we decompose the XML data into database variables. Of course, the handler procedures could be different in each pass.

Funnily enough, this native method allows a nice way of using the same handler procedure for both passes. The first parameter passed to XML-SAX is custom-built and can be anything. As a result, you could have two almost-identical lines of code for both passes, using the same handler procedure in both cases. To inform the handler procedure that it is in validation-only mode you could pass a special value in the first parameter. Within this procedure you could then simply return if the event is not an error event. As no SAX parser would assume well-formedness (only programmers would!), I would always advocate a two-pass approach to XML decomposition if you cannot guarantee the quality of the document.

XML-SAX provides all I need for production-level code. I'd probably do something like:

// validate the xml
xml-sax %handler(myHandler : *VALIDATE)
        %xml('/home/xped/file.xml' : 'doc=file ccsid=job');

// If no exceptions, decompose the xml
if not XML_EXCEPTION;
  xml-sax %handler(myHandler : *DECOMPOSE)
          %xml('/home/xped/file.xml' : 'doc=file ccsid=job');
endif;

This is very neat and tidy - I like it.

XML-INTO

Now I had to have a good think about this one. As I'm used to having event-handler procedures to parse complex XML, I wasn't sure how much use this would be. I know XML-INTO can use an event-handler procedure, but I can't imagine myself using it this way unless I wrote a service program for doing bulk inserts into database tables using arrays and SQL (very nice idea, though).

For me the obvious use is this: many of our processes run asynchronously using NEPs (never-ending programs) sitting on data queues. The idea is to keep our workflow modular and to allow us to isolate functions within the workflow. This gives us the ability to scale up or scale down according to the business needs. These data queue entries are 512-byte strings mapped over a data structure. This data structure is used extensively by our programs and is used as an entry parameter to many functions. The problem is that some functions need some data and other functions need different data; accommodating the needs of all of the functions was difficult using a fixed-format DS within a data queue entry. We constantly have issues about space within this DS. If we need a new variable we can add it to the DS and recompile the programs. This is undesirable but easily achieved. But if we exceed the 512-byte limit we will need to change all of the data queues to use the longer length. This is a real pain in the butt.

Now, this is where XML-INTO will be a dream for us. We could simply create the structure in XML, and only put the populated elements into the XML doc. This would then be passed to the data queue (a sketch of the sending side follows below). The receiving NEP could then use XML-INTO to get the data from the passed XML and place it into the data structure it uses. As we would never use more than 20% of the DS fields in any one application, we would have a lot of room for manoeuvre with this technique.
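On the sending side, that could look roughly like this (the queue name, library, and fields are invented for illustration; QSNDDTAQ is the standard data-queue API):

D SndDtaQ         PR                  extpgm('QSNDDTAQ')
D   dtaqName                   10A   const
D   dtaqLib                    10A   const
D   dataLen                     5P 0 const
D   data                     2048A   const

D xmlDoc          S          2048A   varying
D custNo          S             7P 0
D ordQty          S             5I 0

 /free
   // Only the populated elements go into the document
   xmlDoc = '<datastructurename>'
          + '<custno>' + %char(custNo) + '</custno>'
          + '<ordqty>' + %char(ordQty) + '</ordqty>'
          + '</datastructurename>';

   // queue name, library, data length, data
   SndDtaQ('WORKFLOWQ' : 'MYLIB' : %len(xmlDoc) : xmlDoc);
 /end-free

The receiving NEP would pull the entry off the queue with QRCVDTAQ and hand the string straight to XML-INTO, as in the next snippet.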
I'd probably do something like:

xml-into localDS %xml(dqstring : 'doc=string ccsid=job allowmissing=yes path=datastructurename case=lower');

My only question regarding XML-INTO is this: how deep can we fill a DS with XML data? If we have a DS with a subfield that is LIKE another DS, could I expect a 3-level XML doc segment to decompose into the whole structure, including the fields in the nested DS? Does the process handle this kind of structure? (A sketch of the sort of structure I mean follows at the end of this note.)

Regarding entity references: I assume that, given the following XML:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<root>
<var1>That&apos;s life!</var1>
<var2>Bodgitt &amp; Scarper electricians</var2>
</root>

I would expect the parser to call the handler with the following combination of calls:

*XML_START_DOCUMENT (no data pointer passed)
*XML_VERSION_INFO (data=1.0)
*XML_ENCODING_DECL (data=ISO-8859-1)
*XML_START_ELEMENT (data=root)
*XML_START_ELEMENT (data=var1)
*XML_CHARS (data=That)
*XML_PREDEF_REF (data=')
*XML_CHARS (data=s life!)
*XML_END_ELEMENT (data=var1)
*XML_START_ELEMENT (data=var2)
*XML_CHARS (data=Bodgitt )
*XML_PREDEF_REF (data=&)
*XML_CHARS (data= Scarper electricians)
*XML_END_ELEMENT (data=var2)
*XML_END_ELEMENT (data=root)
*XML_END_DOCUMENT (no data pointer passed)

(Although I would imagine that *XML_CHARS may be called several times for the same element if the data is very large.)

So it would appear that we will have to resolve escaped entity references within our code. This is not a big thing - and we have been given the ability to handle custom references using *XML_UNKNOWN_REF, which is cool.

Anyway, this looks like a pretty comprehensive implementation and I can certainly use it in my production code. I would still need the XML toolkit for parsing with XML schemas, but that's not too bad; I only use schemas in development, so I can treat the toolkit as part of my IDE rather than part of the production code.

Also, the way this code handles attributes is excellent. I never really liked the way standard SAX parsers present attribute data to the call-back procedures at the start of an element - even in Java it is messy. This way is MUCH nicer.

Excellent work Barbara - I will certainly have a good play with this once we get the upgrade!

Cheers

Larry Ducie
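For reference, the sort of three-level structure in question might look like this (all names invented for illustration; the InfoCenter examples suggest XML-INTO does descend into LIKEDS subfields, so an element hierarchy matching the subfield hierarchy should decompose all the way down):

D addrDs          DS                  qualified
D   street                     30A
D   city                       20A

D custDs          DS                  qualified
D   name                       30A
D   addr                             likeds(addrDs)

D localDS         DS                  qualified
D   ordno                       7P 0
D   cust                             likeds(custDs)

A document like the following (matching path=localds and case=lower) would then fill every level:

<localds>
<ordno>1234567</ordno>
<cust>
<name>Bodgitt and Scarper</name>
<addr>
<street>1 High Street</street>
<city>Sometown</city>
</addr>
</cust>
</localds>

And with allowmissing=yes, leaving out a whole block (the <addr> element, say) would simply leave those subfields unchanged.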