


Good evaluation!

Here are my comments.

It would be interesting to run some parser performance numbers to see how
much time is spent going through a document purely to check well-formedness.
Sure, small 5KB files will parse fast, but what happens when a document gets
to 1MB or larger (which is common when you start using XML to replace
existing EDI)?
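
If I wanted rough numbers, I'd probably just wrap the parse in timestamps
with a do-nothing handler. A minimal sketch - nullHandler is hypothetical
and would simply return 0 for every event:

     D dummy           S             10a
     D start           S               z
     D micros          S             20i 0

      /free
        start = %timestamp();
        // nullHandler ignores every event, so whatever we measure
        // here is pure well-formedness checking
        xml-sax(e) %handler(nullHandler : dummy)
                   %xml('/tmp/big.xml' : 'doc=file');
        micros = %diff(%timestamp() : start : *mseconds);
      /end-free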


Passing XML between RPG programs is interesting because it gives RPG
programmers a way to keep their code loosely coupled natively. Before the
BIFs I wouldn't even have considered it because of the programming overhead,
but now it suddenly becomes a little easier with the direct mapping to a
data structure. So now that the ease of programming is there, all that needs
to be considered is the additional memory used and the additional processing
cycles to get the data out of the XML string into the DS. Based on those two
things, I think the feature of loose coupling would still have to be
adequately justified if it is RPG on the other end. On the other hand, if
you didn't know who was going to be consuming the data queue entry, then
this becomes a viable solution (maybe Java on the other end?)
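
For example (names hypothetical), the direct mapping on the receiving side
is about this much code:

     D dqEntry         S            512a

     D msg             DS                  qualified
     D  custno                       10a
     D  status                        2a

      /free
        // dqEntry holds e.g. '<msg><custno>A1</custno><status>OK</status></msg>'
        xml-into msg %xml(dqEntry : 'doc=string case=lower');
      /end-free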

Going back to passing DSs to data queues... what would be really cool is if
the compiler gave us the same ability to "discover" what structure we are
dealing with, much like the XML parser does to return the appropriate data
into the right place. Something similar to the Java "instanceof" keyword
would rock! Of course, that would mean we would have user-defined data
types (which would rock even more!)
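
Until we get something like that, about the closest we can do is sniff the
root element name ourselves and pick the matching DS - a rough sketch
(names hypothetical, and it assumes no XML declaration, comments, or root
attributes ahead of the first tag name):

     D dqEntry         S            512a
     D rootName        S             32a   varying
     D p1              S             10i 0
     D p2              S             10i 0

     D order           DS                  qualified
     D  ordno                        10a

     D invoice         DS                  qualified
     D  invno                        10a

      /free
        p1 = %scan('<' : dqEntry) + 1;
        p2 = %scan('>' : dqEntry : p1);
        rootName = %subst(dqEntry : p1 : p2 - p1);

        select;
        when rootName = 'order';
          xml-into order %xml(dqEntry : 'doc=string case=lower');
        when rootName = 'invoice';
          xml-into invoice %xml(dqEntry : 'doc=string case=lower');
        endsl;
      /end-free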



Another thing that will be interesting is what the mapping to a DS will look
like for the following XML:  

http://mowyourlawn.com/temp/cXML.xml

That is a cXML (http://cxml.org/) derived document. A lot of schemas are
developed by domain-relationship purists (which isn't always a bad thing -
they sure have taught me a lot in the past couple of years), and that makes
for a VERY flexible schema, which inherently makes for quite the programming
experience, because we RPG programmers get to de-normalize all that data
into our physical files on the back end :-)
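
Even without opening that file, the standard cXML header gives a taste of
the nesting involved. A fragment like:

  <Header>
    <From>
      <Credential domain="DUNS">
        <Identity>123456789</Identity>
      </Credential>
    </From>
  </Header>

would presumably want a DS per level - a rough sketch, assuming XML-INTO
treats attributes like child elements (the DS names are mine):

     D cxmlChunk       S          32767a   varying

     D credential_t    DS                  qualified
     D  domain                       10a
     D  identity                     35a

     D from_t          DS                  qualified
     D  credential                         likeds(credential_t)

      /free
        // case=any so the mixed-case cXML names still match
        xml-into from_t %xml(cxmlChunk :
                 'doc=string case=any path=cXML/Header/From');
      /end-free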

Aaron Bartell


-----Original Message-----
From: rpg400-l-bounces@xxxxxxxxxxxx [mailto:rpg400-l-bounces@xxxxxxxxxxxx]
On Behalf Of Larry Ducie
Sent: Tuesday, February 07, 2006 6:43 AM
To: rpg400-l@xxxxxxxxxxxx
Subject: V5R4 native xml processing - a user's critique

Hi,

I decided to spend a bit of time going through the XML docs in the
infocenter over the weekend, and I've got to say it all looks very well put
together. Here are my thoughts - you are free to contradict me on any point.

:-)


XML-SAX

The use of a single handler procedure is extremely useful. I did wonder how
that would work, but passing the event type as the second parameter pretty
much solves that one. The nice things about having a single point of contact
between the parser and the program are (1) simplicity, and (2) extensibility
- it will be much easier to add new events without changing the
architecture.

The use of XML-SAX and a handler procedure would work fine with our current
code base (which uses the XML toolkit). We can simply refactor our code and
put the calls to our existing event-handling procedures within this new
procedure - probably in a simple select statement based on event type. There
would be some adjustments for handling attributes, but nothing too nasty.
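
A skeleton of that refactored handler might look like this (prototype
omitted for brevity; the when-branches would really call our existing
procedures, the assignments here are just placeholders):

     P myHandler       B
     D myHandler       PI            10i 0
     D  commArea                     10a
     D  event                        10i 0 value
     D  string                         *   value
     D  stringLen                    20i 0 value
     D  exceptionId                  10i 0 value

     D data            S          65535a   based(string)
     D elemName        S            128a   varying static
     D elemChars       S           1024a   varying static

      /free
        select;
        when event = *XML_START_ELEMENT;
          elemName = %subst(data : 1 : stringLen);
        when event = *XML_CHARS;
          elemChars = %subst(data : 1 : stringLen);
        when event = *XML_END_ELEMENT;
          clear elemName;
        when event = *XML_EXCEPTION;
          return -1;   // exceptionId holds the parser's reason code
        endsl;
        return 0;      // zero tells the parser to carry on
      /end-free
     P myHandler       E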

Regarding well-formedness checking - I can't see any difference between what
we do now and what we'd need to do with this native parsing method. At the
moment we always parse the XML fully without any processing occurring - we
only register the warning/error/fatal-error event-handling procedures on the
first pass. This is to allow us to validate the document in its entirety. If
this straight parse does not return any errors we register the processing
event handlers (start of element, process characters, end of element, etc.)
and parse again. It is during this second parse that we decompose the XML
data into database variables. Of course, the handler procedures could be
different in each pass. Funnily enough, this native method allows a nice way
of using the same handler procedure for both passes. The first parameter
passed to XML-SAX is custom-built and can be anything. As a result, you
could have two almost-identical lines of code for both passes, using the
same handler procedure in both cases. To inform the handler procedure that
it is in validation-only mode you could pass a special value in the first
parameter. Within this procedure you could then simply return if the event
is not an error event. As no SAX parser should assume well-formedness - only
programmers do - I would always advocate a two-pass approach to XML
decomposition if you cannot guarantee the quality of the document. XML-SAX
provides all I need for production-level code.

I'd probably do something like:

     D mode            S             10a

      /free
        // validate the xml (the E extender traps any parse exception)
        mode = '*VALIDATE';
        xml-sax(e) %handler(myHandler : mode)
                   %xml('/home/xped/file.xml' : 'doc=file ccsid=job');

        // if no exceptions, decompose the xml
        if not %error();
          mode = '*DECOMPOSE';
          xml-sax %handler(myHandler : mode)
                  %xml('/home/xped/file.xml' : 'doc=file ccsid=job');
        endif;
      /end-free

This is very neat and tidy - I like it.
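
To share the handler across both passes, the top of it would just bail out
early during validation - something like this fragment inside myHandler:

      /free
        // validation pass: ignore every event except a parse problem
        if commArea = '*VALIDATE' and event <> *XML_EXCEPTION;
          return 0;
        endif;
        // ...normal decomposition logic follows...
      /end-free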


XML-INTO

Now, I had to have a good think about this one. As I'm used to having event
handler procedures to parse complex XML, I wasn't sure how much use this
would be. I know XML-INTO can use an event handler procedure, but I can't
imagine myself using it this way unless I wrote a service program for doing
bulk inserts into database tables using arrays and SQL (a very nice idea,
though).

For me, the obvious use is this:

Many of our processes run asynchronously using NEPs sitting on data queues.
The idea is to keep our workflow modular and to allow us to isolate
functions within the workflow. This gives us the ability to scale up or
scale down according to business needs. These data queue entries are
512-byte strings mapped over a data structure. This data structure is used
extensively by our programs - and is used as an entry parameter to many
functions. The problem is that some functions need some data and other
functions need different data; accommodating the needs of all of the
functions was difficult using a fixed-format DS within a data queue entry.
We constantly have issues about space within this DS. If we need a new
variable we can add it to the DS and recompile the programs. This is
undesirable but easily achieved. But if we exceed the 512-byte limit we will
need to change all of the data queues to use the longer length. This is a
real pain in the butt. Now, this is where XML-INTO will be a dream for us.
We could simply create the structure in XML, and only put the populated
elements into the XML doc. This would then be passed to the data queue. The
receiving NEP could then use XML-INTO to get the data from the passed XML
and place it into the data structure it uses. As we would never use more
than 20% of the DS fields in any one application, we would have a lot of
room for manoeuvre with this technique.

I'd probably do something like:

xml-into localDS %xml(dqstring : 'doc=string ccsid=job allowmissing=yes
         path=datastructurename case=lower');
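
On the sending side there is not much more to it than building the string
and calling QSNDDTAQ (field and queue names hypothetical):

     D QSNDDTAQ        PR                  extpgm('QSNDDTAQ')
     D  dtaqName                     10a   const
     D  dtaqLib                      10a   const
     D  dataLen                       5p 0 const
     D  data                        512a   const

     D dqstring        S            512a
     D custNo          S             10a
     D status          S              2a

      /free
        // only the populated elements go into the document
        dqstring = '<datastructurename>'
                 + '<custno>' + %trim(custNo) + '</custno>'
                 + '<status>' + %trim(status) + '</status>'
                 + '</datastructurename>';
        QSNDDTAQ('WORKFLOWQ' : 'MYLIB' :
                 %len(%trimr(dqstring)) : dqstring);
      /end-free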

My only question regarding XML-INTO is this: how deep can we fill a DS with
XML data? If we have a DS with a subfield defined with LIKEDS of another DS,
could I expect a three-level XML doc segment to decompose into the whole
structure, including the fields in the nested DS? Does the process handle
this kind of structure?
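
To make the question concrete, this is the kind of structure I mean:

     D xmlString       S           1024a   varying

     D address         DS                  qualified
     D  street                       30a
     D  city                         20a

     D customer        DS                  qualified
     D  name                         25a
     D  addr                               likeds(address)

      /free
        // would <customer><name>...</name><addr><street>...</street>
        // <city>...</city></addr></customer> fill the whole thing?
        xml-into customer %xml(xmlString : 'doc=string case=lower');
      /end-free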


Regarding entity references: I assume that, given the following XML:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<root>
  <var1>That&apos;s life!</var1>
  <var2>Bodgitt &amp; Scarper electricians</var2>
</root>

I would expect the parser to call the handler with the following sequence of
events:

*XML_START_DOCUMENT  (no data pointer passed)
*XML_VERSION_INFO    (data=1.0)
*XML_ENCODING_DECL   (data=ISO-8859-1)
*XML_START_ELEMENT   (data=root)
*XML_START_ELEMENT   (data=var1)
*XML_CHARS           (data=That)
*XML_PREDEF_REF      (data=&apos;)
*XML_CHARS           (data=s life!)
*XML_END_ELEMENT     (data=var1)
*XML_START_ELEMENT   (data=var2)
*XML_CHARS           (data=Bodgitt )
*XML_PREDEF_REF      (data=&amp;)
*XML_CHARS           (data= Scarper electricians)
*XML_END_ELEMENT     (data=var2)
*XML_END_ELEMENT     (data=root)
*XML_END_DOCUMENT    (no data pointer passed)

(Although I would imagine that *XML_CHARS may be called several times for
the same element if the data is very large.)

So it would appear that we will have to resolve escaped entity references
within our code. This is not a big thing - and we have been given the
ability to handle custom references via *XML_UNKNOWN_REF, which is cool.
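
If the parser really does hand us the raw reference, as I assumed above,
the resolution inside the handler is only a couple of when-branches (buffer
name hypothetical; data and stringLen are the handler parameters from
earlier):

     D elemValue       S           1024a   varying static

      /free
        select;
        when event = *XML_CHARS;
          elemValue = elemValue + %subst(data : 1 : stringLen);
        when event = *XML_PREDEF_REF;
          // append the character the reference stands for
          select;
          when %subst(data : 1 : stringLen) = '&amp;';
            elemValue = elemValue + '&';
          when %subst(data : 1 : stringLen) = '&apos;';
            elemValue = elemValue + '''';
          endsl;
        endsl;
      /end-free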


Anyway, this looks like a pretty comprehensive implementation and I can
certainly use it in my production code. I would still need the XML toolkit
for parsing with XML schemas, but that's not too bad - I only use schemas in
development and can treat the toolkit as part of my IDE rather than part of
the production code. Also, the way this code handles attributes is
excellent. I never really liked the way standard SAX parsers present
attribute data to the call-back procedures at the start of an element - even
in Java it is messy. This way is MUCH nicer.

Excellent work, Barbara - I will certainly have a good play with this once
we get the upgrade!

Cheers

Larry Ducie




