Re: XML-INTO, CDATA, and multiple spaces -- RPG400-L

Hi Vern,

By the standards, the XML-INTO opcode should NEVER be removing spaces. That's not the way XML is intended to work... so there's no need to escape/encode spaces according to the XML standard, and nobody ever needs to do it.

And, if following the standards, those statements about CDATA are accurate. CDATA fixes up the stuff that's normally given a special meaning in XML. Spaces are not one of those things, so there's no need to give them any special treatment.
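As a rough illustration of the point about the standard, here is a small Python sketch (not RPG, and it says nothing about XML-INTO itself). It uses the standard library's xml.etree.ElementTree parser on hypothetical sample values, with two spaces placed inside the customer number on purpose, to show that the spaces come through with or without CDATA and that CDATA only changes how markup characters are written.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample values; the two spaces inside each value are deliberate.
with_cdata = "<custno><![CDATA[008_XY  00020001]]></custno>"
without_cdata = "<custno>008_XY  00020001</custno>"

# A parser that follows the XML standard keeps the spaces either way.
text_with = ET.fromstring(with_cdata).text
text_without = ET.fromstring(without_cdata).text

# What CDATA actually changes: markup characters need no escaping inside it.
markup_cdata = ET.fromstring("<note><![CDATA[a < b & c]]></note>").text
markup_escaped = ET.fromstring("<note>a &lt; b &amp; c</note>").text

print(repr(text_with))                 # both spaces still present
print(repr(text_without))              # same value without CDATA
print(markup_cdata == markup_escaped)  # CDATA and escaping give the same text
```

Both forms yield the same character data, so wrapping a value in CDATA does not by itself tell a parser anything about whitespace.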

However, XML-INTO has this special feature that can be useful to many RPG programmers... and that's the ability to trim spaces automatically. This is a feature that I haven't seen anywhere else. So the people talking about CDATA in those articles probably wouldn't have any reason to mention it -- they probably don't even know that XML-INTO does this. In fact, there's a good chance that they've never worked with XML-INTO at all.

But, anyway, you can turn that feature off with trim=none.
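To make the difference between the two settings concrete, here is a rough Python emulation based only on how the default is described elsewhere in this thread (ends stripped, internal runs of whitespace squeezed to one space). It is not RPG and not IBM's implementation. The function names and sample values are hypothetical, and flatten_newlines is just one possible follow-up step if line breaks are the only worry once trimming is turned off.

```python
import re

def trim_all(value: str) -> str:
    """Emulate the default as described in this thread: strip the ends and
    squeeze each internal run of whitespace to a single space."""
    return " ".join(value.split())

def trim_none(value: str) -> str:
    """Emulate trim=none: hand the value back untouched."""
    return value

def flatten_newlines(value: str) -> str:
    """Hypothetical step after trim=none: turn each line break into one
    space while leaving runs of ordinary spaces alone."""
    return re.sub(r"\r\n|\r|\n", " ", value)

custno = "008_XY  00020001"       # two spaces that must survive
history = "line one\nline  two"   # newline to drop, double space to keep

print(repr(trim_all(custno)))           # internal spaces squeezed to one
print(repr(trim_none(custno)))          # value unchanged
print(repr(flatten_newlines(history)))  # newline gone, double space kept
```

The idea is that whitespace handling can be decided per field after parsing, rather than by changing the incoming document.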

In my opinion, encoding the spaces in your document is wrong behavior. If you do that, you are forcing people to comply with the specific way your program was written instead of adhering to the XML standards. If the spaces are critical to your application, your program should be written to handle them according to the XML standards, rather than forcing others to comply with the way your program is written.

Just my opinion.

On 9/20/2013 2:58 PM, Vernon Hamberg wrote:

Yeah, I hear you. But I've seen different interpretations. Here are a
couple statements from a developerWorks article - these refer to the
material between the CDATA markers.

"Anything between those bits of markup will pass through the XML parser
untouched."

and

"In either case, the contents of the CDATA section will be available
without modification."

That's kind of what I had thought. It might be a misinterpretation on
the part of the authors, too. The article is at -

http://www.ibm.com/developerworks/library/x-cdata/

On the other hand, the W3C spec on CDATA doesn't mention spaces, it
speaks of using CDATA "...to escape blocks of text containing characters
which would otherwise be recognized as markup."

BTW, this XML is coming TO me from someone else.

There is some attribute called xml:space - but I don't have control of
the markup until I get it.

I chose not to use the trim=none option, because there are also newlines
in a <history> tag - we are bringing this data into a physical file on
our system and a fixed-length text file for the mainframe here. I will
give it a try, as it may not matter if newlines are preserved.

I do think my best choice is to encode these spaces - we do need them.
That is, if trim=none is not working as we want it to.

Thanks, and see you in a couple weeks.

Vern

On 9/20/2013 2:20 PM, Scott Klement wrote:

Vern,

CDATA is normally used so you don't have to escape special characters in
your XML data (such as <, > and & symbols). As far as I know, it has
absolutely nothing to do with blanks.

I have not run a test, but... I would never have thought or guessed in
a million years that CDATA would stop XML-INTO from removing blanks.
That's not what CDATA is intended for, and I've certainly never seen it
used for that.

If you want to prevent XML-INTO from removing blanks, why don't you use
the trim option?

-SK

On 9/20/2013 1:44 PM, Vernon Hamberg wrote:

I have an XML file I'm processing - comes from a "partner" app elsewhere
here.

One of the nodes is our customer number, and it can contain more than
one space, as here -

<custno><![CDATA[008_XY 00020001]]></custno>

We are to expect the CDATA, since we are assuming it should tell the
parser to leave things alone.

Now is that a correct assumption? I did a little digging, and it seems
there is some variation in interpretation.

XML-INTO is what I'm using, with the default for the trim option
(trim=all, which strips leading and trailing whitespace and reduces any
run of more than one space to a single space). I left it this way,
because we also get newlines in the data.

I would like to know if XML-INTO should leave things alone that are in a
CDATA block - that seems to be generally assumed, but I can easily be
mistaken here.

My main option is to encode these particular spaces - sed should do the
trick with a little effort. The alternative is to get the software on
the other end to do the encoding - good luck! And some consultant would
want us to run the PAYMNY command.

Thoughts? Bug? Feature? Options?

Thanks
Vern