Hi Scott

The W3C spec (REC-XML, I think it's named) speaks of how XML handles spaces: generally, XML is to ignore spaces unless told otherwise. That's my very rough summary after barely reading up. It's certainly what happens to white space outside of tags and attributes.

There is an xml:space attribute that can be used, but that is no help to me: I am the consumer of the XML, not the producer.

I can't make the producer do anything. They are recalcitrant about lots of things and would charge a lot in any case. You know, consultants!

So the encoding on my part is an admittedly desperate way to preserve spaces in one element. It's critical there, because it's an account ID that actually contains the spaces. The other elements aren't that critical.

As to standard behavior, the fact that the default for the trim option is all suggests that this is normal behavior, unless Barbara is totally inventing IBM's own way of doing things.

The source for this XML is an export from a Windows app, BTW.

Cheers and back to the office before they miss me!
Vern

On 9/20/2013 3:05 PM, Scott Klement wrote:
Hi Vern,

By the standards, the XML-INTO opcode should NEVER be removing spaces.
That's not the way XML is intended to work... so there's no need to
escape/encode spaces according to the XML standard, and nobody ever
needs to do it.

And, if following the standards, those statements about CDATA are
accurate. CDATA fixes up the stuff that's normally given a special
meaning in XML. Spaces are not one of those things, so there's no need
to give them special treatment.
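To illustrate the point about standards-conforming parsers, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It does not use XML-INTO. The sketch parses a hypothetical customer number containing a double space twice: once wrapped in CDATA and once as plain character data. In both cases the parser should hand the spaces back untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical account ID with an embedded double space (illustrative only).
ACCOUNT_ID = "008_XY  00020001"


def parse_custno(xml_text: str) -> str:
    """Return the text of the root element exactly as the parser reports it."""
    return ET.fromstring(xml_text).text


with_cdata = f"<custno><![CDATA[{ACCOUNT_ID}]]></custno>"
without_cdata = f"<custno>{ACCOUNT_ID}</custno>"

# A conforming parser passes character data through unchanged, CDATA or not.
print(repr(parse_custno(with_cdata)))
print(repr(parse_custno(without_cdata)))
```

Any collapsing of those spaces is therefore something the consuming program chooses to do after parsing.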

However, XML-INTO has this special feature that can be useful to many
RPG programmers... and that's the ability to trim spaces automatically.
This is a feature that I haven't seen anywhere else. So the people
talking about CDATA in those articles probably wouldn't have any reason
to mention it -- they probably don't even know that XML-INTO does this.
In fact, there's a good chance that they've never worked with XML-INTO
at all.

But, anyway, you can turn that feature off with trim=none.

In my opinion, encoding the spaces in your document is wrong behavior.
If you do that, you are forcing people to comply with the specific way
your program was written instead of adhering to the XML standards. If
the spaces are critical to your application, your program should be
written to handle them according to the XML standards, rather than
forcing others to comply with the way your program is written.

Just my opinion.


On 9/20/2013 2:58 PM, Vernon Hamberg wrote:
Yeah, I hear you. But I've seen different interpretations. Here are a
couple of statements from a developerWorks article; these refer to the
material between the CDATA markers.

"Anything between those bits of markup will pass through the XML parser
untouched."

and

"In either case, the contents of the CDATA section will be available
without modification."

That's kind of what I had thought. It might be a misinterpretation on
the part of the authors, too. The article is at:

http://www.ibm.com/developerworks/library/x-cdata/

On the other hand, the W3C spec on CDATA doesn't mention spaces; it
speaks of using CDATA "...to escape blocks of text containing characters
which would otherwise be recognized as markup."

BTW, this XML is coming TO me from someone else.

There is an attribute called xml:space, but I have no control over the
markup by the time I get it.

I chose not to use the trim=none option because there are also newlines
in a <history> tag. We are bringing this data into a physical file on
our system and into a fixed-length text file for the mainframe here. I
will give it a try, though, as it may not matter if newlines are preserved.
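One way to keep the account ID intact without giving up the newline cleanup is to read everything verbatim, which is what trim=none does, and then normalize only the fields that need it. The minimal sketch below does this in Python with `xml.etree.ElementTree` rather than in RPG. The `<record>` wrapper, the `extract` helper and the sample values are hypothetical; only the `custno` and `history` element names come from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; only <custno> and <history> are named in the thread.
SAMPLE = """<record>
<custno><![CDATA[008_XY  00020001]]></custno>
<history>Line one
Line two</history>
</record>"""


def extract(record_xml: str) -> dict:
    """Read fields verbatim, then normalize only the free-text field."""
    root = ET.fromstring(record_xml)
    # Kept exactly as parsed: the embedded spaces are significant.
    custno = root.findtext("custno")
    # Flattened for a fixed-length file: newlines, tabs and runs of
    # blanks become single blanks, and leading/trailing blanks go away.
    history = " ".join((root.findtext("history") or "").split())
    return {"custno": custno, "history": history}


print(extract(SAMPLE))
```

The design choice is to make whitespace handling a per-field decision in the consuming program instead of a single global setting.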

I do think my best choice is to encode these spaces; we do need them.
That is, if trim=none doesn't work the way we want it to.

Thanks, and see you in a couple weeks.

Vern

On 9/20/2013 2:20 PM, Scott Klement wrote:
Vern,

CDATA is normally used so you don't have to escape special characters in
your XML data (such as <, > and & symbols). As far as I know, it has
absolutely nothing to do with blanks.
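What CDATA actually does can be seen with a minimal sketch in Python's standard-library `xml.etree.ElementTree`, used here only for illustration. The same text is written two ways, once with escaped entities and once inside a CDATA section, and the parser reports identical character data for both. The `<note>` element is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same character data written two ways (hypothetical <note> element).
escaped = "<note>a &lt; b &amp;&amp; c &gt; d</note>"
cdata = "<note><![CDATA[a < b && c > d]]></note>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

# CDATA only spares you the escaping; the parsed result is the same.
print(escaped_text)
print(escaped_text == cdata_text)
```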

I have not run a test, but... I would never have thought or guessed in
a million years that CDATA would stop XML-INTO from removing blanks.
That's not what CDATA is intended for, and I've certainly never seen it
used for that.

If you want to prevent XML-INTO from removing blanks, why don't you use
the trim option?

-SK


On 9/20/2013 1:44 PM, Vernon Hamberg wrote:
I have an XML file I'm processing - comes from a "partner" app elsewhere
here.

One of the nodes is our customer number, and it can contain more than
one space, as here:

<custno><![CDATA[008_XY 00020001]]></custno>

We are to expect the CDATA, since we are assuming it should tell the
parser to leave things alone.

Now is that a correct assumption? I did a little digging, and it seems
there is some variation in interpretation.

XML-INTO is what I'm using, with the default for the trim option
(trim=all, which removes leading and trailing whitespace and reduces any
run of more than one whitespace character inside the data to a single
space). I left it this way because we also get newlines in the data.
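The effect described above can be approximated in a few lines of Python. This is a rough emulation based on the description of trim=all in this thread, not IBM's implementation. Exactly which characters XML-INTO treats as whitespace is an assumption here, and the `trim_all` helper and sample values are hypothetical. It shows why a double space inside an account ID cannot survive the default setting.

```python
def trim_all(text: str) -> str:
    """Rough emulation of XML-INTO trim=all (an assumption, not IBM's code):
    strip leading/trailing whitespace and collapse each internal run of
    whitespace (blanks, tabs, newlines) to a single blank."""
    return " ".join(text.split())


custno = "008_XY  00020001"            # hypothetical ID with a double space
history = "First note\n  second note"  # hypothetical multi-line history

print(repr(trim_all(custno)))   # the embedded double space is collapsed
print(repr(trim_all(history)))  # the newline and indent become one blank
```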

I would like to know if XML-INTO should leave things alone that are in a
CDATA block - that seems to be generally assumed, but I can easily be
mistaken here.

My main option is to encode these particular spaces; sed should do the
trick with a little effort. The alternative is to get the software on
the other end to do the encoding (good luck!), and some consultant would
want us to run the PAYMNY command.

Thoughts? Bug? Feature? Options?

Thanks
Vern



This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].