RE: Replace 1 Char with 3Char's -- RPG400-L

The problem with just replacing all <'s is that the method
inserts CRLF not only between nodes but also inside text nodes,
so you will have altered the node content, and not all parsers
like that:

CRLF<name>abcdefgCRLF</name>

Most parsers do, however, recognize <name>CRLF</name>, because
they treat such a node as a parent node, and a parent node by
definition has no content.
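To see where the CR ends up, here is a minimal sketch that runs the substitution from this thread over a made-up one-element sample and looks at the second line of the result. It assumes GNU sed (which turns \r and \n in the replacement into CR and LF); the sample string is hypothetical.

```shell
#!/bin/sh
# Hypothetical one-element sample (no trailing newline).
xml='<name>abcdefg</name>'

# The substitution from this thread: put CR LF in front of every <.
# Assumes GNU sed, which turns \r and \n in the replacement into CR and LF.
out=$(printf '%s' "$xml" | sed 's%[<]%\r\n<%g')

# Line 2 of the result is "<name>abcdefg" followed by a CR:
# the CR now sits inside the text node, just before "</name>".
line2=$(printf '%s\n' "$out" | sed -n '2p')
printf '%s' "$line2" | od -c
```

The od -c dump ends in \r, which is exactly the extra character a strict parser would see as part of the name value.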

Originally the XML came in via a web service (Scott's HTTPAPI).
Does it have a %addr and a %len on the input?

"Dennis Lovelady" <iseries@xxxxxxxxxxxx>
Sent by: rpg400-l-bounces@xxxxxxxxxxxx
15-02-2010 17:09
Please respond to
RPG programming on the IBM i / System i <rpg400-l@xxxxxxxxxxxx>

To
"'RPG programming on the IBM i / System i'" <rpg400-l@xxxxxxxxxxxx>
cc

Subject
RE: Replace 1 Char with 3Char's

> For my edification, please give an explanation of the various
> components of the statement. I have some idea of the % being a
> delimiter, but it'd help to have it explained. I know, it's in the
> man pages!! And even the QShell manual. But it's early and I'm on my
> way to work - and a little lazy!!
>
> I see redirection here, at least.

Sure. I'll try. I'm not as good at explaining things as Scott, though, so
please bear with me.

's%[<]%\r\n<%g'

The first character, s, means substitute, and the substitute syntax is:
s<delim>find_this<delim>change_to_this<delim>
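To make the pieces concrete, the thread's command can be assembled from those parts in shell variables. The variable names are purely illustrative, and the run at the end assumes GNU sed.

```shell
#!/bin/sh
# Illustrative variable names for the parts of an s command.
delim='%'               # delimiter
find_this='[<]'         # pattern: a one-character set holding <
change_to_this='\r\n<'  # replacement: CR, LF, then < (GNU sed escapes)
flags='g'               # replace every match, not just the first

# Glue the parts together: s<delim>find<delim>change<delim>flags
script="s${delim}${find_this}${delim}${change_to_this}${delim}${flags}"
printf '%s\n' "$script"    # prints s%[<]%\r\n<%g

# The assembled script behaves like the literal one (GNU sed assumed).
printf '%s' '<a>x</a>' | sed "$script" | od -c
```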

The delimiter character can be almost any character (anything except
a backslash or a newline). It's best to use a delimiter that's not in
your find_this and change_to_this strings. (You can, but you'd then
have to "escape" them.) I like visually obvious characters like / or
% or #, et cetera, and I'll freely switch between them. The one rule
is that the delimiter character must appear exactly three times
(unescaped). It could be a space, for example, but that might warp
the debugger/maintainer's mind.
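A small sketch of why the choice of delimiter matters: the same substitution on a hypothetical path, written three ways. With / as the delimiter every / in the strings has to be escaped; with % (or even a space) it does not. Assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical sample path.
path='/usr/local/bin'

# / as delimiter: every / inside find_this and change_to_this needs a backslash.
a=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# % as delimiter: no escaping needed.
b=$(printf '%s\n' "$path" | sed 's%/usr/local%/opt%')

# A space works too, as long as it appears exactly three times.
c=$(printf '%s\n' "$path" | sed 's /usr/local /opt ')

printf '%s\n' "$a" "$b" "$c"    # /opt/bin three times
```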

The last character, g, when it follows a substitute, means globally
(replace all). Without it, only the first match on each line is
replaced.
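A quick sketch of the difference, using _ as a stand-in replacement on made-up input (no GNU-specific features needed here):

```shell
#!/bin/sh
line='<a><b><c>'

# Without g: only the first < on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's%[<]%_%')
# With g: every < on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's%[<]%_%g')

printf '%s\n' "$first"   # _a><b><c>
printf '%s\n' "$all"     # _a>_b>_c>

# Without g the "first match" rule applies per line, not per file.
two=$(printf '%s\n' '<a><b>' '<c><d>' | sed 's%[<]%_%')
printf '%s\n' "$two"     # _a><b> then _c><d>
```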

The brackets around our find_this value, [<], indicate a set of
characters, any one of which will match. For example, to find any
digit, you could use [0123456789], though there are better ways
outside the scope of this discussion. My usage here is actually a bit
of a trick that I employ frequently. Before I go on, let me digress:

If you go to http://www.regular-expressions.info/characters.html you
will see that there are many "control" characters in regular
expressions, like the period ., the backwacky (backslash) \, brackets
[], parentheses (), curly braces {} and so on. These control
characters take on special meaning, and if you want them treated as
just another character, then you have to "escape" them, which means
putting a backwacky before them. For example, if you want a period to
be just a period rather than a wildcard, you can use \. to indicate
it. That can get ugly quickly. Another way to treat these characters
as their "native" values is to put them into a one-character set,
like [.], which, to me, is easier to read and surer to understand.
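Here is the period case side by side on a made-up IP address: the bare . is a wildcard, while both \. and [.] match only a literal period.

```shell
#!/bin/sh
ip='10.0.0.1'

wild=$(printf '%s\n' "$ip" | sed 's/./X/g')       # . matches any character
esc=$(printf '%s\n' "$ip" | sed 's/\./X/g')       # \. is a literal period
bracket=$(printf '%s\n' "$ip" | sed 's/[.]/X/g')  # [.] is a literal period too

printf '%s\n' "$wild"      # XXXXXXXX
printf '%s\n' "$esc"       # 10X0X0X1
printf '%s\n' "$bracket"   # 10X0X0X1
```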

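So with [<] I don't have to worry about whether < is a special
character or not; I know that sed will see it as its native value.
(On some occasions, a problem can be caused by escaping something
that doesn't need to be escaped. Which characters are special also
depends on the regex flavor: in sed's basic regular expressions, for
instance, ( and { are plain characters, while \( and \{ are the
special forms. More reason to go with this syntax unless you (and
those who follow you) know your control characters very well.)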
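A concrete case of over-escaping, assuming GNU sed: there, \< is not a literal < at all but a start-of-word anchor, so "escaping" the < silently changes what the command does. The bracket form is unambiguous.

```shell
#!/bin/sh
s='a<b'

# [<] always means a literal <.
plain=$(printf '%s\n' "$s" | sed 's/[<]/_/g')
# In GNU sed, \< matches the empty string at the start of a word,
# so _ is inserted before "a" and before "b", and < is left alone.
escaped=$(printf '%s\n' "$s" | sed 's/\</_/g')

printf '%s\n' "$plain"     # a_b
printf '%s\n' "$escaped"   # _a<_b
```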

So we have s%[<]%something%g, indicating that I want to find every <
and replace each occurrence with something. But "something" looks
like this: \r\n< Now what?

The rest is pretty easy if you understand some of the escaped special
characters. \r is return (or CR to most folks). \n is new line (or
LF). Apart from a few characters (the backslash, &, and the delimiter
itself), the substitution string is taken at face value, so I can
place < here without worrying about whether it's a special character.

Voila. Replace < with CR LF < (no spaces), globally.
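One way to confirm that \r\n really became the two bytes CR and LF is to dump the result in hex. This is a sketch assuming GNU sed in an ASCII environment; under an EBCDIC CCSID (as in QShell on IBM i) the byte values would differ.

```shell
#!/bin/sh
# Run the thread's substitution over a tiny hypothetical sample.
out=$(printf '%s' '<a>' | sed 's%[<]%\r\n<%g')

# Dump the bytes in hex. In ASCII, CR is 0d, LF is 0a and < is 3c,
# so the output should start with 0d 0a 3c.
printf '%s' "$out" | od -An -tx1
```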

You mentioned that you already understand the redirection. But for
those who don't: '< infile' says get my data from infile, and
'> outfile' says place the result into outfile.
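Put together, the whole statement reads its input from one file and writes the result to another. A sketch with hypothetical file names (scratch files in the current directory) and GNU sed assumed:

```shell
#!/bin/sh
# Hypothetical input: a one-line XML document with no line breaks.
printf '%s' '<root><name>abc</name></root>' > infile

# < infile feeds sed's standard input; > outfile captures its output.
sed 's%[<]%\r\n<%g' < infile > outfile

# Each < now starts a new line (preceded by CR LF).
od -c outfile
```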

Note that the OP had some problem with that exact string, and the
reason, I'm convinced, was CCSID-related. Maybe the backwacky didn't
get to him properly, or maybe the brackets or something else. But I'm
sure that if the information at
http://www.regular-expressions.info/characters.html had been well
understood, this would not have been an issue, and the very short sed
command would have done the trick.

Dennis Lovelady
http://www.linkedin.com/in/dennislovelady
--
Courage is what it takes to stand up and speak. Courage is also what
it takes to sit down and listen.
-- unknown