Re: special characters issue in xml -- RPG400-L

> I'm certainly reading a stream file in EBCDIC format but, how can you tell
> whether a file is in ASCII or EBCDIC or any other format?

One of the really slick features of the iSeries is that you can tag each
file with a CCSID.  If you don't know what a CCSID is, it's a "Coded
Character Set Identifier".  Basically, it's a number that identifies a
certain set of rules for mapping the bytes in a file to human-readable
characters.

On pre-V5R1 systems, stream files were tagged with a codepage instead of a
full CCSID.  (Though, DB2 files had CCSIDs back then already.)  The
difference is that a CCSID identifies a character set, encoding rules, and
potentially two different codepages for a file.  However, in Western
countries where the alphabets are small, a CCSID and a codepage generally
have a 1-to-1 relationship.  In most Asian countries/languages, however,
this is not true.

ANYWAY, I digress... Your system is also set up with a CCSID, to specify
what the native characters are, and the file is tagged with a CCSID or
codepage that tells what the characters of that file are.   Assuming that
you can trust the person who created your stream files to tag them with
the proper CCSID, you can have the system automatically convert between
them when you read the file.
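
To see why the tag matters, here's a small sketch in Python (not RPG, just
because it's easy to try anywhere).  It assumes Python's "cp037" codec as a
stand-in for EBCDIC CCSID 37 and "latin-1" as a stand-in for CCSID 819; the
sample XML string is made up.  The same text becomes different bytes under
each code page, and reading the bytes with the wrong one gives garbage:

```python
# Assumption: Python codec "cp037" stands in for EBCDIC CCSID 37 and
# "latin-1" stands in for the ASCII-based CCSID 819.
xml_text = "<name>Smith & Sons</name>"   # made-up sample data

ebcdic_bytes = xml_text.encode("cp037")
ascii_bytes = xml_text.encode("latin-1")

# The ampersand is a different byte value in each code page.
ebcdic_amp = "&".encode("cp037")
ascii_amp = "&".encode("latin-1")

# Decoding with the code page the data was written in recovers the text.
correct = ebcdic_bytes.decode("cp037")

# Decoding EBCDIC bytes as if they were ASCII-based does not fail,
# it just produces the wrong characters.
misread = ebcdic_bytes.decode("latin-1")

if __name__ == "__main__":
    print("EBCDIC '&' byte:", ebcdic_amp.hex())
    print("ASCII  '&' byte:", ascii_amp.hex())
    print("decoded correctly:", correct)
    print("decoded wrongly:  ", repr(misread))
```

Nothing in the bytes themselves says which interpretation is right, which is
why a trustworthy tag on the file is so useful.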

> I wonder do we really need these many types of formats?

You aren't the only one to feel that way!  The design of Unicode and
similar things was done with the idea that there should be one system that
everyone can use.

Alas, there are too many computer systems and too much software & data
using the traditional codepages for it to be changed so easily.


> If yes, how many known formats do we have in our small world?

Not sure how many exactly, but there are at least a few hundred.  There's
a partial list of CCSIDs and corresponding codepages and character sets
here:
http://publib.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/QB3AWC01/G.2

At any rate, assuming that the stream file that you're reading is tagged
with the correct codepage/CCSID, you can open it with the O_TEXTDATA flag
to the open() API.  Then when you read it, it'll automatically convert to
the system's native EBCDIC codepage.
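
As a rough analogy only (this is Python, not the IBM i open() API), text-mode
opens in other languages do the same kind of convert-as-you-read.  The big
difference is that here the caller has to name the encoding, whereas
O_TEXTDATA takes it from the file's tag.  The CCSID_TO_CODEC table and both
helper functions are hypothetical names invented for this sketch:

```python
import os
import tempfile

# Hypothetical lookup from CCSID to Python codec name; only two entries
# are filled in for this sketch.
CCSID_TO_CODEC = {37: "cp037", 819: "latin-1"}

def write_demo_file(text, ccsid):
    """Write text to a temp file as raw bytes in the given code page."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "wb") as f:
        f.write(text.encode(CCSID_TO_CODEC[ccsid]))
    return path

def read_text(path, ccsid):
    """Read a file as text, converting from the given code page."""
    # newline="" keeps line endings exactly as they are in the file.
    with open(path, "r", encoding=CCSID_TO_CODEC[ccsid], newline="") as f:
        return f.read()

if __name__ == "__main__":
    path = write_demo_file("<a>R&D</a>", 37)
    try:
        print(read_text(path, 37))    # decoded with the right code page
        print(read_text(path, 819))   # wrong code page: unreadable
    finally:
        os.remove(path)
```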

If the stream file has been set with an incorrect CCSID, you can set it to
a different value using the "setccsid" QShell utility.  Note that this
changes only the tag; it does not convert the data in the file.

For example:
   STRQSH CMD('setccsid 819 /path/to/myfile.xml')

More info:
 http://publib.boulder.ibm.com/iseries/v5r2/ic2924/info/rzahz/setccsid.htm

QShell also provides a utility called "iconv" which can be used to convert
a stream file from one CCSID to another.

More info:
 http://publib.boulder.ibm.com/iseries/v5r2/ic2924/info/rzahz/iconv.htm
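
To show the idea of converting a whole file from one code page to another,
here's a minimal Python sketch.  It is not the QShell utility and doesn't
copy its options; convert_file() is a hypothetical helper, and "cp037" and
"latin-1" are again assumed stand-ins for CCSIDs 37 and 819:

```python
import os
import shutil
import tempfile

def convert_file(src_path, dst_path, from_codec, to_codec):
    """Copy src to dst, re-encoding the text from one code page to another."""
    # newline="" on both sides so line endings pass through untouched.
    with open(src_path, "r", encoding=from_codec, newline="") as src, \
         open(dst_path, "w", encoding=to_codec, newline="") as dst:
        shutil.copyfileobj(src, dst)

if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "ebcdic.xml")
    dst = os.path.join(workdir, "ascii.xml")
    with open(src, "wb") as f:
        f.write("<a>R&D</a>".encode("cp037"))   # made-up sample data
    convert_file(src, dst, "cp037", "latin-1")
    with open(dst, "rb") as f:
        print(f.read())
    shutil.rmtree(workdir)
```

Unlike changing the tag, this rewrites the bytes themselves, so the output
file needs to be tagged with the new CCSID to match.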

But, assuming that you are creating these stream files yourself, and
interpreting them in your RPG program, it shouldn't be necessary to use the
QShell utilities.  Just use open() properly.

Info about using open() in RPG can be found on my web site:
   http://www.scottklement.com/rpg/ifs.html

Finally, if all else fails, the iconv() API can be used to convert data in
variables from one CCSID to another. If you search the web for
"ExtProc('iconv')" you should have no trouble finding examples of using
the iconv() API in RPG...

ANYWAY...  you probably didn't need that much detail about CCSIDs... :)

Once you've got the data in your program, just scan for '&' or whatever
and use %replace() or a similar technique to change it to '&amp;'.  Then
write the results to a new stream file...
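
The scan-and-replace step looks something like this in Python (shown instead
of RPG only so it can be run anywhere; escape_ampersands() is a hypothetical
name).  One thing this sketch adds beyond a plain replace: it skips an '&'
that already starts an entity such as '&amp;' or '&#38;', on the assumption
that some of your data may already be escaped.  If that's never the case, a
plain replace of '&' with '&amp;' does the job:

```python
import re

# Match '&' only when it does NOT already begin one of the five predefined
# XML entities or a numeric character reference.
_BARE_AMP = re.compile(
    r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)"
)

def escape_ampersands(text):
    """Replace each bare '&' with '&amp;', leaving existing entities alone."""
    return _BARE_AMP.sub("&amp;", text)

if __name__ == "__main__":
    print(escape_ampersands("<name>Smith & Sons</name>"))
    print(escape_ampersands("already &amp; escaped"))
```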

Searching the web for %scan() and %replace() will give you examples of how
to use those if you're not already familiar with them.   There are also
some pretty good examples of them in the ILE RPG reference manual in the
Information Center.

Err.. "WebSphere Development Studio ILE RPG Reference" is what they call
that manual this week.

Hope that helps (and doesn't leave you too confused)