How are you using Expat? Are you giving it raw binary data, and
letting it figure out the encoding? Or are you translating the data when
reading it from the IFS, and forcing the Expat parser to use a particular
encoding?
You reference CCSID 1252... I'm trying to figure out how that fits into
Note that CCSID 1252 is definitely not the same as UTF-8. The plain
ASCII characters (unaccented letters, digits, common punctuation) are
encoded identically in both... but, aside from that, they are not the
same. UTF-8 (which is CCSID 1208) can encode more than a million
characters in one encoding... Windows Latin-1 (CCSID 1252) is a
single-byte encoding, so it tops out at 256. So you can imagine that
there are many things in UTF-8 that don't exist in 1252.
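
To make the difference concrete, here is a small Python sketch (not
something from your environment). It assumes Python's "cp1252" codec is an
adequate stand-in for CCSID 1252, and the helper name encode_both is made
up for the illustration. It encodes a few characters both ways and shows
where the bytes stop matching.

```python
def encode_both(ch):
    """Return the bytes for `ch` in Windows-1252 and in UTF-8."""
    # "cp1252" is Python's codec name; assumed here to match CCSID 1252.
    return ch.encode("cp1252"), ch.encode("utf-8")

# A plain ASCII letter, an accented letter, and the euro sign.
for ch in ["A", "\u00e9", "\u20ac"]:
    cp1252_bytes, utf8_bytes = encode_both(ch)
    print(ascii(ch), cp1252_bytes.hex(), utf8_bytes.hex(),
          "same" if cp1252_bytes == utf8_bytes else "different")

# A character that exists in Unicode but not in Windows-1252 at all.
try:
    encode_both("\u65e5")
except UnicodeEncodeError as exc:
    print("cannot be encoded in cp1252:", exc.reason)
```

Only the ASCII letter comes out as the same bytes in both encodings; the
accented letter and the euro sign are one byte in 1252 but two or three
bytes in UTF-8.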
But, if you're just handing binary data to Expat, it should
automatically detect the encoding from the <?xml ... encoding="..."?> line at
the top. It doesn't use the CCSID on the file unless you write code to
make it do that.
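
If it helps to see that behavior in isolation, here is a rough sketch using
Python's binding to Expat (xml.parsers.expat) rather than the IBM i build,
so treat it as an illustration only. The helper parse_text and the sample
document are invented for the example. Passing no encoding lets Expat read
it from the declaration; passing one overrides the declaration.

```python
from xml.parsers import expat

def parse_text(data, forced_encoding=None):
    """Parse `data` (bytes) and return all character data as one string."""
    chunks = []
    # With forced_encoding=None, Expat takes the encoding from the XML
    # declaration. A non-None value overrides the declaration.
    parser = expat.ParserCreate(forced_encoding)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)

# Hypothetical document: declared as ISO-8859-1, with one byte (0xE9)
# that is valid in that encoding but is not valid UTF-8 on its own.
doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>caf\xe9</a>'

print(ascii(parse_text(doc)))          # declaration is honoured

try:
    parse_text(doc, "UTF-8")           # override disagrees with the data
except expat.ExpatError as exc:
    print("forced UTF-8 failed:", exc)
```

The second call fails because the override tells Expat to read the bytes
as UTF-8 when they are really ISO-8859-1, which is the kind of mismatch
that produces a parse error partway through a file.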
Just make sure the data is actually encoded by the same standard as the
<?xml?> declaration says it is... and that you aren't translating during
file transfer, or something like that.
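
On checking the characters before parsing: one option is a pre-scan that
reads the file in chunks, so file size isn't a problem. The sketch below is
Python and only an illustration; the function name prescan and its
parameters are made up. It assumes an ASCII-compatible encoding such as
UTF-8 or 1252 (it would not work on UTF-16 or EBCDIC data). It reports
bytes that don't decode in the expected encoding, plus control characters
below hex 20 that XML 1.0 does not allow (everything except tab, LF and
CR). The hex 1D mentioned in the question falls in that second group.

```python
import codecs
import re

# Bytes below 0x20 that XML 1.0 forbids (0x09, 0x0A and 0x0D are allowed).
ILLEGAL_CONTROLS = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def prescan(path, encoding="utf-8", chunk_size=1 << 20, max_problems=100):
    """Return a list of (byte_offset, description) problems in the file."""
    problems = []
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    offset = 0  # byte offset of the current chunk within the file
    with open(path, "rb") as fh:
        while len(problems) < max_problems:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            for m in ILLEGAL_CONTROLS.finditer(chunk):
                problems.append((offset + m.start(),
                                 f"control byte 0x{m.group()[0]:02X}"))
            # Bytes held over from the previous chunk (a split multi-byte
            # sequence) shift the positions reported by the decoder.
            pending = len(decoder.getstate()[0])
            try:
                decoder.decode(chunk)
            except UnicodeDecodeError as exc:
                problems.append((offset + exc.start - pending,
                                 f"not valid {encoding}"))
                return problems  # stop: decoder state is unreliable now
            offset += len(chunk)
    try:
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        problems.append((offset, f"truncated {encoding} sequence at end"))
    return problems
```

Calling prescan on a file returns an empty list when nothing suspicious is
found. Otherwise each entry gives a byte offset you can use to locate the
bad data without opening the whole file in a viewer.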
On 10/12/2012 10:35 AM, J Franz wrote:
Using Expat parser, and UTF8 file - it failed at line 2,097,268 (so I thought
ccsid 1252 should be good?)
What looks like a blank space is a hex 1D (WRKLNK opt 5-DSPFIL)
Any way to keep parser from throwing up (the ignore opt on CPF9897 ended the
pgm)? Or a method of verifying all characters are valid before the parsing?
2nd issue is have several files "too big" for DSPFIL to open - any other
options? Files are in the 2 - 3 gig range and it is not an option for
exporting system to break them down. I could write a pgm to split them,
but would prefer not to. v6r1