Hi Jim,

How are you using Expat? Are you giving it raw binary data, and letting it figure out the encoding? Or are you translating the data when reading it from the IFS, and forcing the Expat parser to use a particular encoding?

You reference CCSID 1252... I'm trying to figure out how that fits into the equation.

Note that CCSID 1252 is definitely not the same as UTF-8. The most basic/commonplace characters (the plain ASCII range) are encoded the same way in CCSID 1252 as they are in UTF-8.... but, aside from that, they are not the same. UTF-8 (which is CCSID 1208) supports more than a million characters in one encoding... Windows Latin-1 (CCSID 1252) is a single-byte character set, so it supports only a couple of hundred. So you can imagine that there are many things in UTF-8 that don't exist in 1252.
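
To make the difference concrete, here's a small sketch in Python (used only because it's easy to run anywhere; "cp1252" is Python's name for the Windows-1252 code page, and the helper name encodable_in_1252 is just made up for this example). It prints the bytes each encoding uses for a few sample characters:

```python
# Compare how the same characters are stored in Windows-1252 and in UTF-8.
samples = ["A", "é", "€"]

for ch in samples:
    # .hex() shows the raw bytes each encoding produces for the character.
    print(repr(ch), "1252:", ch.encode("cp1252").hex(),
          "UTF-8:", ch.encode("utf-8").hex())


def encodable_in_1252(text):
    """Return True if every character in text exists in Windows-1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False
```

"A" comes out as the same single byte in both. "é" is one byte (E9) in 1252 but two bytes (C3 A9) in UTF-8, and a character such as a Chinese ideograph can't be encoded in 1252 at all.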

But, if you're just handing binary data to Expat, it should automatically detect the encoding from the <?xml ... encoding="..."?> declaration at the top. It doesn't use the CCSID on the file unless you write code to make it do that.
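
Here's roughly what that looks like, sketched with Python's binding to Expat (xml.parsers.expat) rather than whatever language you're calling it from, so treat it as an illustration only. The helper parse_text and the two sample documents are made up for the example. I used ISO-8859-1 in the sample because that is one of the encodings Expat understands without extra code:

```python
from xml.parsers import expat


def parse_text(raw):
    """Feed raw bytes to Expat and return the character data it reports."""
    chunks = []
    parser = expat.ParserCreate()          # no encoding override given
    parser.CharacterDataHandler = chunks.append
    parser.Parse(raw, True)                # True = this is the final chunk
    return "".join(chunks)


# The declaration matches the bytes: 0xE9 is "é" in ISO-8859-1.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>caf\xe9</a>'
print(parse_text(latin1_doc))

# Same bytes, but the declaration claims UTF-8. A lone 0xE9 is not valid
# UTF-8, so Expat rejects the document.
mislabeled = b'<?xml version="1.0" encoding="UTF-8"?><a>caf\xe9</a>'
try:
    parse_text(mislabeled)
except expat.ExpatError as err:
    print("parse failed:", err)
```

The parser is never told the encoding in either call. It takes it from the declaration, and fails when the bytes don't agree with it.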

Just make sure the data is actually encoded the way the <?xml?> declaration says it is... and that you aren't translating during file transfer, or something like that.
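
If you want to check a file before parsing it, something along these lines might do as a starting point. It's a sketch in Python, not a tested tool: the function name first_problem is made up, and it assumes an ASCII-compatible encoding (UTF-8, 1252, ISO-8859-1) so that lines can be split on the newline byte. It reads one line at a time, so the size of the file shouldn't matter. Besides undecodable bytes, it also flags control characters such as hex 1D, which XML 1.0 does not allow in a document at all (only tab, LF and CR are permitted below hex 20):

```python
import re

# Control characters that XML 1.0 forbids: everything below 0x20
# except tab (0x09), line feed (0x0A) and carriage return (0x0D).
_FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def first_problem(stream, encoding="utf-8"):
    """Scan a binary stream line by line.

    Returns (line_number, description) for the first problem found,
    or None if every line decodes and has no forbidden control character.
    """
    for lineno, raw in enumerate(stream, start=1):
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError as err:
            return lineno, f"byte {err.start + 1} is not valid {encoding}"
        match = _FORBIDDEN.search(text)
        if match:
            code = ord(match.group())
            return lineno, f"control character 0x{code:02X} not allowed in XML"
    return None
```

You would call it on a file opened in binary mode, for example `with open(path, "rb") as f: print(first_problem(f))`, where path is whatever your file is called.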


On 10/12/2012 10:35 AM, J Franz wrote:
Using Expat parser, and UTF8 file - it failed at line 2,097,268 (so I thought ccsid 1252 should be good?)
What looks like a blank space is a hex 1D (WRKLNK opt 5-DSPFIL)

Any way to keep parser from throwing up (the ignore opt on CPF9897 ended the pgm)? Or a method of verifying all characters are valid before the parsing?

2nd issue is have several files "too big" for DSPFIL to open - any other options? Files are in the 2 - 3 gig range and it is not an option for exporting system to break them down. I could write a pgm to split them, but would prefer not to. v6r1

Jim Franz
