The file in the ifs is currently CCSID 1252. It was copied from a non-i (win or
to a win ftp server in zip format, unzipped, then copied to i thru a mapped

The xml itself has no designation in the 1st record: <?xml version="1.0"?> ,
but a separate .dtd file does have this in 1st record:
<?xml version='1.0' encoding='UTF-8' ?>

The open statement in rpgle is:
fd = open( %trim(@filename) : O_RDONLY + O_LARGEFILE );

question: so do I need to change 1st line of xml file to
<?xml version='1.0' encoding='UTF-8' ?>
(btw I'm not auth to see the netserver config if set to xlate..sent the request
up the chain)

From: Scott Klement <midrange-l@xxxxxxxxxxxxxxxx>
To: Midrange Systems Technical Discussion <midrange-l@xxxxxxxxxxxx>
Sent: Fri, October 12, 2012 2:25:41 PM
Subject: Re: Parse error ... not well formed token

Hi Jim,

How are you using Expat?  Are you giving it raw binary data, and
letting it figure out the encoding?  Or are you translating it when
reading the IFS, and overriding the Expat parser to a particular encoding?

You reference CCSID 1252... I'm trying to figure out how that fits into
the equation.

Note that CCSID 1252 is definitely not the same as UTF-8.  The most
basic/commonplace characters (the 7-bit ASCII range) are encoded the same
way in CCSID 1252 as in UTF-8, but aside from that, they are not the same.
UTF-8 (which is CCSID 1208) supports more than a million characters in one
encoding.  Windows Latin-1 (CCSID 1252) is a single-byte encoding, so it
supports at most 256.  So you can imagine that there are many things in
UTF-8 that don't exist in 1252.
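
To make the difference concrete, here is a minimal sketch in C (not code from
this thread) of how a single CCSID 1252 byte maps to UTF-8. The function name
cp1252_byte_to_utf8 is hypothetical, and it deliberately covers only the ranges
where 1252 byte values equal Unicode code points (0x00-0x7F and 0xA0-0xFF);
bytes 0x80-0x9F would need a lookup table that is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert one CCSID 1252 byte to UTF-8.
 * Handles only 0x00-0x7F and 0xA0-0xFF, where the 1252 byte value is
 * also the Unicode code point. Returns the number of bytes written. */
static size_t cp1252_byte_to_utf8(unsigned char b, unsigned char out[2])
{
    assert(b < 0x80 || b >= 0xA0);   /* 0x80-0x9F need a table, not shown */

    if (b < 0x80) {                  /* ASCII: identical in both encodings */
        out[0] = b;
        return 1;
    }
    /* U+00A0..U+00FF take two bytes in UTF-8: 110xxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xC0 | (b >> 6));
    out[1] = (unsigned char)(0x80 | (b & 0x3F));
    return 2;
}
```

For example, the letter "A" (0x41) stays a single byte, while "é" is the single
byte 0xE9 in 1252 but the two bytes 0xC3 0xA9 in UTF-8. A parser expecting
UTF-8 that meets a lone 0xE9 will treat it as malformed.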

But, if you're just handing binary data to Expat, it should
automatically detect the encoding from the <?xml ... encoding="..."?> line at
the top.  It doesn't use the CCSID on the file unless you write code to
make it do that.

Just make sure the data is actually encoded by the same standard as the
<?xml?> says it is...  and that you aren't translating during file
transfer, or something like that.
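
One way to check this before parsing is to scan the raw bytes yourself. The
following is a rough sketch in C, not part of Expat and not code from this
thread; the names scan_state, scan_init and scan_chunk are hypothetical. It
assumes the file is supposed to be UTF-8 and flags the first byte that is
either malformed UTF-8 or a control character that XML 1.0 does not allow
(anything below 0x20 other than tab, LF and CR, which includes hex 1D). The
state is carried between calls so a multi-gigabyte file can be fed in chunks
from read(). It is a simplified check: it does not reject overlong 3- and
4-byte forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner state, carried across chunks of the same file. */
typedef struct {
    unsigned long long offset;  /* bytes accepted so far (0-based position) */
    unsigned long long line;    /* 1-based line number, counted by LF */
    int pending;                /* UTF-8 continuation bytes still expected */
} scan_state;

static void scan_init(scan_state *s)
{
    s->offset = 0;
    s->line = 1;
    s->pending = 0;
}

/* Returns 0 if the whole chunk is acceptable, 1 at the first bad byte.
 * On failure, s->offset and s->line identify where scanning stopped. */
static int scan_chunk(scan_state *s, const unsigned char *buf, size_t len)
{
    assert(s != NULL && buf != NULL);

    for (size_t i = 0; i < len; i++, s->offset++) {
        unsigned char b = buf[i];

        if (s->pending > 0) {
            if ((b & 0xC0) != 0x80)       /* expected 10xxxxxx */
                return 1;
            s->pending--;
        } else if (b < 0x80) {
            /* XML 1.0 allows only tab, LF and CR below 0x20 */
            if (b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D)
                return 1;
            if (b == 0x0A)
                s->line++;
        } else if (b >= 0xC2 && b <= 0xDF) {
            s->pending = 1;               /* 2-byte sequence */
        } else if ((b & 0xF0) == 0xE0) {
            s->pending = 2;               /* 3-byte sequence */
        } else if (b >= 0xF0 && b <= 0xF4) {
            s->pending = 3;               /* 4-byte sequence */
        } else {
            return 1;                     /* stray continuation or invalid lead */
        }
    }
    return 0;
}
```

Call scan_init once, then scan_chunk for each buffer read from the file. A
nonzero return gives the byte offset and line to inspect; after the last
chunk, pending should be 0, otherwise the file ends in the middle of a
character. Note that a 1252-style byte such as 0xE9 looks like the start of a
multi-byte sequence, so the error is reported at the byte that follows it.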


On 10/12/2012 10:35 AM, J Franz wrote:
Using the Expat parser and a UTF-8 file, it failed at line 2,097,268 (so I
thought CCSID 1252 should be good?).
What looks like a blank space is a hex 1D (WRKLNK opt 5-DSPFIL).

Is there any way to keep the parser from throwing up (the ignore option on
CPF9897 ended the pgm)? Or a method of verifying that all characters are valid
before parsing?

The 2nd issue is that I have several files "too big" for DSPFIL to open. Are
there any other options? The files are in the 2 - 3 gig range, and it is not
an option for the exporting system to break them down. I could write a pgm to
split them, but would prefer not to. v6r1

Jim Franz
