Hi Jim,

The DTD file has an <?xml?> line at the top of it? That's a little unusual. A DTD file isn't an XML document, although an external DTD is allowed to begin with a text declaration like that one to state its encoding. DTD files are not normally used with XML these days, anyway. I wonder if it's really an XSD where someone used the wrong extension? XSD files are XML documents, and it would be quite proper to have <?xml?> in that case.

But, at any rate, Expat isn't reading that DTD file (unless you've added code to make that happen) so the DTD is a moot point.

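Your open() call is reading the file in binary mode, so Expat gets the raw bytes. In the absence of an encoding declaration such as <?xml version="1.0" encoding="utf-8"?>, Expat does not look at the file's CCSID; as far as I know it falls back to the XML default, which is UTF-8 (or UTF-16 if the file starts with a byte-order mark).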

If you want to force it to consider the file to be UTF-8, you can do that on your call to XML_ParserCreate(), which might be easier than modifying all of the XML files you receive :-)

myParser = XML_ParserCreate(XML_ENC_UTF8);

This will tell Expat to assume the file is in UTF-8, no matter what the <?xml encoding="xxx"?> says.
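
To see the override in isolation, here is a minimal sketch using Python's standard-library Expat binding rather than the RPG prototypes (an assumption made only so the example is runnable anywhere). The sample document and the parse_text() helper are hypothetical: the body bytes are UTF-8, but the declaration claims ISO-8859-1.

```python
from xml.parsers import expat


def parse_text(data, encoding=None):
    """Parse XML bytes and return all character data as one string."""
    pieces = []
    # encoding=None lets Expat follow the document's own declaration;
    # a non-None value overrides whatever the declaration says.
    parser = expat.ParserCreate(encoding)
    parser.CharacterDataHandler = pieces.append
    parser.Parse(data, True)
    return "".join(pieces)


# Hypothetical sample: "caf" plus e-acute encoded as UTF-8 (bytes C3 A9),
# but mislabeled as ISO-8859-1 in the declaration.
doc = b"<?xml version='1.0' encoding='ISO-8859-1'?><a>caf\xc3\xa9</a>"

declared = parse_text(doc)           # trusts the (wrong) declaration
forced = parse_text(doc, "UTF-8")    # ignores the declaration

print(repr(declared))
print(repr(forced))
```

With the declaration trusted, the two UTF-8 bytes come back as two separate Latin-1 characters; with the override, they come back as the single accented letter.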

Maybe that will help.. the other possibility, of course, is that there's an actual character problem at the position that you mentioned earlier... but I assume you already looked into that?


On 10/12/2012 1:48 PM, J Franz wrote:

The file in the ifs is currently CCSID 1252. It was copied from a non-i (win or
to a win ftp server in zip format, unzipped, then copied to i thru a mapped

The xml itself has no designation in the 1st record: <?xml version="1.0"?> ,
but a separate .dtd file does have this in 1st record:
<?xml version='1.0' encoding='UTF-8' ?>

The open statement in rpgle is:
fd = open( %trim(@filename) : O_RDONLY + O_LARGEFILE );

question: so do I need to change 1st line of xml file to
<?xml version='1.0' encoding='UTF-8' ?>

(btw I'm not auth to see the netserver config if set to xlate..sent the request
up the chain)


From: Scott Klement <midrange-l@xxxxxxxxxxxxxxxx>
To: Midrange Systems Technical Discussion <midrange-l@xxxxxxxxxxxx>
Sent: Fri, October 12, 2012 2:25:41 PM
Subject: Re: Parse error ... not well formed token

Hi Jim,

How are you using Expat? Are you giving it raw binary data, and
letting it figure out the encoding? Or are you translating it when
reading the IFS, and overriding the Expat parser to a particular encoding?

You reference CCSID 1252... I'm trying to figure out how that fits into
the equation.

Note that CCSID 1252 is definitely not the same as UTF-8. Though, the
most basic/commonplace characters in CCSID 1252 are the same as they are
in UTF-8.... but, aside from that, they are not the same. UTF-8 (which
is CCSID 1208) supports more than a million characters in one encoding...
Windows Latin-1 (CCSID 1252) supports about 200. So you can imagine
that there are many things in UTF-8 that don't exist in 1252.
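
A quick way to see the difference, sketched in Python and assuming its "cp1252" codec is a close enough stand-in for CCSID 1252 (the fits_in_1252() helper is hypothetical):

```python
# Plain ASCII text is byte-for-byte identical in both encodings.
ascii_text = "<name>Jim</name>"
same = ascii_text.encode("cp1252") == ascii_text.encode("utf-8")

# An accented letter is one byte in Windows-1252 but two bytes in UTF-8.
e_acute_1252 = "\u00e9".encode("cp1252")
e_acute_utf8 = "\u00e9".encode("utf-8")


def fits_in_1252(ch):
    """Return True if the character exists in Windows-1252."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


print(same)
print(e_acute_1252.hex(), e_acute_utf8.hex())
print(fits_in_1252("\u03a9"))  # Greek capital omega
```

So a file that is really UTF-8 only looks the same as a 1252 file as long as it sticks to plain ASCII characters.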

But, if you're just handing binary data to Expat, it should
automatically detect the encoding from the <?xml encoding=?> line at
the top. It doesn't use the CCSID on the file unless you write code to
make it do that.

Just make sure the data is actually encoded by the same standard as the
<?xml?> says it is... and that you aren't translating during file
transfer, or something like that.
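
One way to check the data before handing it to Expat is to scan the raw bytes for control characters that XML 1.0 does not allow, such as the hex 1D mentioned in the original question. This is only a sketch in Python, not RPG, and find_illegal_bytes() is a hypothetical helper. It assumes an ASCII-compatible encoding (UTF-8, 1252, ISO-8859-1), where those byte values can only mean control characters; it would not be valid for UTF-16 or EBCDIC data.

```python
import re

# XML 1.0 forbids the C0 control characters other than tab, LF and CR.
ILLEGAL = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_bytes(path, chunk_size=1 << 20, limit=10):
    """Return up to `limit` (byte offset, byte value) pairs of illegal bytes."""
    hits = []
    offset = 0  # file offset of the start of the current chunk
    with open(path, "rb") as f:
        while len(hits) < limit:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for match in ILLEGAL.finditer(chunk):
                hits.append((offset + match.start(), chunk[match.start()]))
                if len(hits) >= limit:
                    break
            offset += len(chunk)
    return hits
```

Because the file is read in fixed-size chunks, the same scan should work on files in the 2 - 3 gig range without loading them into memory.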


On 10/12/2012 10:35 AM, J Franz wrote:
Using Expat parser, and UTF8 file - it failed at line 2,097,268 (so I thought
ccsid 1252 should be good?)
What looks like a blank space is a hex 1D (WRKLNK opt 5-DSPFIL)

Any way to keep parser from throwing up (the ignore opt on CPF9897 ended the
pgm)? Or a method of verifying all characters are valid before the parsing?

2nd issue is have several files "too big" for DSPFIL to open - any other
options? Files are in the 2 - 3 gig range and it is not an option for
exporting system to break them down. I could write pgm to split them, but
would prefer not to. v6r1

Jim Franz
