How is the file being read? You've included the CCSID you used, but nothing about how you're reading the file, which is what controls how the CCSID is used.

Expat does not read files for you -- you have to read them yourself -- so how you code that is very relevant to your question.

If you specify an encoding on XML_ParserCreate(), I strongly recommend (1) making sure you are reading the data in a way that translates it to the encoding you specified, and (2) making sure the file is marked with the correct CCSID.
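As a rough illustration of what the encoding override does, here is a minimal sketch using Python's standard-library Expat binding (xml.parsers.expat) rather than the RPG port discussed in this thread. The sample document and the parse_text helper are hypothetical. The data is handed to the parser as raw bytes with no translation, so the bytes really are in the encoding named on the parser.

```python
import xml.parsers.expat


def parse_text(data: bytes, encoding=None) -> str:
    """Parse raw XML bytes and return the concatenated character data."""
    # A non-None encoding overrides whatever the document declares or implies.
    parser = xml.parsers.expat.ParserCreate(encoding)
    chunks = []
    # Expat may deliver text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# Hypothetical sample: "café" with é stored as the single byte 0xE9
# (ISO-8859-1), and no encoding declaration in the document.
raw = b"<name>caf\xe9</name>"

# With the override, the byte 0xE9 is interpreted as ISO-8859-1.
text = parse_text(raw, encoding="ISO-8859-1")

# Without it, Expat assumes UTF-8, where a lone 0xE9 is not valid.
try:
    parse_text(raw)
    utf8_ok = True
except xml.parsers.expat.ExpatError:
    utf8_ok = False
```

The override only helps when the bytes reaching the parser match the encoding you name. If the read step has already translated the data to something else, the override will decode it incorrectly.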

I say that because for some reason (that I've never understood), it's very common on IBM i for someone to transfer a file to the IFS, then JUST ASSUME that whatever the system assigned as the CCSID is correct. The system just uses a default without examining the data in the file, so most of the time, it's not correct. (If it is, it's by sheer luck.) Instead, take a look at the hex values of the variant characters, and determine which encoding uses those hex values. Then use the CHGATR command to assign the proper CCSID.
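The hex check described above can be sketched in Python. This is only an illustration: the helper names are hypothetical, the sample bytes are made up, and the byte >= 0x80 test assumes ASCII-family data (it is not meaningful for EBCDIC, where ordinary letters also sit above 0x7F). The Python codec names iso-8859-1, cp1252, and cp037 are used here as stand-ins for CCSIDs 819, 1252, and 37.

```python
def variant_bytes(raw: bytes):
    """List (offset, hex) for every byte outside the 7-bit ASCII range."""
    return [(i, f"{b:02X}") for i, b in enumerate(raw) if b >= 0x80]


def candidate_encodings(raw: bytes, expected: str, codecs):
    """Return the codecs under which `raw` decodes to the expected text."""
    matches = []
    for name in codecs:
        try:
            if raw.decode(name) == expected:
                matches.append(name)
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this encoding
    return matches


# Hypothetical sample: you know the text should read "café".
sample = b"caf\xe9"
codecs = ["utf-8", "iso-8859-1", "cp1252", "cp037"]

print(variant_bytes(sample))                        # [(3, 'E9')]
print(candidate_encodings(sample, "café", codecs))  # ['iso-8859-1', 'cp1252']
```

Note that 0xE9 alone cannot distinguish ISO-8859-1 from Windows-1252, because the two differ only in the 0x80 to 0x9F range. You would need a character from that range, such as a euro sign or a curly quote, to tell them apart.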

Of course, if you're reading the file as binary and expecting Expat to figure out the encoding, then the CCSID on the file is irrelevant... but it'll only work properly if the data is encoded in one of the ways that Expat can successfully detect.  (Which is a tiny subset of the encodings that IBM i supports.)
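For the binary case, the sketch below shows the kinds of input Expat can work out on its own when no encoding is passed to the parser: UTF-8 (the default assumption), UTF-16 with a byte order mark, and a document that carries an encoding declaration for a built-in encoding. It uses Python's standard-library binding rather than the RPG port, and the sample documents and the parse_binary helper are hypothetical.

```python
import xml.parsers.expat


def parse_binary(data: bytes) -> str:
    """Feed untranslated bytes to Expat and let it determine the encoding."""
    parser = xml.parsers.expat.ParserCreate()  # no encoding override
    chunks = []
    # Text can arrive in several callbacks, so collect and join the pieces.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


doc = "<name>café</name>"

# No declaration: Expat assumes UTF-8.
utf8 = doc.encode("utf-8")

# Python's "utf-16" codec prepends a byte order mark, which Expat recognizes.
utf16 = doc.encode("utf-16")

# An explicit declaration names the encoding of the bytes that follow.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\xe9</name>'

results = [parse_binary(d) for d in (utf8, utf16, declared)]
```

All three inputs decode to the same text. Single-byte data with no declaration, like the files in the original question, falls outside these cases, which is why it has to be handled with an explicit encoding or a correct translation at read time.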

Or better yet, switch to JSON and always use UTF-8.

On 6/18/2024 12:56 PM, Rick Rauterkus via MIDRANGE-L wrote:
We have been using Scott Klement's port of the Expat on the i for many
years. But running into an issue with a new customer.

They are sending XML files without the encoding declaration. No big deal,
they still parse. But they also include some foreign characters, and those
do not translate well. If I put the encoding of ISO-8859-1 in the file and
then parse it, those characters do translate correctly.

So I tried to specify the encoding on the call to XML_ParserCreate using
the constant XML_ENC_ISO8859_1. This did not translate correctly either,
but it did translate differently than using no encoding, so it did affect
it somehow.

The CCSID on the IFS files is 1252. I tried 819 also, but that did not
make a difference.

Anybody have any idea how I can get Expat to parse it as ISO-8859-1 without
the encoding specified in the file?

Thanks!
Rick


This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.
