We're using the open-read-close APIs, which have worked well for many
years. Most customers do include the encoding, but we need to onboard
this new one without them having to make any changes, so we are stuck
with XML that has no encoding specified. As it turns out, though, we were
actually testing with a modified file; once I got the original back, all
was good again.

By the way, in EXPAT_H I think the non-Unicode version of the constant
XML_ENC_ISO8859_1 is missing the 9...

Thank you!
Rick

On Tue, Jun 18, 2024 at 6:19 PM Scott Klement <midrange-l@xxxxxxxxxxxxxxxx> wrote:

How is the file being read? You've included info about the CCSID you
used, but no info about how you're reading it (which is what controls
how the CCSID is used).

Expat does not read files for you -- you have to read it yourself -- so
how you code that is very relevant to your question.
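Since the application does the reading, here is a minimal C sketch of an open-read-close loop that hands raw bytes to a parser in chunks. The names feed_fn, feed_file() and count_bytes() are hypothetical; in real code the callback would call Expat's XML_Parse(). It assumes the file is opened without O_TEXTDATA, so, as we understand the IBM i open() API, no CCSID translation happens on the way in.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type: in real code this would wrap Expat's
   XML_Parse(parser, buf, len, is_final) and return nonzero on success. */
typedef int (*feed_fn)(void *ctx, const char *buf, int len, int is_final);

/* Read a stream file as raw bytes and pass each chunk to `feed`.
   No O_TEXTDATA/O_CCSID flags are used, so (assumption) the bytes arrive
   untranslated. Returns 0 on success, -1 on any error. */
int feed_file(const char *path, feed_fn feed, void *ctx)
{
    char buf[4096];
    ssize_t n;
    int fd;

    assert(path != NULL && feed != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (!feed(ctx, buf, (int)n, 0)) {
            close(fd);
            return -1;
        }
    }
    close(fd);
    if (n < 0)
        return -1;

    /* A final zero-length call tells the parser the document is complete. */
    return feed(ctx, buf, 0, 1) ? 0 : -1;
}

/* Trivial example callback: adds up the number of bytes it receives. */
int count_bytes(void *ctx, const char *buf, int len, int is_final)
{
    (void)buf;
    (void)is_final;
    *(size_t *)ctx += (size_t)len;
    return 1;
}
```

Reading in binary like this means the parser sees exactly the bytes in the file, so the encoding given to XML_ParserCreate() has to describe those bytes.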

If you specify an encoding on XML_ParserCreate(), I strongly recommend
(1) making sure you are reading the data in a way that translates it to
the encoding you specified, and (2) making sure the file is marked with
the correct CCSID.
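One way to act on point (2) is to check that the file's CCSID actually describes the encoding name passed to XML_ParserCreate(). The sketch below hard-codes a small, assumed mapping between three of Expat's built-in encoding names and IBM i CCSIDs (UTF-8 = 1208, ISO-8859-1 = 819, US-ASCII = 367); the table and function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed pairing of Expat built-in encoding names with IBM i CCSIDs. */
struct enc_ccsid {
    const char *name;
    int ccsid;
};

static const struct enc_ccsid enc_table[] = {
    { "UTF-8", 1208 },
    { "ISO-8859-1", 819 },
    { "US-ASCII", 367 },
};

/* Case-insensitive ASCII string equality (XML encoding names ignore case). */
static int names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns 1 if file_ccsid matches the encoding name given to
   XML_ParserCreate(), 0 if it does not, and -1 if the name is unknown. */
int ccsid_matches_encoding(const char *enc_name, int file_ccsid)
{
    assert(enc_name != NULL);
    for (size_t i = 0; i < sizeof enc_table / sizeof enc_table[0]; i++) {
        if (names_equal(enc_name, enc_table[i].name))
            return enc_table[i].ccsid == file_ccsid;
    }
    return -1;
}
```

With this table, a file tagged 1252 does not match "ISO-8859-1", even though Windows-1252 and ISO-8859-1 agree on bytes 0xA0-0xFF and differ only in 0x80-0x9F.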

I say that because, for some reason (that I've never understood), it's
very common on IBM i for someone to transfer a file to the IFS, then
JUST ASSUME that whatever CCSID the system assigned is correct.
The system just uses a default without examining the data in the file,
so most of the time it's not correct. (If it is, it's by sheer luck.)
Instead, look at the hex values of the variant characters, determine
which encoding uses those hex values, and then use the CHGATR command
to assign the proper CCSID.
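Looking at the hex values can be partly automated. This is a rough heuristic sketch, not a definitive detector: plain ASCII has no bytes above 0x7F, well-formed UTF-8 follows a strict lead/continuation byte pattern, and bytes 0x80-0x9F are printable characters in Windows-1252 but control codes in ISO-8859-1. The enum and function names are hypothetical; the CCSIDs in the comments are the values one might then assign with something like CHGATR OBJ('/path/to/file.xml') ATR(*CCSID) VALUE(819), where the path is a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, with the CCSID one might assign afterwards. */
enum enc_guess {
    GUESS_US_ASCII,  /* CCSID 367: no bytes above 0x7F */
    GUESS_UTF8,      /* CCSID 1208 */
    GUESS_CP1252,    /* CCSID 1252: uses bytes 0x80-0x9F */
    GUESS_ISO8859_1  /* CCSID 819 */
};

/* Simplified UTF-8 check: validates lead/continuation byte structure
   but does not reject every overlong or surrogate form. */
static int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;
        if (c < 0x80) {
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3;
        } else {
            return 0; /* 0x80-0xC1 and 0xF5-0xFF never start a sequence */
        }
        if (len - i - 1 < need)
            return 0; /* truncated sequence */
        for (size_t k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
        i += need + 1;
    }
    return 1;
}

/* Heuristic guess based on which high-bit byte values appear. */
enum enc_guess guess_encoding(const unsigned char *buf, size_t len)
{
    int high = 0, c1 = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= 0x80)
            high = 1;
        if (buf[i] >= 0x80 && buf[i] <= 0x9F)
            c1 = 1;
    }
    if (!high)
        return GUESS_US_ASCII;
    if (looks_like_utf8(buf, len))
        return GUESS_UTF8;
    /* 0x80-0x9F are printable (e.g. curly quotes) in 1252 but control
       characters in ISO-8859-1, so seeing them suggests 1252. */
    return c1 ? GUESS_CP1252 : GUESS_ISO8859_1;
}
```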

Of course, if you're reading the file as binary and expecting Expat to
figure out the encoding, then the CCSID on the file is irrelevant... but
it'll only work properly if the data is encoded in one of the ways that
Expat can successfully detect. (Which is a tiny subset of the encodings
that IBM i supports.)
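For reference, the XML 1.0 specification (Appendix F) describes how a parser can guess the encoding from the first bytes when nothing external says otherwise. The C sketch below follows that appendix; the names are hypothetical, it ignores UTF-32, and Expat's own detection logic may differ in detail. The case that matters here is SNIFF_ASCII_COMPAT: if the XML declaration has no encoding pseudo-attribute (or there is no declaration at all), XML defaults to UTF-8, and a lone ISO-8859-1 byte such as 0xE9 (é) followed by ordinary ASCII is not valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for the first-bytes check. */
enum sniffed {
    SNIFF_UTF8_BOM,     /* EF BB BF */
    SNIFF_UTF16BE,      /* FE FF, or 00 3C 00 3F */
    SNIFF_UTF16LE,      /* FF FE, or 3C 00 3F 00 */
    SNIFF_EBCDIC,       /* 4C 6F A7 94 = "<?xm" in EBCDIC */
    SNIFF_ASCII_COMPAT, /* 3C 3F 78 6D = "<?xm": read the declaration */
    SNIFF_DEFAULT_UTF8  /* nothing recognizable: assume UTF-8 */
};

/* Classify the start of an XML entity following XML 1.0 Appendix F. */
enum sniffed sniff_xml_start(const unsigned char *b, size_t len)
{
    assert(b != NULL || len == 0);
    if (len >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return SNIFF_UTF8_BOM;
    if (len >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return SNIFF_UTF16BE;
    if (len >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return SNIFF_UTF16LE;
    if (len >= 4) {
        if (b[0] == 0x00 && b[1] == 0x3C && b[2] == 0x00 && b[3] == 0x3F)
            return SNIFF_UTF16BE;
        if (b[0] == 0x3C && b[1] == 0x00 && b[2] == 0x3F && b[3] == 0x00)
            return SNIFF_UTF16LE;
        if (b[0] == 0x4C && b[1] == 0x6F && b[2] == 0xA7 && b[3] == 0x94)
            return SNIFF_EBCDIC;
        if (b[0] == 0x3C && b[1] == 0x3F && b[2] == 0x78 && b[3] == 0x6D)
            return SNIFF_ASCII_COMPAT;
    }
    return SNIFF_DEFAULT_UTF8;
}
```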

Or better yet, switch to JSON and always use UTF-8.

On 6/18/2024 12:56 PM, Rick Rauterkus via MIDRANGE-L wrote:
We have been using Scott Klement's port of Expat on the i for many
years, but we're running into an issue with a new customer.

They are sending XML files without an encoding declaration. No big deal,
they still parse. But they also include some foreign characters, and
those do not translate well. If I add an encoding of ISO-8859-1 to the
file and then parse it, those characters do translate correctly.
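Why declaring ISO-8859-1 helps: every ISO-8859-1 byte is numerically equal to its Unicode code point, so conversion to UTF-8 is mechanical. Stock Expat (built without XML_UNICODE) hands character data to the application as UTF-8, so a conversion like this minimal sketch happens inside the parser. The IBM i port may behave differently, and latin1_to_utf8() is a hypothetical helper, not part of Expat.

```c
#include <assert.h>
#include <stddef.h>

/* Convert ISO-8859-1 bytes to UTF-8. Each input byte is its own code
   point (U+0000-U+00FF): below 0x80 it is copied, otherwise it becomes a
   two-byte sequence. Returns bytes written, or (size_t)-1 if out is full. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out, size_t outcap)
{
    size_t o = 0;
    assert(in != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            if (o + 1 > outcap)
                return (size_t)-1;
            out[o++] = c;
        } else {
            if (o + 2 > outcap)
                return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* top 2 bits */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* low 6 bits */
        }
    }
    return o;
}
```

For example, é is 0xE9 in ISO-8859-1 and becomes the two bytes 0xC3 0xA9 in UTF-8.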

So I tried to specify the encoding on the call to XML_ParserCreate using
the constant XML_ENC_ISO8859_1. This did not translate correctly either,
but it did translate differently than using no encoding, so it did affect
it somehow.

The CCSID on the IFS files is 1252. I tried 819 also, but that did not
make a difference.

Does anybody have any idea how I can get Expat to parse it as
ISO-8859-1 without the encoding specified in the file?

Thanks!
Rick
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: https://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxxxxxxxx
Before posting, please take a moment to review the archives
at https://archive.midrange.com/midrange-l.

Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription related
questions.




This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
