Re: Problem copying PC document to DDS file with accentuated characters -- MIDRANGE-L

Hello John,

On 29.03.2021 at 15:44, John Yeung <gallium.arsenide@xxxxxxxxx> wrote:

I don't know when "MBCS" was introduced, so maybe you are speaking from a context where SBCS and DBCS are the only possible choices. But UTF-8 is definitely a MBCS encoding. The part of UTF-8 that overlaps with ASCII are one-byte characters, but any given character[1] can use up to 4 bytes to encode in UTF-8.

You're perfectly right, generally speaking. But I'm cautious about assuming anything when it comes to encodings and IBM i, first of all because the original request was about French accented characters, and I don't know whether these fall in the single-byte or the multi-byte range.
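A quick way to check this off the IBM i side is to count bytes per character in a few encodings. The following is only a minimal sketch in Python: the helper name byte_lengths is made up for illustration, and cp037 (one of the EBCDIC code pages that ships with Python) is merely a stand-in, not necessarily the CCSID the OP's file uses.

```python
# Hypothetical helper: how many bytes does each character need in a given encoding?
def byte_lengths(text, encoding):
    return {ch: len(ch.encode(encoding)) for ch in text}

sample = "eéèçà"  # one plain ASCII letter plus typical French accented letters

utf8_sizes = byte_lengths(sample, "utf-8")      # accented letters need 2 bytes each
latin1_sizes = byte_lengths(sample, "latin-1")  # single-byte ISO 8859-1
ebcdic_sizes = byte_lengths(sample, "cp037")    # single-byte EBCDIC (assumed stand-in)

for name, sizes in [("utf-8", utf8_sizes), ("latin-1", latin1_sizes), ("cp037", ebcdic_sizes)]:
    print(name, sizes)
```

So under these assumptions the accented letters are single-byte in the classic 8-bit code pages, but multi-byte as soon as the data is UTF-8.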

CCSID is a special IBM construct that is a little bit more complicated than the notion of "encoding" that the rest of the world uses.

I still have the "Speak the right language…" PDF from IBM on my reading list. :-)

But I would say a good first step (and for many, the only step that's really needed) is to pretend that CCSIDs really are "just" encodings in the same sense that the rest of the world uses, and then learn about those encodings. And yet again, I will point to my favorite article on the topic:

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

Thanks! I've put it on my reading list.

I know about iconv, and I like that BBEdit on my Mac guesses the encoding right most of the time. Vim does likewise. In theory, this ambiguity should be resolved by IBM i storing the "encoding" as metadata per file. But of course, that metadata must also be correct, and I assume that this is where the culprit in the OP's case lies.
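To illustrate why wrong metadata matters, here is a small Python sketch, not anything specific to IBM i: the bytes are written as UTF-8, but the reader is told they are Latin-1. The sample string and variable names are my own assumptions.

```python
original = "déjà vu"

# The file content as actually written: UTF-8 bytes.
raw = original.encode("utf-8")

# Reader trusts wrong metadata and decodes as Latin-1: accents turn into garbage.
wrong = raw.decode("latin-1")

# Reader uses the correct encoding: text comes back intact.
right = raw.decode("utf-8")

# As long as nothing was lost, the damage can be reversed by undoing the wrong step.
repaired = wrong.encode("latin-1").decode("utf-8")

print(repr(wrong))
print(repr(right))
print(repr(repaired))
```

The bytes themselves never change here; only the label used to interpret them does.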

:wq! PoC
