Re: Problem copying PC document to DDS file with accentuated characters -- MIDRANGE-L

On Mon, Mar 29, 2021 at 8:05 AM Patrik Schindler <poc@xxxxxxxxxx> wrote:

The issue here is more complex because this setting does not apply to DBCS content, and UTF-8 is DBCS by nature.

I don't know when "MBCS" was introduced, so maybe you are speaking
from a context where SBCS and DBCS are the only possible choices. But
UTF-8 is definitely an MBCS encoding. The part of UTF-8 that overlaps
with ASCII consists of one-byte characters, but any given character[1]
can take up to 4 bytes to encode in UTF-8.
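
To make the variable width concrete, here is a minimal Python 3 sketch.
The sample characters and the helper name utf8_length are just my picks
for illustration, not anything from the original problem:

```python
# Sample code points chosen so that each needs a different number of bytes.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji


def utf8_length(ch: str) -> int:
    """Return how many bytes one code point occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s) in UTF-8")
```

The ASCII letter comes out as one byte, the accented letter as two, the
euro sign as three, and the emoji as four, so neither "single-byte" nor
"double-byte" describes UTF-8.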

I'll be the first to admit I've got a long way to go to wrap my head around CCSID though.

Welcome to my world. :-)

CCSID is a special IBM construct that is a little bit more complicated
than the notion of "encoding" that the rest of the world uses. But I
would say a good first step (and for many, the only step that's really
needed) is to pretend that CCSIDs really are "just" encodings in the
same sense that the rest of the world uses, and then learn about those
encodings. And yet again, I will point to my favorite article on the
topic:

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

Or, if that link is too long and gets mangled, try this:

https://bit.ly/3cx7WoR

Some programming languages, like Python 3, really help bring home the
basic understanding of Unicode described in that article.
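
As a rough illustration of what I mean, here is a small Python 3 sketch.
It assumes that Python's "cp037" codec is a fair stand-in for CCSID 37,
which is close enough for this purpose but is my assumption, and the
sample word is arbitrary:

```python
# A str is a sequence of code points; bytes only exist once you pick an encoding.
text = "caf\u00e9"  # "cafe" with an accented final e

utf8_bytes = text.encode("utf-8")      # the accented letter becomes 2 bytes
latin1_bytes = text.encode("latin-1")  # the accented letter becomes 1 byte
ebcdic_bytes = text.encode("cp037")    # EBCDIC; assumed to match CCSID 37

for name, data in [("utf-8", utf8_bytes),
                   ("latin-1", latin1_bytes),
                   ("cp037", ebcdic_bytes)]:
    print(f"{name:8} {len(data)} bytes: {data.hex(' ')}")

# Decoding with the wrong encoding is how accented characters get mangled:
# the two UTF-8 bytes of the accented e are read as two separate characters.
mojibake = utf8_bytes.decode("latin-1")
print(repr(mojibake))

# Decoding with the encoding that was actually used gets the text back intact.
assert utf8_bytes.decode("utf-8") == text
assert ebcdic_bytes.decode("cp037") == text
```

The same text yields three different byte sequences, and the bytes only
turn back into the right text when the decoder is told which encoding
they are in. If you squint, that is the job a CCSID tag does.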

John Y.

[1]More precisely, not a "character" but a "code point". The idea of a
"character" is hard to pin down, but for most people and most
purposes, it is close enough to use those terms interchangeably.
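
For the curious, a short Python 3 sketch of where the two terms diverge.
The accented "e" is just my example:

```python
import unicodedata

composed = "\u00e9"     # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # two code points: "e" plus COMBINING ACUTE ACCENT

# Both display as the same accented letter, yet they differ in length
# and do not compare equal as raw strings.
print(len(composed), len(decomposed))
print(composed == decomposed)

# Normalizing to NFC folds the two-code-point form into the single one.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == composed)
```

What a reader sees as one "character" can be one code point or two,
which is why the footnote hedges.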

