On Mon, Mar 29, 2021 at 8:05 AM Patrik Schindler <poc@xxxxxxxxxx> wrote:
The issue here is more complex because this setting does not apply to DBCS content, and UTF-8 is DBCS by nature.
I don't know when "MBCS" was introduced, so maybe you are speaking
from a context where SBCS and DBCS are the only possible choices. But
UTF-8 is definitely an MBCS (multi-byte) encoding. The part of UTF-8
that overlaps with ASCII consists of one-byte characters, but any given
character[1] can use up to 4 bytes to encode in UTF-8.
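
To make the variable width concrete, here is a small Python 3 sketch. The sample code points and the helper name utf8_length are just illustrative choices of mine, not anything from IBM i; the helper simply counts the bytes each one takes when encoded as UTF-8.

```python
# Illustrative sketch: how many bytes UTF-8 needs for a few code points.
# The samples and the helper name are arbitrary choices for demonstration.

def utf8_length(ch: str) -> int:
    """Return the number of bytes needed to encode one code point in UTF-8."""
    return len(ch.encode("utf-8"))

samples = [
    "A",           # U+0041, inside the ASCII range
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s) in UTF-8")
```

The four samples take 1, 2, 3, and 4 bytes respectively, which is why neither "single-byte" nor "double-byte" describes UTF-8.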
I'll be the first to admit I've got a long way to go to wrap my head around CCSID though.
Welcome to my world. :-)
CCSID is a special IBM construct that is a little bit more complicated
than the notion of "encoding" that the rest of the world uses. But I
would say a good first step (and for many, the only step that's really
needed) is to pretend that CCSIDs really are "just" encodings in the
same sense that the rest of the world uses, and then learn about those
encodings. And yet again, I will point to my favorite article on the
topic:
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
Or, if that link is too long and gets mangled, try this:
https://bit.ly/3cx7WoR
Some programming languages, like Python 3, really help bring home the
basic understanding of Unicode described in that article.
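
As a rough sketch of what I mean: Python 3 keeps text (str) and bytes strictly apart, and you have to name an encoding to go from one to the other. Here I am assuming that Python's "cp037" codec is a fair stand-in for CCSID 37; that mapping is my assumption for illustration, not something guaranteed to match every detail of the IBM definition.

```python
# Sketch: the same text becomes different bytes under different encodings.
# Assumption: Python's "cp037" codec approximates IBM CCSID 37 (EBCDIC).

text = "Hello"                    # a str: a sequence of Unicode code points

ebcdic = text.encode("cp037")     # bytes as an EBCDIC system would store them
utf8 = text.encode("utf-8")       # bytes in UTF-8 (same as ASCII for this text)

print(ebcdic.hex(" "))
print(utf8.hex(" "))

# Decoding with the matching encoding recovers the original text.
round_trip = ebcdic.decode("cp037")
print(round_trip == text)

# Decoding with the wrong encoding does not raise an error here,
# it just yields different characters.
wrong = ebcdic.decode("latin-1")
print(wrong == text)
```

The point is that bytes carry no meaning until you know which encoding (or CCSID) they were written in.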
John Y.
[1] More precisely, not a "character" but a "code point". The idea of a
"character" is hard to pin down, but for most people and most
purposes, it is close enough to use those terms interchangeably.
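
For anyone curious where the two terms diverge, here is a minimal Python 3 sketch. The accented "e" is an example I picked; it can be written either as one precomposed code point or as a plain "e" followed by a combining accent, and both look like a single character on screen.

```python
# Sketch: one visible "character", but a different number of code points.
import unicodedata

composed = "\u00e9"      # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: "e" + COMBINING ACUTE ACCENT

print(len(composed), len(decomposed))   # code point counts differ
print(composed == decomposed)           # so a plain comparison fails

# NFC normalization combines the pair into the precomposed form.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == composed)

# The UTF-8 byte counts differ as well.
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))
```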