On Sun, Mar 10, 2019 at 6:50 PM Steve Richter <stephenrichter@xxxxxxxxx> wrote:
what I do not follow now is why does UX'00C2' get stored in UTF-8 as
x'C382' ?
The thing you have to understand is that Unicode is NOT an encoding.
There is no "binary representation" of Unicode. Unicode is just a
conceptual list of characters. Every character is given a number. And
it really is supposed to just be a number, NOT some pattern of bits or
bytes. I think it's unfortunate that these Unicode NUMBERS are usually
expressed in hex, because it makes people think that Unicode is an
encoding or a pattern of bits.
These Unicode numbers are called "code points", and the thing you
wrote as UX'00C2' is just the base-16 number C2 (194 in decimal).
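A quick sketch of that idea in Python (any Python 3 interpreter),
where chr() maps a code point number to its character and ord()
goes back:

```python
# A Unicode code point is just a number, independent of any encoding.
ch = chr(0xC2)        # the character at code point U+00C2
print(ch)             # -> Â (LATIN CAPITAL LETTER A WITH CIRCUMFLEX)
print(hex(ord(ch)))   # -> 0xc2, the code point itself, no bytes involved yet
```

No bytes appear until you pick an encoding and call .encode().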
An *encoding* is an actual system of bit patterns used to encode
(represent in concrete fashion) conceptual characters.
UTF-8 is one such encoding system, and under that system, the
*character* at Unicode code point C2 is *represented* (encoded) as the
2-byte pattern x'C382'.
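You can see the mechanics in Python. UTF-8 stores code points in the
range U+0080..U+07FF in two bytes shaped 110xxxxx 10xxxxxx, and the
code point's bits are distributed into the x positions:

```python
# 0xC2 = binary 11000010. Pad to 11 bits: 00011 000010.
# Place into the two-byte template 110xxxxx 10xxxxxx:
#   110 00011 = 0xC3
#   10 000010 = 0x82
ch = chr(0xC2)
print(ch.encode('utf-8').hex())      # -> c382

# The same code point under other encodings looks different,
# which is the whole point: code point != bytes.
print(ch.encode('utf-16-be').hex())  # -> 00c2
print(ch.encode('latin-1').hex())    # -> c2
```

So x'C382' is not "the Unicode value" of the character; it is what
UTF-8, specifically, does with code point C2.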
It's laid out pretty well here:
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
John Y.