On Sun, Mar 10, 2019 at 6:50 PM Steve Richter <stephenrichter@xxxxxxxxx> wrote:
what I do not follow now is why does UX'00C2' get stored in UTF-8 as
x'C382' ?
The thing you have to understand is that Unicode is NOT an encoding.
There is no "binary representation" of Unicode. Unicode is just a
conceptual list of characters. Every character is given a number. And
it really is supposed to just be a number, NOT some pattern of bits or
bytes. I think it's unfortunate that these Unicode NUMBERS are usually
expressed in hex, because it makes people think that Unicode is an
encoding or a pattern of bits.
These Unicode numbers are called "code points", and the thing you
wrote as UX'00C2' is just the base-16 number C2 (194 in decimal).
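A quick sketch of that idea in Python (any Python 3 interpreter),
where chr() maps a code point number to its character and ord()
goes back:

```python
# A Unicode code point is just a number, independent of any encoding.
ch = chr(0xC2)        # the character at code point U+00C2
print(ch)             # -> Â (LATIN CAPITAL LETTER A WITH CIRCUMFLEX)
print(hex(ord(ch)))   # -> 0xc2, the code point itself, no bytes involved yet
```

No bytes appear until you pick an encoding and call .encode().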
An *encoding* is an actual system of bit patterns used to encode
(represent in concrete fashion) conceptual characters.
UTF-8 is one such encoding system, and under that system, the
*character* at Unicode code point C2 is *represented* (encoded) as the
2-byte pattern x'C382'.
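You can see the mechanics in Python. UTF-8 stores code points in the
range U+0080..U+07FF in two bytes shaped 110xxxxx 10xxxxxx, and the
code point's bits are distributed into the x positions:

```python
# 0xC2 = binary 11000010. Pad to 11 bits: 00011 000010.
# Place into the two-byte template 110xxxxx 10xxxxxx:
#   110 00011 = 0xC3
#   10 000010 = 0x82
ch = chr(0xC2)
print(ch.encode('utf-8').hex())      # -> c382

# The same code point under other encodings looks different,
# which is the whole point: code point != bytes.
print(ch.encode('utf-16-be').hex())  # -> 00c2
print(ch.encode('latin-1').hex())    # -> c2
```

So x'C382' is not "the Unicode value" of the character; it is what
UTF-8, specifically, does with code point C2.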
It's laid out pretty well here:
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
John Y.