EBCDIC was on its way out.
On Wed, Oct 12, 2016 at 6:33 PM, Kevin Adler <kadler@xxxxxxxxxx> wrote:
While UTF-32 can encode all 1+ million Unicode code points in one code unit
(4 bytes), you have to be careful not to conflate a Unicode code point
with a "character." A character (in the abstract sense) may be made up of
multiple Unicode code points, which may further be encoded in multiple
code units.
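
To make the distinction above concrete, here is a minimal Python sketch. It is only an illustration, not part of the original message: the `code_units` helper and the sample strings are hypothetical names chosen for this example. It shows one abstract character ("é" written as a base letter plus a combining accent) occupying two code points, and a single code point outside the Basic Multilingual Plane occupying more than one code unit in UTF-16 and UTF-8.

```python
import unicodedata

def code_units(text, encoding, unit_size):
    """Count code units: encoded byte length divided by the unit size in bytes."""
    return len(text.encode(encoding)) // unit_size

# One abstract character written as two code points:
# LATIN SMALL LETTER E (U+0065) + COMBINING ACUTE ACCENT (U+0301).
decomposed = "e\u0301"

# NFC normalization folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# A single code point outside the Basic Multilingual Plane (U+1F600).
emoji = "\U0001F600"

for label, text in [("decomposed", decomposed),
                    ("composed", composed),
                    ("emoji", emoji)]:
    print(label,
          "code points:", len(text),  # Python str length counts code points
          "UTF-32 units:", code_units(text, "utf-32-le", 4),
          "UTF-16 units:", code_units(text, "utf-16-le", 2),
          "UTF-8 units:", code_units(text, "utf-8", 1))
```

The little-endian codec names are used so that no byte order mark is added to the encoded output, which would otherwise inflate the counts. Note that normalization only helps when a precomposed form exists; many combining sequences have none.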
There comes a point when further details are not very productive. If
you give people too much information at once, they can't absorb the
key points. We're at a stage where not enough people understand
Unicode even to a first approximation. If we can get to a place where
a critical mass of programmers subscribes to the misconception that a
Unicode code point is tantamount to a conceptual character, that will
already be significant progress from where we are now. And when we're
there, THEN we can refine the picture. To be quite frank, right now
most people are simply not ready.
Joel Spolsky tends to do a good job of finding a balance between
accessibility and rigorousness, ruthlessly simplifying (even
oversimplifying) when it's more important to drive home the first
approximation than to be absolutely correct. I highly recommend his
primer on Unicode:
http://www.joelonsoftware.com/articles/Unicode.html
John Y.