I'm in no way an expert on this stuff, Brad, so take it with a grain of
salt, but in general UCS-2 values require two bytes, and in those two bytes
DBCS data gets converted to code points that don't clash with standard
single-byte data.

For the most part, standard ASCII is mapped to 00xx, where xx is the
original ASCII hex code.  DBCS data, on the other hand, gets converted to
values > 255 (hex 0100 and up).  Thus, as long as you compare whole 16-bit
values, you can't have collisions.
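To make the 00xx mapping concrete, here is a minimal Python sketch. Python has no codec named "ucs-2", so this assumes utf-16-be as a stand-in (the two are identical for characters that fit in 16 bits); the helper name ucs2_units is made up for illustration.

```python
# Sketch only: utf-16-be stands in for UCS-2, which is the same thing
# for any character in the 16-bit (Basic Multilingual Plane) range.

def ucs2_units(text):
    """Return the 16-bit code units of text (hypothetical helper)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

ascii_units = ucs2_units("A1")            # plain single-byte characters
dbcs_units = ucs2_units("\u65e5\u672c")   # two kanji, DBCS in legacy code pages

print([f"{u:04X}" for u in ascii_units])  # ['0041', '0031']
print([f"{u:04X}" for u in dbcs_units])   # ['65E5', '672C']

# Every ASCII character lands in 0000-00FF; the kanji land above 00FF.
print(all(u <= 0xFF for u in ascii_units), all(u > 0xFF for u in dbcs_units))
```

One caveat: the second kanji is 672C, and its low byte 2C is the ASCII comma, so a byte-by-byte scan can still see a "comma" in there. The no-collision guarantee only holds when the data is compared two bytes at a time.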

Note that UCS-2 is ALWAYS 16 bits per character.  Meanwhile, UTF-8 and
UTF-16, the other popular Unicode formats, can use up to four bytes per
character, which allows for over a million characters.  Unicode is way
confusing for me, since it also allows for special "combining characters"
(a way of specifying, for example, an umlaut along with a U).
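A small Python sketch of both points, using only the standard library. The sample characters are my own picks for illustration, not anything from the original problem.

```python
import unicodedata

bmp_char = "\u00dc"          # U with umlaut as a single precomposed character
astral_char = "\U0001F600"   # a character that does not fit in 16 bits

# Byte counts per character in each encoding.
sizes = {
    "bmp_utf8": len(bmp_char.encode("utf-8")),
    "bmp_utf16": len(bmp_char.encode("utf-16-be")),
    "astral_utf8": len(astral_char.encode("utf-8")),
    "astral_utf16": len(astral_char.encode("utf-16-be")),
}
print(sizes)  # the last two are both 4: the four-byte case

# The same U-umlaut spelled as a plain U plus a combining diaeresis.
combined = "U\u0308"
print(len(combined))                                       # 2 code points
print(combined == bmp_char)                                # False as stored
print(unicodedata.normalize("NFC", combined) == bmp_char)  # True once normalized
```

The last three lines are where the confusion comes from: two different code point sequences display as the same character, so text generally has to be normalized before it is compared.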

Joe

From: Brad Stone

In a nutshell, it sounds like this may convert all
characters, either SBCS or DBCS, to unique values so that
there is no way a DBCS value can contain a byte that is
equal to an SBCS value, and this problem would be eliminated?



This mailing list archive is Copyright 1997-2026 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.
