


I'm in no way an expert on this stuff, Brad, so take it with a grain of
salt.  In general, every UCS-2 value takes two bytes, and within those two
bytes DBCS data gets converted to code points that don't clash with
standard single-byte data.

For the most part, standard ASCII is mapped to 00xx, where xx is the
original ASCII hex code.  DBCS data, on the other hand, gets converted to
values > 255 (that is, above hex 00FF).  Thus you can't have collisions,
as long as you compare whole 16-bit values rather than individual bytes.
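Here is a rough Python sketch of that mapping, in case it helps.  The
assumptions are mine: I use Python's "utf-16-be" codec as a stand-in for
UCS-2 (the two agree for characters that fit in 16 bits), I use Shift_JIS
as an example of a double-byte encoding (not the EBCDIC DBCS code pages
you'd see on the iSeries), and ucs2_units is just a made-up helper name.

```python
import struct

def ucs2_units(text):
    """Return the 16-bit values for text (hypothetical helper).

    Assumes every character fits in 16 bits, so UTF-16 and UCS-2 agree.
    """
    data = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))

# ASCII 'A' (hex 41) and backslash (hex 5C) become 0041 and 005C.
ascii_units = ucs2_units("A\\")

# Katakana SO (U+30BD) and the kanji U+65E5 land well above 255.
dbcs_units = ucs2_units("\u30bd\u65e5")

# In Shift_JIS the second byte of katakana SO is hex 5C, the same byte
# as an ASCII backslash. This is the kind of clash being discussed.
sjis_bytes = "\u30bd".encode("shift_jis")

print([f"{u:04X}" for u in ascii_units])   # ['0041', '005C']
print([f"{u:04X}" for u in dbcs_units])    # ['30BD', '65E5']
print(sjis_bytes.hex())                    # 835c
```

Note that the individual bytes of a UCS-2 value can still match an ASCII
byte (hex 30 in 30BD is the digit zero), which is why the comparison has
to be done on 16-bit values and not byte by byte.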

Note that UCS-2 is ALWAYS 16 bits per character.  Meanwhile, UTF-8 and
UTF-16, the other popular Unicode formats, can take up to four bytes per
character, which allows for over a million characters.  UTF is way
confusing for me, since Unicode also allows for special "combining
characters" (a way of specifying, for example, an umlaut along with a U).
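A quick Python sketch of the sizes and of combining characters, for what
it's worth.  The sample characters are my own picks, and I'm using NFC
normalization from the standard unicodedata module to show that the
two-piece form and the single-character form are treated as equivalent.

```python
import unicodedata

# 'A', u-umlaut, a kanji, and an emoji outside the 16-bit range.
samples = ["A", "\u00fc", "\u65e5", "\U0001F600"]

utf8_sizes = [len(c.encode("utf-8")) for c in samples]
utf16_sizes = [len(c.encode("utf-16-be")) for c in samples]
print(utf8_sizes)    # [1, 2, 3, 4]
print(utf16_sizes)   # [2, 2, 2, 4]

# A plain 'u' followed by COMBINING DIAERESIS (U+0308) displays as a
# u-umlaut but is two code points; U+00FC is the one-code-point form.
decomposed = "u\u0308"
precomposed = "\u00fc"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(precomposed))   # 2 1
print(composed == precomposed)             # True
```

The last character in the list is the case UCS-2 can't handle at all,
since it needs two 16-bit units (a surrogate pair) in UTF-16.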

Joe

From: Brad Stone

In a nutshell, it sounds like this may convert all
characters, either SBCS or DBCS, to unique values so that
there is no way a DBCS value can contain a byte that is
equal to an SBCS value, and this problem would be eliminated?



This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.