I'm in no way an expert on this stuff, Brad, so take it with a grain of salt, but in general UCS-2 values require two bytes, and in those two bytes DBCS data gets converted to code points that don't clash with standard single-byte data. For the most part, standard ASCII is mapped to 00xx, where xx is the original ASCII hex code. DBCS data, on the other hand, generally gets converted to values of 256 (x'0100') and up. Thus you can't have collisions: every character, single- or double-byte, is exactly one 16-bit value, so as long as you process the data two bytes at a time, no part of a DBCS character can be mistaken for an SBCS character. Note that UCS-2 is ALWAYS 16 bits per character. Meanwhile, UTF-8 and UTF-16, the other popular Unicode formats, can use up to four bytes per character (UTF-8 uses one to four, UTF-16 uses two or four), which allows for over a million characters. UTF is way confusing for me, especially since Unicode also allows for special "combining characters" (a way of specifying, for example, an umlaut along with a U).

Joe
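For anyone who wants to poke at this, here's a rough Python sketch of the points above. It uses big-endian UTF-16 as a stand-in for UCS-2 (the two are identical for characters in the Basic Multilingual Plane), and `utf16_units` is just an illustrative helper I made up, not part of any library. It shows ASCII landing at 00xx, a CJK character landing at x'0100' or higher, the one-to-four byte range of UTF-8 versus two-or-four for UTF-16, and a combining diaeresis plus U normalizing to a single Ü.

```python
import unicodedata


def utf16_units(text: str) -> list[str]:
    """Return the 16-bit code units of text (big-endian UTF-16) as hex strings.

    For characters in the Basic Multilingual Plane this matches UCS-2:
    exactly one 16-bit unit per character.
    """
    data = text.encode("utf-16-be")
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]


# ASCII maps to 00xx, where xx is the original ASCII code.
print(utf16_units("A"))   # ['0041']
# A double-byte (CJK) character lands at x'0100' or higher.
print(utf16_units("日"))  # ['65E5']

# UTF-8 uses 1 to 4 bytes per character; UTF-16 uses 2 or 4.
for ch in ["A", "é", "日", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: utf-8={len(ch.encode('utf-8'))} bytes, "
          f"utf-16={len(ch.encode('utf-16-be'))} bytes")

# Outside the BMP, UTF-16 needs a surrogate pair; UCS-2 can't represent it at all.
print(utf16_units("\U0001F600"))  # ['D83D', 'DE00']

# Combining characters: U followed by COMBINING DIAERESIS (U+0308)
# normalizes (NFC) to the single precomposed character U+00DC.
decomposed = "U\u0308"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), f"U+{ord(composed):04X}")  # 2 1 U+00DC
```

The combining example is why "one 16-bit value per character" is a bit of a simplification even in UCS-2: what a user sees as one letter may be two code points.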
From: Brad Stone
In a nutshell, it sounds like this may convert all characters, either SBCS or DBCS, to unique values, so that there is no way a DBCS value can contain a byte that is equal to an SBCS value and this problem would be eliminated?
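Here's a small Python sketch of the collision Brad is describing and why UCS-2 avoids it. The thread is about EBCDIC DBCS data, but this swaps in Shift-JIS, an ASCII-based DBCS encoding where the same kind of collision is easy to show with Python's standard codecs. The helper name `naive_ascii_scan` is illustrative only. In Shift-JIS the katakana ソ is x'835C', and its second byte equals the ASCII backslash, so a byte-at-a-time scan finds a "backslash" that isn't there. In UTF-16 the bytes can still equal ASCII values (ソ is x'30BD', and x'30' is ASCII "0"), but when you read two bytes at a time each unit is a whole character, so nothing matches by accident.

```python
def naive_ascii_scan(data: bytes, target: str) -> list[int]:
    """Return every byte offset whose value equals the ASCII character target.

    This is the byte-at-a-time scan that goes wrong on mixed SBCS/DBCS data.
    """
    code = ord(target)
    return [i for i, b in enumerate(data) if b == code]


text = "ソ"  # KATAKANA LETTER SO, a double-byte character in Shift-JIS

sjis = text.encode("shift_jis")
print(sjis.hex().upper())            # 835C
print(naive_ascii_scan(sjis, "\\"))  # [1]: the trail byte looks like a backslash

utf16 = text.encode("utf-16-be")
# Byte-at-a-time is still wrong: the high byte x'30' looks like ASCII "0".
print(naive_ascii_scan(utf16, "0"))  # [0]

# Reading two bytes at a time gives one unit per character, so a
# comparison against U+005C (backslash) can't match part of a character.
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print([f"{u:04X}" for u in units])   # ['30BD']
print(ord("\\") in units)            # False
```

So the answer seems to be yes, with the caveat that the program has to treat the data as 16-bit units rather than individual bytes.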
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.