


My team has been looking at the advantages of using UTF-8 vs. UTF-16 if the
majority of the data you store is double-byte in nature. A number of
advantages and disadvantages are listed at this link:
http://en.wikipedia.org/wiki/UTF-8#Compared_to_UTF-16

One thing that stuck out to us on that link was this:

*Characters U+0800 through U+FFFF use three bytes in UTF-8, but only two in
UTF-16. As a result, text in (for example) Chinese, Japanese or Hindi could
take more space in UTF-8 if there are more of these characters than there
are ASCII characters. This happens for pure text[34] but rarely for HTML
documents or documents in XML based formats such as .docx or .odt. For
example, both the Japanese UTF-8 and the Hindi Unicode articles on
Wikipedia take more space in UTF-16 than in UTF-8.*
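
A quick way to check those per-character byte counts is a few lines of Python. This is only a sketch: the sample characters and the helper name encoded_sizes are my own illustrative choices, not something from the Wikipedia article, and "utf-16-le" is used so that no byte order mark is counted.

```python
# Sample characters are illustrative assumptions, written as escapes so the
# script does not depend on the encoding of the source file itself.
SAMPLES = {
    "\u0041": "ASCII letter A (U+0041)",
    "\u00e9": "Latin small e with acute (U+00E9)",
    "\u0905": "Devanagari letter A (U+0905)",
    "\u65e5": "CJK ideograph (U+65E5)",
    "\U00020000": "CJK Extension B ideograph (U+20000), outside the BMP",
}


def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes) for the given text, with no BOM."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch, label in SAMPLES.items():
    u8, u16 = encoded_sizes(ch)
    print(f"{label}: UTF-8 = {u8} bytes, UTF-16 = {u16} bytes")
```

Characters in the U+0800 through U+FFFF range come out at 3 bytes in UTF-8 and 2 in UTF-16, while anything above U+FFFF needs 4 bytes in either encoding.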

To sum it up, many (most?) Asian characters will be 3 bytes in UTF-8 and
only 2 bytes in UTF-16 (characters outside the Basic Multilingual Plane take
4 bytes in both). On the other hand, the majority of a webpage is markup, not
actual information from your database or text literals, and many (most?) new
applications convey data to the browser vs. staying only on the machine.
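
To see how markup shifts the balance, here is a rough Python sketch comparing the same text bare and wrapped in HTML. The text, the markup, and the function name sizes are made-up assumptions for illustration; real pages and real database columns will have different ratios.

```python
def sizes(text):
    """Return (utf8_bytes, utf16_bytes) for a string, with no BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical payload: 30 CJK characters, all in the U+0800..U+FFFF range.
body = "\u65e5\u672c\u8a9e" * 10

# Hypothetical markup: 50 ASCII characters wrapped around the same payload.
page = '<div class="row"><span class="label">' + body + "</span></div>"

for name, text in (("pure text", body), ("HTML page", page)):
    u8, u16 = sizes(text)
    smaller = "UTF-8" if u8 < u16 else "UTF-16"
    print(f"{name}: UTF-8 = {u8} bytes, UTF-16 = {u16} bytes, smaller: {smaller}")
```

With this particular sample the bare text is smaller in UTF-16 and the wrapped version is smaller in UTF-8, because every ASCII markup character costs 1 byte in UTF-8 but 2 in UTF-16. Data sitting in a database column looks more like the bare-text case.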

*I now have 3 questions:*

1 - If you are required to store Asian characters, what CCSID do you use?

2 - Are there IBM i server-side advantages to using UTF-16? (e.g. is text
searching quicker for Asian languages because their characters take 2 bytes
in UTF-16 vs. 3 in UTF-8?)

3 - If you've used both UTF-8 and UTF-16, have you found one to be more
advantageous than the other?

Aaron Bartell


This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work.