My team has been looking at the advantages of using UTF-8 vs. UTF-16 if the
majority of the data you store is double-byte in nature. A number of
advantages and disadvantages are listed here:
http://en.wikipedia.org/wiki/UTF-8#Compared_to_UTF-16
One thing that stuck out to us on that link was this:
*Characters U+0800 through U+FFFF use three bytes in UTF-8, but only two in
UTF-16. As a result, text in (for example) Chinese, Japanese or Hindi could
take more space in UTF-8 if there are more of these characters than there
are ASCII characters. This happens for pure text[34] but rarely for HTML
documents or documents in XML based formats such as .docx or .odt. For
example, both the Japanese UTF-8 and the Hindi Unicode articles on
Wikipedia take more space in UTF-16 than in UTF-8.*
To sum it up, many (most?) Asian characters will be 3 bytes in UTF-8 and
only 2 bytes in UTF-16 (characters beyond U+FFFF take 4 bytes in either
encoding). However, the majority of a webpage is markup, not actual
information from your database or text literals, and many (most?) new
applications convey data to the browser vs. staying only on the machine.
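
For anyone who wants to check the byte counts themselves, here is a rough Python sketch of the comparison. The sample strings and the encoded_sizes helper are my own illustrative assumptions, not anything taken from the Wikipedia article, and the sizes are measured without a byte-order mark.

```python
# Compare encoded sizes of sample text in UTF-8 and UTF-16.
# The sample strings below are illustrative assumptions, not real application data.

def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (utf8_bytes, utf16_bytes) for text, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Eight Japanese characters, all in the U+0800..U+FFFF range.
plain = "日本語のテキスト"

# The same text wrapped in a small amount of ASCII markup.
markup = '<p class="title">' + plain + "</p>"

# A CJK character outside the Basic Multilingual Plane (U+20000).
supplementary = "\U00020000"

for label, sample in [
    ("plain", plain),
    ("markup", markup),
    ("supplementary", supplementary),
]:
    utf8, utf16 = encoded_sizes(sample)
    print(f"{label}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
```

With these samples the plain text comes out smaller in UTF-16 (16 bytes vs. 24), the marked-up text comes out smaller in UTF-8 (45 bytes vs. 58), and the supplementary character is 4 bytes either way. Real data will vary with the ratio of ASCII to non-ASCII characters.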
*I now have 3 questions:*
1 - If you are required to store Asian characters, what CCSID do you use?
2 - Are there IBM i server-side advantages to using UTF-16? (i.e. is text
searching quicker for Asian languages because they have two bytes in UTF-16
vs. 3 in UTF-8?)
3 - If you've used both UTF-8 and UTF-16, have you found one to be more
advantageous than the other?
Aaron Bartell
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list