From the wiki article (http://en.wikipedia.org/wiki/UTF-16/UCS-2) we see that UTF-16 is not composed strictly of 16-bit chars; a character may take one or two of them:
In computing, UTF-16 (16-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. The encoding form maps each character to a sequence of 16-bit words. Characters are known as code points and the 16-bit words are known as code units. For characters in the Basic Multilingual Plane (BMP) the resulting encoding is a single 16-bit word. For characters in the other planes, the encoding will result in a pair of 16-bit words, together called a surrogate pair. All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800-U+DFFF (which are not characters), are uniquely mapped by UTF-16 regardless of the code point's current or future character assignment or use.
The Java String object takes care of most of that for you, so there's not much to learn to work with characters that are outside the "Basic Multilingual Plane". The main thing to remember is that length() and charAt() count 16-bit code units, not code points.
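As a rough illustration of the surrogate-pair behaviour described in the quote, here is a minimal sketch using only standard java.lang methods. The class name SurrogateDemo and its helper methods are made up for this example, and U+1D11E (MUSICAL SYMBOL G CLEF) is just one arbitrary character outside the BMP.

```java
public class SurrogateDemo {

    // Hypothetical helper: build a String from a single Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Number of 16-bit code units (what String.length() reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (what a reader would call characters).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E lies outside the Basic Multilingual Plane.
        String clef = fromCodePoint(0x1D11E);

        System.out.println(codeUnits(clef));   // 2
        System.out.println(codePoints(clef));  // 1

        // The two chars are a high surrogate followed by a low surrogate.
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));  // true

        // codePointAt() reassembles the pair into the original code point.
        System.out.println(Integer.toHexString(clef.codePointAt(0)));  // 1d11e
    }
}
```

So for code that only concatenates, compares, or prints Strings, nothing special is needed; it is indexing and counting that should go through the code-point methods if characters outside the BMP may appear.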
From: java400-l-bounces@xxxxxxxxxxxx [mailto:java400-l-bounces@xxxxxxxxxxxx] On Behalf Of Thorbjoern Ravn Andersen
Sent: Wednesday, November 18, 2009 9:39 AM
To: Java Programming on and around the iSeries / AS400
Subject: Re: Running in EBCDIC or ASCII
Barbara Morris wrote:
McKown, John wrote:
All JAVA (JVM) implementations must run UTF-16 internally to be called JAVA. This is a SUN requirement.
But there are cases where Java has to deal with non-UTF-16 data, for
example when converting between a String and a byte array. The
file.encoding Java property controls how the byte array is interpreted.
A String contains chars, not bytes; bytes are what is used e.g. in raw file or network transports. Conversion between char and byte arrays takes the encoding into consideration. If no encoding is specified, the platform default is used.
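To make the char/byte distinction concrete, here is a small sketch of converting in both directions with an explicit encoding. The class name EncodingDemo and its two helpers are hypothetical; String.getBytes(Charset), new String(byte[], Charset) and Charset.defaultCharset() are standard library calls. Note that StandardCharsets needs Java 7 or later; on older JVMs Charset.forName("UTF-8") would be the equivalent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Hypothetical helper: chars -> bytes using an explicit encoding.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Hypothetical helper: bytes -> chars using an explicit encoding.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "A\u00e9"; // 'A' followed by e-acute (U+00E9)

        // The same two chars become different byte sequences per encoding.
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);        // 41 C3 A9
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1); // 41 E9
        System.out.println(utf8.length);   // 3
        System.out.println(latin1.length); // 2

        // Decoding with the matching encoding round-trips the String.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text)); // true

        // Decoding with the wrong encoding silently yields different chars.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).length()); // 3

        // This is what getBytes() and new String(bytes) fall back to
        // when no encoding is given; the value depends on the platform.
        System.out.println(Charset.defaultCharset());
    }
}
```

Passing the Charset explicitly keeps the result independent of the platform default, which matters when the same code runs on both an EBCDIC and an ASCII system.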
A char is 16 bits, which is not enough for characters outside the Basic Multilingual Plane, but I have been fortunate not to have to learn how to deal with that in Java... Yet :D
Thorbjørn Ravn Andersen "...plus... Tubular Bells!"
This is the Java Programming on and around the iSeries / AS400 (JAVA400-L) mailing list To post a message email: JAVA400-L@xxxxxxxxxxxx To subscribe, unsubscribe, or change list options,
or email: JAVA400-L-request@xxxxxxxxxxxx Before posting, please take a moment to review the archives at http://archive.midrange.com/java400-l