From the Wikipedia article (http://en.wikipedia.org/wiki/UTF-16/UCS-2) we see that UTF-16 is not composed strictly of 16-bit chars:

In computing, UTF-16 (16-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. The encoding form maps each character to a sequence of 16-bit words. Characters are known as code points and the 16-bit words are known as code units. For characters in the Basic Multilingual Plane (BMP) the resulting encoding is a single 16-bit word. For characters in the other planes, the encoding will result in a pair of 16-bit words, together called a surrogate pair. All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800-U+DFFF (which are not characters), are uniquely mapped by UTF-16 regardless of the code point's current or future character assignment or use.

The Java String object takes care of most of that for you, so there's not much to learn to work with characters that are outside the "Basic Multilingual Plane". The main thing to remember is that length() and charAt() count 16-bit code units, not code points.
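As a rough illustration of the surrogate-pair behaviour described in the quoted article, here is a minimal sketch. The class name SurrogateDemo and its helper methods are made up for this example; the String and Character methods they call are standard since Java 5. U+1D11E (MUSICAL SYMBOL G CLEF) is just one convenient character outside the BMP.

```java
// Hypothetical demo class; only the java.lang calls are real API.
class SurrogateDemo {

    // Build a String from a single Unicode code point (may need two chars).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Number of 16-bit code units, which is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of actual Unicode characters (code points).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String clef = fromCodePoint(0x1D11E); // outside the BMP

        System.out.println(codeUnits(clef));   // 2: a surrogate pair
        System.out.println(codePoints(clef));  // 1: a single character

        // The two chars are a high and a low surrogate, not characters themselves.
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));  // true

        // codePointAt() reassembles the pair into the original code point.
        System.out.println(Integer.toHexString(clef.codePointAt(0)));  // 1d11e
    }
}
```

For text that stays inside the BMP the two counts are the same, which is why most code never notices the difference.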

-----Original Message-----
From: java400-l-bounces@xxxxxxxxxxxx [mailto:java400-l-bounces@xxxxxxxxxxxx] On Behalf Of Thorbjoern Ravn Andersen
Sent: Wednesday, November 18, 2009 9:39 AM
To: Java Programming on and around the iSeries / AS400
Subject: Re: Running in EBCDIC or ASCII

Barbara Morris wrote:
McKown, John wrote:

All JAVA (JVM) implementations must run UTF-16 internally to be called JAVA. This is a SUN requirement.



But there are cases where Java has to deal with non-UTF-16 data, for
example when converting between a String and a byte array. The
file.encoding Java property controls how the byte array is interpreted
by Java.

A String contains chars, not bytes; bytes are what is used in, e.g., raw file or network transports. Conversion between char and byte arrays takes the encoding into consideration. If no encoding is specified, the platform default is used.
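A small sketch of that conversion, for illustration only. The class name EncodingDemo and its encode/decode helpers are invented for this example, and it assumes Java 7 or later for StandardCharsets. The EBCDIC part assumes the JVM ships the IBM037 charset, so it is guarded with Charset.isSupported().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the charset calls themselves are standard API.
class EncodingDemo {

    // chars -> bytes, with the encoding named explicitly.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // bytes -> chars, using the same encoding to interpret them.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "AS/400";

        // Same String, different byte representations.
        System.out.println(encode(text, StandardCharsets.UTF_8).length);    // 6
        System.out.println(encode(text, StandardCharsets.UTF_16BE).length); // 12

        // What getBytes() and new String(bytes) fall back to when no charset is given.
        System.out.println(Charset.defaultCharset());

        // EBCDIC (code page 037) is an optional charset, so check before using it.
        if (Charset.isSupported("IBM037")) {
            byte[] ebcdic = encode("A", Charset.forName("IBM037"));
            System.out.println(Integer.toHexString(ebcdic[0] & 0xFF)); // c1
        }

        // Round trip only works when both directions use the same charset.
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Naming the charset on both sides keeps the result independent of the platform default, which matters when the same code runs on both an EBCDIC and an ASCII system.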

A char is 16 bits, which is not enough for characters outside the Basic Multilingual Plane, but I have been fortunate not to have to learn how to deal with that in Java... Yet :D


--
Thorbjørn Ravn Andersen "...plus... Tubular Bells!"
