


> From: Joe Sam Shirah
> 
> The normal way to convert bytes in one encoding to a String in
> Unicode is:
> 
> String S1 = new String( byte[] bytes, String charsetName );

Actually, I've never used this constructor before.  This might not be a
bad time to try it. I'm pretty new at the entire DBCS concept, so this
is actually pretty interesting stuff.

> use String.getBytes( String charsetName ) to get bytes
> in the specified encoding.

This is very cool, too.  I'll return to it in a moment.
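
For anyone following along, here is a minimal sketch of the round trip those
two calls give you. The class and method names are made up for illustration,
and I've used ISO-8859-1 and UTF-8 only because every Java platform is
required to support them; a host code page name would go in the same place,
assuming your JVM ships that converter.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, for illustration only.
class CharsetRoundTrip {

    // Decode bytes in the named encoding into a (Unicode) String.
    static String decode(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Thrown when the JVM has no converter for that name.
            throw new IllegalArgumentException("Unknown encoding: " + charsetName, e);
        }
    }

    // Encode a String into bytes in the named encoding.
    static byte[] encode(String s, String charsetName) {
        try {
            return s.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0xe9 };          // e-acute in ISO-8859-1
        String s = decode(latin1, "ISO-8859-1");  // one Unicode character
        byte[] utf8 = encode(s, "UTF-8");         // same character, two bytes
        System.out.println(s.length() + " char, " + utf8.length + " UTF-8 bytes");
    }
}
```

The point is simply that the String in the middle is always Unicode; the
encoding name only matters at the byte boundaries.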

>     Here's where I'm puzzled.  The data you posted is not a Java byte
> array.  The reason is that Java bytes range in value from -128 to +127;
> you have a value of F9 ( 249 ), so that can't work.

Okay, I didn't actually take the time to get the exact syntax right.  In
fact, the literal x'0e' means nothing in Java; it should be 0x0e.  So, as
I said, I wasn't shooting for precise syntax, just the general idea.  In
reality, for byte values between 128 and 255, you need a cast:

  byte b = (byte) 0xa5;
 
That works just fine.
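
To make the signed/unsigned point concrete, here is a small sketch (the
class and method names are hypothetical). The cast stores the same eight
bits; Java just reports them as a negative number until you mask with 0xff.

```java
// Hypothetical demo class, for illustration only.
class SignedByteDemo {

    // Returns the unsigned value (0..255) of a Java byte.
    static int unsigned(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xf9;             // bit pattern 1111 1001
        System.out.println(b);            // prints -7 (signed view)
        System.out.println(unsigned(b));  // prints 249 (unsigned view)
    }
}
```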


>     The second part is, and my understanding of the encoding algorithms
> is limited here, so please correct:  If we look at your data as non-Java
> bytes at 8 bits that can go to a value of 255, then it appears to me to
> amount to four double byte characters.  I don't understand how you could
> get to only three characters in Shift-JIS.

Ahh!  Welcome to the exciting world of DBCS coding.  CP5026 is actually
technically not a double-byte character set like Unicode.  Instead, it
is a "multibyte" encoding, where some characters take only one byte and
others take two.  In order to switch between single-byte and double-byte
characters, EBCDIC uses "shift" codes.  0x0e is the "shift-out" code,
which changes from single-byte to double-byte mode, while 0x0f is
"shift-in", which changes back to single-byte mode.

So, in my case, the three double-byte characters are x'45a0', x'479a'
and x'45f9'.  The x'0e' at the beginning and the x'0f' at the end are
simply the shift-out/shift-in codes that are used to bracket DBCS data.
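
Here is a rough sketch of how that counting works, walking the eight bytes
and treating 0x0e and 0x0f purely as mode switches. The class, constant and
method names are invented for this example, and it only splits the bytes
into character codes; it does not map them to Unicode, which is the
converter's job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter, for illustration only.
class ShiftCodeSplitter {
    static final int TO_DOUBLE = 0x0e; // enter double-byte mode
    static final int TO_SINGLE = 0x0f; // return to single-byte mode

    // Returns one int per character code: one-byte codes as 0x00..0xff,
    // two-byte codes as 0x0000..0xffff. The shift bytes are not returned.
    static List<Integer> split(byte[] data) {
        List<Integer> codes = new ArrayList<>();
        boolean doubleByte = false;
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xff; // unsigned view of the byte
            if (b == TO_DOUBLE) {
                doubleByte = true;
            } else if (b == TO_SINGLE) {
                doubleByte = false;
            } else if (doubleByte) {
                if (i + 1 >= data.length) {
                    throw new IllegalArgumentException("Truncated double-byte character");
                }
                codes.add((b << 8) | (data[++i] & 0xff)); // consume two bytes
            } else {
                codes.add(b);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        byte[] data = {
            0x0e, 0x45, (byte) 0xa0, 0x47, (byte) 0x9a, 0x45, (byte) 0xf9, 0x0f
        };
        for (int code : split(data)) {
            System.out.println(Integer.toHexString(code)); // 45a0, 479a, 45f9
        }
    }
}
```

Eight bytes in, three character codes out: two of the bytes are control
codes and the other six pair up.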

Joe



This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].