> From: Joe Sam Shirah
>
> The normal way to convert bytes in one encoding to a String in
> Unicode is:
>
> String S1 = new String( byte[] bytes, String charsetName );

Actually, I've never used this constructor before. This might not be a bad time to try it. I'm pretty new to the entire DBCS concept, so this is pretty interesting stuff.

> use String.getBytes( String charsetName ) to get bytes
> in the specified encoding.

This is very cool, too. I'll return to it in a moment.

> Here's where I'm puzzled. The data you posted is not a Java byte
> array.
> The reason is that Java bytes range in value from -128 to +127; you have a
> value of F9 ( 249 ), so that can't work.

Okay, I didn't take the time to write the exact syntax. In fact, the string (x'0e') means nothing in Java. It should be 0x0e. So, as I said, I wasn't shooting for precise syntax, just the general idea. In reality, for byte values between 128 and 255, you need a cast like the following:

byte b = (byte) 0xa5;

That works just fine.

> The second part is, and my understanding of the encoding algorithms is
> limited here, so please correct: If we look at your data as non-Java
> bytes at 8 bits that can go to a value of 255, then it appears to me to
> amount to four double byte characters. I don't understand how you could
> get to only three characters in Shift-JIS.

Ahh! Welcome to the exciting world of DBCS coding. CP5026 is technically not a double-byte character set like Unicode. Instead, it is a "multibyte" encoding, where some characters take only one byte and others take two. In order to switch between single-byte and double-byte characters, EBCDIC uses a "shift" code. 0x0e is the "shift-out" (SO) code, which changes from single-byte to double-byte mode, while 0x0f is the "shift-in" (SI) code, which changes back to single-byte mode.

So, in my case, the data is eight bytes, but only six of them are character data: the three double-byte characters are x'45a0', x'479a' and x'45f9'. The x'0e' at the beginning and the x'0f' at the end are simply the shift-out/shift-in codes that are used to bracket DBCS data.

Joe
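For anyone following along, here is a rough sketch of the byte counting described above. It is only an illustration, not a real CP5026 decoder: it does not map anything to Unicode, it just walks the eight bytes from the example, drops the 0x0e and 0x0f bracket codes, and groups what is left in between into two-byte units. The class name `ShiftCodeDemo`, the method `splitCharacters` and the constant names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class ShiftCodeDemo {
    // Hypothetical names for the two bracket codes discussed in the post.
    static final byte DBCS_START = 0x0e; // switches to double-byte mode
    static final byte DBCS_END = 0x0f;   // switches back to single-byte mode

    // Splits a mixed single/double-byte stream into hex strings,
    // one entry per character; the bracket codes themselves are dropped.
    static List<String> splitCharacters(byte[] data) {
        List<String> chars = new ArrayList<>();
        boolean doubleByte = false;
        for (int i = 0; i < data.length; i++) {
            byte b = data[i];
            if (b == DBCS_START) { doubleByte = true; continue; }
            if (b == DBCS_END) { doubleByte = false; continue; }
            if (doubleByte && i + 1 < data.length) {
                // Mask with 0xff so negative Java bytes print as 80..ff.
                chars.add(String.format("%02x%02x", b & 0xff, data[i + 1] & 0xff));
                i++; // the second byte of the pair has been consumed
            } else {
                chars.add(String.format("%02x", b & 0xff));
            }
        }
        return chars;
    }

    public static void main(String[] args) {
        // Values above 0x7f need the (byte) cast, as noted in the post.
        byte[] data = {
            0x0e, 0x45, (byte) 0xa0, 0x47, (byte) 0x9a, 0x45, (byte) 0xf9, 0x0f
        };
        System.out.println(splitCharacters(data)); // prints [45a0, 479a, 45f9]
    }
}
```

Turning those three values into real characters would still go through `new String(bytes, charsetName)` with whatever charset name your JVM offers for this code page; that name is not shown here because it depends on the Java runtime installed.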
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.