>Hmm... is the AS/400 "big endian"? I don't even remember
>which way PCs are now, Big or Little Endian.
>In case you didn't know, Endian refers to the order of the
>bytes in a word. Does the "big" byte come first, or the
>"little" byte?

OS/400 is Big Endian. Linux on iSeries is Big Endian. Motorola and Sun are Big Endian (high-order, or "big", byte first). Alpha and all the Intel x86 families are Little Endian, and MIPS, IIRC, can be run either way. Linux on Intel and all Microsoft operating systems are Little Endian.

Note carefully that Java cleverly straddles all this. Internally, the JVM is implemented in whatever byte order the local CPU uses. But because the language disallows reinterpreting the bytes of a larger type as a smaller one (or vice versa), and because JDBC and the I/O classes are defined carefully, Java appears to be Big Endian. That includes the handful of interfaces for reading Unicode streams. If the underlying data is actually Little Endian Unicode (see next paragraph), you'll have to reverse the bytes yourself or explicitly ask for a Little Endian encoding such as UTF-16LE.

Unicode is definitely a problem. It was supposed to be a Big Endian standard until Microsoft balked. Now it can go either way, just like most of the rest of the world. There is an optional "throwaway" character at the start of the stream, the byte order mark U+FEFF, which reveals the intended byte order of the rest of the data: if you read it as 0xFFFE, you know you got it backwards. Almost no one I know of actually includes it. Since the example shows a "Save As" choice between Unicode and Unicode Big Endian, one presumes that the former is Little Endian.

Since the first 256 code points tend to be frequently used, you can usually tell by inspection which byte order a Unicode file uses if you don't know ahead of time: each of those characters has a zero high-order byte, which falls at even offsets in Big Endian data and at odd offsets in Little Endian data. If the file came from a PC, it is almost certainly Little Endian Unicode.

Generally speaking, UTF-8 (a byte-oriented encoding of Unicode, and also one of the choices in that Save As example) is much easier to deal with. UTF-8 text looks almost like ASCII, with some odd-looking multi-byte sequences (two to four bytes each) here and there.
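To make the byte-order discussion concrete, here is a minimal Java sketch. It shows that `DataOutputStream` writes an `int` high-order byte first regardless of what `ByteOrder.nativeOrder()` reports, lays the same value out Little Endian with `ByteBuffer`, and guesses the byte order of UTF-16 data: first from the byte order mark, then by the inspection trick of counting zero bytes at even and odd offsets. The class and method names (`ByteOrderSketch`, `guessUtf16Order`, `decode`, `hex`) are hypothetical, and the heuristic assumes mostly Latin-1-range text; treat it as an illustration, not a robust detector.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteOrderSketch {

    /** Java stream I/O writes the high-order ("big") byte first, whatever the CPU. */
    public static byte[] writeIntWithDataStream(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Lays out the same int in an explicitly chosen byte order. */
    public static byte[] layOut(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    private static boolean hasBom(byte[] data) {
        return data.length >= 2
                && ((data[0] == (byte) 0xFE && data[1] == (byte) 0xFF)
                 || (data[0] == (byte) 0xFF && data[1] == (byte) 0xFE));
    }

    /**
     * Guesses the byte order of UTF-16 data. A byte order mark (U+FEFF) decides it
     * outright: FE FF means Big Endian, FF FE means Little Endian. Without one, it
     * counts zero bytes: characters from the first 256 code points have a zero
     * high-order byte, which lands at even offsets in Big Endian data and at odd
     * offsets in Little Endian data.
     */
    public static Charset guessUtf16Order(byte[] data) {
        if (hasBom(data)) {
            return data[0] == (byte) 0xFE ? StandardCharsets.UTF_16BE : StandardCharsets.UTF_16LE;
        }
        int evenZeros = 0;
        int oddZeros = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0) {
                if (i % 2 == 0) evenZeros++; else oddZeros++;
            }
        }
        // Assumption: ties default to Little Endian, since such data most likely came from a PC.
        return evenZeros > oddZeros ? StandardCharsets.UTF_16BE : StandardCharsets.UTF_16LE;
    }

    /** Decodes UTF-16 data in the guessed byte order, skipping any byte order mark. */
    public static String decode(byte[] data) {
        int start = hasBom(data) ? 2 : 0;
        return new String(data, start, data.length - start, guessUtf16Order(data));
    }

    /** Formats bytes as space-separated uppercase hex pairs. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x12345678;
        System.out.println("native order:     " + ByteOrder.nativeOrder());
        System.out.println("DataOutputStream: " + hex(writeIntWithDataStream(value)));          // 12 34 56 78
        System.out.println("little endian:    " + hex(layOut(value, ByteOrder.LITTLE_ENDIAN))); // 78 56 34 12
        byte[] fromPc = "Hi".getBytes(StandardCharsets.UTF_16LE); // 48 00 69 00, no BOM
        System.out.println(guessUtf16Order(fromPc) + " -> " + decode(fromPc));                // UTF-16LE -> Hi
    }
}
```

For comparison, Java's own `UTF-16` charset honors a byte order mark when decoding but assumes Big Endian when there is none, which is one more place where Java "appears" Big Endian. The tie-breaking default in the sketch is a judgment call; for data known to come from OS/400 you might prefer Big Endian instead.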
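A similar sketch for UTF-8, again with hypothetical names (`Utf8Sketch`, `utf8Lengths`, `roundTrips`, `fitsLatin1`). It shows ASCII characters staying one byte while other characters become two- to four-byte sequences, that UTF-8 has a single byte sequence with no byte order to get wrong, that a round trip through UTF-8 loses nothing, and that ISO 8859-1 cannot represent the Euro sign.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Sketch {

    /** UTF-8 byte count for each code point: 1 for ASCII, 2 to 4 for everything else. */
    public static int[] utf8Lengths(String text) {
        return text.codePoints()
                .map(cp -> new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8).length)
                .toArray();
    }

    /** Encodes to UTF-8 and back; lossless for any well-formed Unicode string. */
    public static boolean roundTrips(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).equals(text);
    }

    /** True if every character is among the first 256 code points (ISO 8859-1). */
    public static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    /** Formats bytes as space-separated uppercase hex pairs. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "A\u00E9\u20AC"; // 'A', e with acute accent, Euro sign
        System.out.println(Arrays.toString(utf8Lengths(sample)));             // [1, 2, 3]
        // The same UTF-8 bytes on every platform: no byte order, no BOM needed.
        System.out.println(hex(sample.getBytes(StandardCharsets.UTF_8)));      // 41 C3 A9 E2 82 AC
        System.out.println(roundTrips(sample));                                // true
        System.out.println(fitsLatin1("A\u00E9") + " " + fitsLatin1(sample)); // true false
    }
}
```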
UTF-8 will be handled much more adroitly at the end of the day, because you don't have to worry about Java versus C on an Intel box. Both languages can handle the translation readily, because UTF-8 isn't byte-order dependent. And UTF-8 is 100 percent interchangeable with the other Unicode encodings, because it is just a different encoding of the same code points.

Finally, since this represents a choice the user actually has: does the stream really need Unicode to begin with? If the data is actually ISO 8859-1 (whose 256 characters are identical to the first 256 code points of Unicode), then see if there is an option to store it as ordinary files. The main practical snag nowadays would be the Euro sign, if one is in the US or Western Europe, because it is not part of ISO 8859-1.

Larry W. Loen - Senior Linux, Java, and iSeries Performance Analyst
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.