


>Hmm.. is the AS/400 "big endian"?  I don't even remember
>which way PCs are now, Big or Little Endian.

>In case you didn't know, Endian refers to the orders of the
>bytes for a word.  Does the "big" byte come first or the
>"little" byte.


OS/400 is Big Endian.  Linux on iSeries is Big Endian.  Motorola & Sun are
Big Endian (high order or "big" byte first).

Alpha and all Intel families are Little Endian.  MIPS can run either way,
depending on how the system is configured.  Linux on Intel and all
Microsoft OS are Little Endian.
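
To make the difference concrete, here is a minimal Python sketch (not
anything from OS/400 itself) that lays out the same 32-bit value both
ways.  The value 0x0A0B0C0D is just an arbitrary example.

```python
import struct
import sys

value = 0x0A0B0C0D  # arbitrary example value

# ">" forces big endian (high-order byte first),
# "<" forces little endian (low-order byte first).
big = struct.pack(">I", value)
little = struct.pack("<I", value)

print(big.hex())      # bytes in big endian order
print(little.hex())   # the same bytes, reversed
print(sys.byteorder)  # native order of the machine running this
```

On an Intel box the last line reports "little"; on big endian hardware
it reports "big".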

Note carefully that Java cleverly straddles all this.  Internally, it is
implemented as whatever endian the local CPU is.  By disallowing casts of
larger over smaller or smaller over larger, and by defining JDBC and I/O
carefully, it appears to be Big Endian.

That includes the handful of interfaces for reading Unicode streams.  If
the underlying data is actually little endian Unicode (see next paragraph),
you'll have to reverse the bytes yourself.
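
Reversing the bytes yourself is not much work.  A rough sketch in Python,
assuming the data is plain 16-bit Unicode with an even number of bytes
(the function name swap_utf16_bytes is made up for illustration):

```python
def swap_utf16_bytes(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit unit (little <-> big endian)."""
    if len(data) % 2 != 0:
        raise ValueError("16-bit data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each pair moves first
    swapped[1::2] = data[0::2]  # first byte of each pair moves second
    return bytes(swapped)


little = "Hi".encode("utf-16-le")
print(swap_utf16_bytes(little) == "Hi".encode("utf-16-be"))
```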

Unicode is definitely a problem.  It was supposed to be a Big Endian
standard until Microsoft balked.  Now you can be either way, just like most
of the rest of the world.  There is an optional "throwaway" character
(0xFEFF, the byte order mark) which reveals the intended endian of the rest
of the data stream (if you load 0xFFFE you know you got it backwards).
Almost no one I know of actually includes it.  Since the example shows a
"save as" with Unicode and Unicode Big Endian, one presumes that the former
is Little Endian.
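
When the marker is present, checking for it is simple.  The following is
only a sketch, assuming 16-bit Unicode data; the function names and the
"little" fallback are my own choices for illustration, not part of any
standard API.

```python
import codecs


def sniff_utf16_byte_order(data: bytes):
    """Return 'big', 'little', or None when no byte order mark is present."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    return None


def decode_utf16(data: bytes, default: str = "little") -> str:
    """Decode 16-bit Unicode, honoring the mark and dropping it if found."""
    order = sniff_utf16_byte_order(data)
    if order is None:
        order = default  # assumption: caller knows where the data came from
    else:
        data = data[2:]  # the mark is a throwaway, so skip it
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    return data.decode(codec)


print(decode_utf16(b"\xfe\xff\x00A"))  # big endian data with a mark
print(decode_utf16(b"\xff\xfeA\x00"))  # little endian data with a mark
```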

In most cases, since the first 256 code points tend to be frequently used,
it is usually possible, by inspection, to tell what "endian" a Unicode file
uses if you don't know ahead of time.  If it came from a PC, it is almost
certainly little endian Unicode.
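
The "inspection" can be automated along these lines.  This is a heuristic
sketch only: it assumes the text is mostly drawn from the first 256 code
points, so the high-order byte of most characters is zero.  The function
name is hypothetical.

```python
def guess_utf16_byte_order(data: bytes):
    """Guess 'big' or 'little' from where the zero bytes fall, else None."""
    # Big endian puts the (usually zero) high-order byte at even offsets,
    # little endian puts it at odd offsets.
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)
    if even_zeros > odd_zeros:
        return "big"
    if odd_zeros > even_zeros:
        return "little"
    return None  # no clear evidence either way


print(guess_utf16_byte_order("Hello".encode("utf-16-be")))
print(guess_utf16_byte_order("Hello".encode("utf-16-le")))
```

Text that is mostly outside the first 256 code points (Chinese, say)
gives this approach little to work with, so treat the answer as a guess.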

Generally speaking, UTF-8 (a byte encoded version of Unicode and also a
choice given before) is much easier to deal with.   UTF-8 will look almost
like ASCII with some strange multi-byte sequences here and there.
UTF-8 will be handled much more adroitly at the end of the day, because you
don't have to worry about Java versus C on an Intel box -- both languages
can handle the translation readily, because it isn't byte-order dependent.
UTF-8 is 100 per cent interchangeable with Unicode, because it is just a
different encoding of it.
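
A small Python sketch of the point, using an arbitrary sample string: the
ASCII characters come through as single bytes, the others become
multi-byte sequences, and the result converts back without loss.

```python
text = "na\u00efve \u20ac5"  # sample: i-diaeresis and a Euro sign among ASCII

utf8 = text.encode("utf-8")
print(utf8)       # ASCII letters are unchanged single bytes
print(len(utf8))  # longer than len(text) because of the multi-byte sequences

# The 16-bit forms depend on byte order; UTF-8 has only one form.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")
print(utf16_be != utf16_le)

# Round trip: UTF-8 carries exactly the same characters.
print(utf8.decode("utf-8") == text)
```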

Finally, since this represents a choice the user actually has, is there any
Unicode in the stream to begin with?  If the data actually is ISO 8859-1
(identical to the first 256 code points of Unicode), then see if there is
an option to store them as ordinary files.  The main practical snag to this
nowadays would be the actual Euro character, which ISO 8859-1 does not
contain, if one is in the US or Western Europe.
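
One way to check whether the data really fits in ISO 8859-1 is simply to
try the conversion.  A minimal sketch, with a made-up function name:

```python
def fits_latin1(text: str) -> bool:
    """True if every character is in ISO 8859-1 (code points 0 to 255)."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


print(fits_latin1("caf\u00e9"))    # accented e is within the first 256
print(fits_latin1("\u20ac 9.99"))  # the Euro sign is not
```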


Larry W. Loen  -   Senior Linux, Java, and iSeries Performance Analyst





This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
