Leif Svaalgard wrote:

>correct, but still does not excuse or explain why either
>of them should be translated to/from the exclamation
>mark which *is* in both character sets.

There are many code point translations from ASCII to EBCDIC that "don't
make sense" because of some now-unobvious history.

Some ASCII/EBCDIC translations follow an intuitive, Unicode-like rule:
"pick the most sensible character," meaning one with a similar appearance
and a similar function.

Others, however, are based on now-departed IBM Selectric
Terminal/Typewriter keyboard layouts that, in their own way, made sense at
the time.  You translated based on where things were on the
keyboard/golfball layout, leading to some now counter-intuitive
translations.  Compatibility being what it is, someone, somewhere probably
still depends on these things.  The most accessible variation of this is
that the "logical not" symbol ("upper case" number 6) on US EBCDIC code
page 037 is usually translated to the "caret" symbol in ASCII (also "upper
case 6" on most ASCII keyboards, at least in the US).  Obviously, these
symbols really don't look all that much alike and are not used for the same
purposes.  They were selected simply because the keyboard locations lined
up.  Twenty years ago, at least, this was more useful than meets the eye
today; I remember dealing with it even in US-only code pages.

IBM had (has?) these nifty CD-ROMs that you once could get that explained
all this (more or less) and contained detailed translations.  I believe if
you look at the OS/400 iconv interface, you'll even see some of this
distinction reflected in the ability to get slightly different translations
from one code page to another with various options.  I don't remember them
all now, but you can look 'em up.  One relevant option on OS/400's iconv is
whether to use the "substitute" character or some "best can do" code point.
How the latter is picked probably involves this discussion.
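
To make the substitute versus "best can do" distinction concrete, here is a
minimal sketch in Python rather than the OS/400 iconv interface, whose
option names are not reproduced here.  The BEST_FIT table and the to_ascii
helper are hypothetical names made up for this illustration, and Python's
errors="replace" handler (which emits "?") only stands in for a real
substitute character.

```python
# Hypothetical best-fit table; a real converter ships its own choices.
BEST_FIT = {"\u00ac": "^"}  # logical not -> caret (the keyboard-position pick)


def to_ascii(ebcdic_bytes, best_fit=None):
    """Decode code page 037 bytes and re-encode them as US ASCII."""
    text = ebcdic_bytes.decode("cp037")
    if best_fit:
        # Apply the "best can do" picks before encoding.
        text = "".join(best_fit.get(ch, ch) for ch in text)
    # errors="replace" stands in for the substitute character ("?" in Python).
    return text.encode("ascii", errors="replace")


sample = "a\u00acb".encode("cp037")  # "a", logical not, "b" in EBCDIC
print(to_ascii(sample))              # substitute-character behavior
print(to_ascii(sample, BEST_FIT))    # best-fit behavior
```

With the substitute behavior the logical not is lost entirely; with the
best-fit table it becomes a caret, which is the historical choice described
above.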

It is probably even true that some if not most of these overall
translations made the "logical not / caret" kind of decision versus the
"similar/identical function" decision on a character-by-character basis.
These choices were probably made based on long-ago trade-offs that may not
make much sense today.  But, since no doubt much code has worked around
things as they are for decades, I doubt if you'll see it change.

You may actually be "better off," if this sort of thing matters, to
translate to Unicode first from code set 1 and then back from the Unicode
result to code set 2.  This should get rid of the individual translations
based on keyboard location, at least, though you may not see a usefully
different result in the "best can do" kinds of circumstances.  How does one
translate "logical not" to US ASCII, for instance?
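
A rough sketch of that two-step route, again in Python rather than on
OS/400: the pivot helper is a hypothetical name, and "cp037" and "cp500"
are Python's codec names for two EBCDIC code pages, used here only as
stand-ins for "code set 1" and "code set 2".  Characters present in both
sets survive by identity, whatever byte values they occupy.

```python
def pivot(data, src, dst):
    """Translate bytes from code page src to dst by way of Unicode."""
    # Anything dst cannot represent falls back to Python's "?" replacement.
    return data.decode(src).encode(dst, errors="replace")


text = "!^\u00ac|"                    # exclamation, caret, logical not, bar
as_037 = text.encode("cp037")
as_500 = pivot(as_037, "cp037", "cp500")

print(as_037.hex(), as_500.hex())     # the byte values differ between the two
print(as_500.decode("cp500") == text) # but the characters come through intact
```

When the target is plain US ASCII, the same helper has nothing sensible to
offer for the logical not and returns the replacement character, which is
exactly the open question above.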

Larry W. Loen  -   Senior Linux, Java, and iSeries Performance Analyst
                          Dept HP4, Rochester MN
