Leif Svaalgard wrote:
>correct, but still does not excuse or explain why either
>of them should be translated to/from the exclamation
>mark which *is* in both character sets.

There are many code point translations from ASCII to EBCDIC that "don't make sense" because of some now-unobvious history. Some particular ASCII/EBCDIC translations are based on an intuitive, Unicode-like "pick the most sensible character" approach, matching a similar appearance and a similar function. Others, however, are based on now-departed IBM Selectric Terminal/Typewriter keyboard layouts that, in their own way, made sense at the time. Characters were translated based on where they sat on the keyboard/golfball layout, leading to some now counter-intuitive translations. Compatibility being what it is, someone, somewhere probably still depends on these things.

The most accessible example of this is that the "logical not" symbol ("upper case" number 6) on US EBCDIC code page 037 is usually translated to the "caret" symbol in ASCII (also "upper case" 6 on most ASCII keyboards, at least in the US). Obviously, these symbols really don't look all that much alike and are not used for the same purposes. They were selected simply because the keyboard locations lined up. Twenty years ago, at least, this was more useful than meets the eye today, because I remember dealing with it even in US-only code pages.

IBM had (has?) these nifty CD-ROMs that you once could get that explained all this (more or less) and contained detailed translations. I believe if you look at the OS/400 iconv interface, you'll even see some of this distinction reflected in the ability to get slightly different translations from one code page to another with various options. I don't remember them all now, but you can look 'em up. One relevant option on OS/400's iconv is whether to use the "substitute" character or some "best can do" code point. How the latter is picked probably invokes this discussion.

It is probably even true that some, if not most, of these overall translations made the "logical not / caret" kind of decision versus the "similar/identical function" decision on a character-by-character basis. These choices were probably made based on long-ago trade-offs that may not make much sense today. But, since no doubt much code has worked around things as they are for decades, I doubt if you'll see it change.

You may actually be "better off," if this sort of thing matters, to translate to Unicode first from code set 1 and then back from the Unicode result to code set 2. This should get rid of the individual translations based on keyboard location, at least, though you may not see a usefully different result in the "best can do" kinds of circumstances. How does one translate "logical not" to US ASCII, for instance?

Larry W. Loen - Senior Linux, Java, and iSeries Performance Analyst
Dept HP4, Rochester MN
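P.S. To make the "substitute" versus "best can do" distinction concrete, here is a minimal sketch using Python's stock 'cp037' codec. It is only an illustration of the idea: the BEST_FIT table is my own hypothetical stand-in for the keyboard-position choice, not IBM's actual translation table, and Python's codec is not the OS/400 iconv.

    # 0x5F is "logical not" (the NOT SIGN) in EBCDIC code page 037
    ebcdic_byte = b"\x5f"
    ch = ebcdic_byte.decode("cp037")       # -> '\u00ac'
    print(repr(ch))                        # '¬'

    # Straight to US ASCII fails: U+00AC has no ASCII code point.
    try:
        ch.encode("ascii")
    except UnicodeEncodeError:
        print("no exact ASCII equivalent")

    # "Substitute character" style: give up and emit a replacement.
    print(ch.encode("ascii", errors="replace"))     # b'?'

    # "Best can do" style: a hand-rolled best-fit table encoding the old
    # keyboard-position choice (both characters were "upper case" 6).
    BEST_FIT = {"\u00ac": "^"}
    print("".join(BEST_FIT.get(c, c) for c in ch))  # '^'

The round trip through Unicode makes the point: once you land on U+00AC, any single-byte ASCII answer is a policy decision, not a translation.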