re: UNICODE on AS400 -- MIDRANGE-L

Bruce, thank you for your reply.
My responses are in-line.

>Bruce wrote:
>
>Frank,
>
>I'm not sure I fully understand your question, but if you're on a
>DBCS capable system and attach DBCS display devices, then the Unicode
>to/from CHRID conversions that I mentioned earlier also take place.

We have SBCS but are looking to convert to DBCS.
UNICODE is an alternative. I think I have now grasped how the AS400
uses CCSIDs to convert the hex bit patterns (machine readable) into
human readable characters. (CCSIDs were very strange stuff to me; I
never understood why they existed or why they were needed, but now
comes the dawn.)
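
To make the CCSID idea concrete, here is a minimal sketch in Python
(not AS400 code). It assumes Python's cp037 and cp500 codecs are fair
stand-ins for CCSID 37 and CCSID 500; the variable names are made up
for illustration. The same two bytes come out as different characters
depending on which code page is used to interpret them.

```python
# The same bytes mean different characters under different code pages.
# Assumption: Python's "cp037" and "cp500" codecs stand in for
# CCSID 37 and CCSID 500; "latin-1" is an ASCII-family code page.
raw = bytes([0xC1, 0x5A])

as_ccsid_37 = raw.decode("cp037")
as_ccsid_500 = raw.decode("cp500")
as_latin1 = raw.decode("latin-1")

# X'C1' is "A" in both EBCDIC code pages, but X'5A' is not the same
# character in the two, and neither byte matches its latin-1 meaning.
print(repr(as_ccsid_37), repr(as_ccsid_500), repr(as_latin1))
```

The bytes never change; only the tag that says how to read them does.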

>
>Unicode represents a character set that encompasses all of the various
>SBCS and DBCS character sets available on iSeries, and as such can be
>considered to be a superset of DBCS and SBCS.  Many developers who
>previously may have written DBCS and SBCS versions of an application
>may now elect to write one version based on Unicode (or continue to use
>DBCS and SBCS as they aren't going away anytime soon).

If I had a choice I think I would use UNICODE UCS-2 over DBCS.
It seems to me that programmers would not need to code the
SHIFT IN/OUT controls if we use UNICODE.
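
As a rough sketch of the bookkeeping that shift controls impose, the
Python below splits a mixed field into SBCS and DBCS runs. The
shift-out (X'0E') and shift-in (X'0F') values are the usual EBCDIC
controls; the function name split_mixed_field and the DBCS byte
values in the sample are hypothetical placeholders, not real
characters from any particular DBCS code page.

```python
SO = 0x0E  # shift-out: the bytes that follow are double-byte data
SI = 0x0F  # shift-in: back to single-byte data


def split_mixed_field(data):
    """Split a mixed field into ("SBCS" or "DBCS", bytes) runs."""
    runs = []
    mode = "SBCS"
    current = bytearray()
    for byte in data:
        if mode == "SBCS" and byte == SO:
            if current:
                runs.append(("SBCS", bytes(current)))
            current = bytearray()
            mode = "DBCS"
        elif mode == "DBCS" and byte == SI:
            if len(current) % 2:
                raise ValueError("odd number of bytes in DBCS run")
            runs.append(("DBCS", bytes(current)))
            current = bytearray()
            mode = "SBCS"
        else:
            current.append(byte)
    if mode == "DBCS":
        raise ValueError("shift-out without matching shift-in")
    if current:
        runs.append(("SBCS", bytes(current)))
    return runs


# "AB" in EBCDIC, two placeholder double-byte characters, then "C".
field = (
    "AB".encode("cp037")
    + bytes([SO, 0x45, 0x81, 0x45, 0x82, SI])
    + "C".encode("cp037")
)
print(split_mixed_field(field))
```

Note that the two control bytes also take up room in the field, so a
mixed field holds fewer characters than its byte length suggests.
With a fixed 2 bytes per character there is no mode to track.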

>
>Unicode/UCS2 does double disk space requirements for those data elements
>needing the expanded character set of Unicode (when compared to SBCS),
>but this "doubling" would typically not apply to numeric values (cost,
>inventory on hand, etc.), internal status fields (active, in process,
>held, etc.); and can be mitigated by not having to duplicate objects
>due to the limitations of EBCDIC character sets.  So to assume a
>doubling of disk space requirements seems a bit much.
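
A back-of-envelope sketch of the point quoted above, in Python. The
record layout is entirely hypothetical (field names and lengths are
invented for illustration), and it assumes numeric and status fields
keep their size while every text character goes from 1 byte to 2.

```python
# Hypothetical record layout: (field name, length, is_text).
# Lengths are characters for text fields and bytes for the others.
fields = [
    ("customer_name", 30, True),
    ("customer_id", 10, False),
    ("order_date", 8, False),
    ("unit_cost", 8, False),
    ("qty_on_hand", 5, False),
    ("status", 1, False),
]


def record_bytes(layout, bytes_per_text_char):
    """Total record size when each text character needs the given bytes."""
    return sum(
        length * (bytes_per_text_char if is_text else 1)
        for _name, length, is_text in layout
    )


sbcs_size = record_bytes(fields, 1)
ucs2_size = record_bytes(fields, 2)
print(sbcs_size, ucs2_size, round(ucs2_size / sbcs_size, 2))
```

The growth factor depends entirely on how much of the record is
text, so a file that is mostly descriptions would still come close
to doubling.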


The following is an extract from an IBM web page.
I hope copyright is not infringed; I am only illustrating my point that
UNICODE/UCS-2 does double disk requirements. EVERY character has a
2 byte hex pattern in UCS-2.
Bruce, perhaps you were thinking of the UTF-8 format. I have included
the text for this as well. The added complexity of UTF-8 (in my opinion)
does not outweigh the extra cost of disk.

<<BEGIN QUOTE>>
3-5. Unicode

As shown thus far, there are several character encoding schemes and each
scheme has multiple language versions. This greatly complicates SBCS to
DBCS porting efforts or communication between computer systems which have
different encoding schemes and languages.
Unicode has been designed to resolve this issue and now ISO 10646 standard
specifies a couple of encoding schemes such as UCS-2 (Universal
Multiple-Octet Coded Character Set-2), UTF-8 (UCS Transformation Format-8),
and so on.

3-5-1. UCS-2
In UCS-2, each character is represented by 16 bits or 2 bytes and more
than 65,000 characters are supported. Practically, this is considered large
enough to accommodate all SBCS and DBCS characters for business needs.
To keep character coding simple and efficient, each character has its own
unique value with a 2 byte fixed length. For example, uppercase A is
represented by X'0041' while this is X'41' in ASCII.
There are not any complex modes or escape sequences.

Though its concept is simple, the SBCS data size will be doubled if it is
ported from ASCII or EBCDIC. Also, this basic concept of two byte fixed
length per one character may not fit well with systems that assume one byte
per one character.
For example, the upper byte of the SBCS character is always X'00' (null)
which means the end of string in the C language environment. As AS/400
natively supports a mechanism to store UCS-2, this consideration is not 
applicable.

IBM defines CCSID 13488 for UCS-2, and AS/400 supports it with DB2 UDB
for AS/400 and some objects since Version 3 Release 7.

3-5-2. UTF-8

UTF-8 stands for UCS Transformation Format, 8 bit format.
This encoding is also a part of ISO 10646 standard and is supposed to
resolve some concerns in UCS-2.
One advantage is the file system safety because null (X'00') bytes
are not included in any characters.
Also, the hexadecimal value of SBCS UTF-8 is the same as SBCS ASCII.
UTF-8 is a variable length format.
<<END QUOTE>>
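
The claims in the extract can be checked with a short Python sketch.
Two assumptions: Python's utf-16-be codec is used in place of UCS-2
(the two are identical for the characters shown here), and cp037
stands in for an SBCS EBCDIC code page. The sample strings are made
up.

```python
# Uppercase A: X'41' in ASCII, X'0041' in UCS-2, X'C1' in EBCDIC.
a_ascii = "A".encode("ascii")
a_ucs2 = "A".encode("utf-16-be")  # assumption: stand-in for UCS-2
a_utf8 = "A".encode("utf-8")
a_ebcdic = "A".encode("cp037")

# SBCS text doubles in UCS-2 but keeps its size in UTF-8.
sample = "INVOICE 1234"
sbcs_len = len(sample.encode("cp037"))
ucs2_len = len(sample.encode("utf-16-be"))
utf8_len = len(sample.encode("utf-8"))

# Two Japanese characters: fixed 2 bytes each in UCS-2,
# variable length (3 bytes each here) in UTF-8.
kanji = "\u65e5\u672c"
kanji_ucs2_len = len(kanji.encode("utf-16-be"))
kanji_utf8_len = len(kanji.encode("utf-8"))

# The null upper byte that troubles C-style strings.
ucs2_has_null = 0 in a_ucs2
utf8_has_null = 0 in sample.encode("utf-8")

print(a_ascii.hex(), a_ucs2.hex(), a_utf8.hex(), a_ebcdic.hex())
print(sbcs_len, ucs2_len, utf8_len)
print(kanji_ucs2_len, kanji_utf8_len)
print(ucs2_has_null, utf8_has_null)
```

So UTF-8 saves space only while the data stays in the single-byte
range; for DBCS-type characters it can take more room than UCS-2.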


>
>Unicode implementation would not be source code compatible with DBCS
>(or SBCS) solutions; but from an application user point of view you
>should be able to do the same (or better) job in terms of implementation
>and capability.

I rarely have the luxury of implementing from scratch. I have
to work with packaged software. I thank you for your information.
I will have to use DBCS as the software is coded for DBCS.

Frank Kolmann



>
>Bruce
>
>>
>>Thanks for the info Bruce,
>>As per the manual I can set up SBCS for the devices and
>>set the CHRID on the CRTDEV.
>>Perhaps you can help me with some DBCS considerations.
>>It seems to me that UNICODE supersedes DBCS.
>>UNICODE also doubles your disk space requirements,
>>but disk is relatively cheap, while programmers are costly.
>>We can buy a DBCS enabled version of an application.
>>I see that a UNICODE implementation is not
>>compatible with DBCS, am I right?
>>
>>Frank Kolmann
>